Import a dataset — import

Imports a dataset into the application, using the OMOP Common Data Model.

Usage

import_dataset(
  output,
  ns = character(),
  i18n = character(),
  r = shiny::reactiveValues(),
  d = shiny::reactiveValues(),
  dataset_id = integer(),
  data = tibble::tibble(),
  omop_table = "",
  omop_version = "6.0",
  read_with = "none",
  save_as = "none",
  rewrite = FALSE,
  allow_numeric_instead_integer = FALSE,
  allow_dttm_instead_date = FALSE
)

Arguments

output: Shiny output variable
ns: Shiny namespace
i18n: shiny.i18n object for translations
r: A shiny::reactiveValues object, used to communicate between modules
d: A shiny::reactiveValues object, used to communicate between modules
dataset_id: ID of the dataset, used to create a specific dataset folder in the application folders (integer)
data: Data variable (data.frame or tibble)
omop_table: Name of the OMOP table to import (character)
omop_version: OMOP version of the imported data, accepts "5.3", "5.4" and "6.0" (character)
read_with: The library used to read the data. Accepted values: "none", "vroom", "duckdb", "spark", "arrow" (character)
save_as: Save the data locally. Accepted values: "none", "csv", "parquet" (character)
rewrite: If save_as is not "none", whether to overwrite an existing data file (logical)
allow_numeric_instead_integer: Allow columns that should be of type integer to be of type numeric (logical)
allow_dttm_instead_date: Allow columns that should be of type date to be of type datetime (logical)
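
The two allow_* arguments relax the column type checks. An alternative is to coerce the columns before import so that the strict checks can stay enabled. The following is a minimal base R sketch of how to inspect and coerce column types; the visits data frame and its values are hypothetical and are not part of the package.

```r
# Hypothetical data: a numeric (double) ID and a date-time column
visits <- data.frame(
  visit_occurrence_id = c(1, 2, 3),
  visit_start_date = as.POSIXct(
    c("2020-01-01", "2020-02-15", "2020-03-30"), tz = "UTC"
  )
)

# Inspect the current types
is.integer(visits$visit_occurrence_id)      # FALSE: c(1, 2, 3) is double
inherits(visits$visit_start_date, "Date")   # FALSE: the column is POSIXct

# Coerce to integer and Date
visits$visit_occurrence_id <- as.integer(visits$visit_occurrence_id)
visits$visit_start_date <- as.Date(visits$visit_start_date, tz = "UTC")

is.integer(visits$visit_occurrence_id)      # TRUE
inherits(visits$visit_start_date, "Date")   # TRUE
```

Note that as.integer() truncates non-integer values and returns NA for values outside the integer range, so it is worth checking the data before coercing.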

Details

This function is used within a dataset's code and is invoked each time a user selects the dataset.

For each OMOP table you wish to import, create a function that loads the data for that table when called.

Then use the import_dataset function to load the data into the application.

Data can be loaded from several sources, including:

CSV files
Excel files
Parquet files
Local database connections
Remote database connections

Select the R library for reading the file using the read_with argument (options include vroom, duckdb, spark, or arrow). If read_with is set to "none", the data is loaded as is.
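
As an illustration of the read_with = "none" case, a loader function can read the file itself and return a data frame. The sketch below uses only base R; the temporary CSV file and its three columns are hypothetical and far from a complete OMOP person table, and the way import_dataset consumes the result is assumed to match the Examples section (data = person()).

```r
# Write a small hypothetical CSV so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    person_id = 1:3,
    gender_concept_id = c(8507L, 8532L, 8507L),
    year_of_birth = c(1950L, 1985L, 2001L)
  ),
  csv_path,
  row.names = FALSE
)

# Loader function: the file is read only when person() is called.
# colClasses fixes the column types explicitly instead of relying on guessing.
person <- function() {
  utils::read.csv(
    csv_path,
    colClasses = c(
      person_id = "integer",
      gender_concept_id = "integer",
      year_of_birth = "integer"
    )
  )
}

str(person())
```

The returned data frame would then be passed as data = person() with read_with = "none", assuming the remaining OMOP columns have been added.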

Choose the format for saving the data after import using the save_as argument (options are csv or parquet). With the default, "none", the data is not saved locally.

When loading data from a database, it is common not to save the data locally, so that the application can load only the data it needs (lazy data reading), which improves performance.

If you wish to modify your data after loading, saving it locally may be beneficial to preserve your changes. In such cases, we recommend using the parquet storage format and loading the data with duckdb for efficient lazy reading.
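
To illustrate what lazy reading of a Parquet file with duckdb looks like, here is a minimal sketch that is independent of import_dataset. It assumes the DBI, duckdb, dplyr and dbplyr packages are installed; the file path, table name and data are hypothetical, and this is not necessarily how the application reads the file internally.

```r
library(dplyr)

# Hypothetical data and file path, for illustration only
parquet_path <- tempfile(fileext = ".parquet")
measurements <- data.frame(
  measurement_id = 1:1000,
  person_id = rep(1:100, each = 10),
  value_as_number = (1:1000) / 10
)

con <- DBI::dbConnect(duckdb::duckdb())

# Expose the data frame to DuckDB and write it to a Parquet file
duckdb::duckdb_register(con, "measurements_df", measurements)
DBI::dbExecute(con, sprintf(
  "COPY (SELECT * FROM measurements_df) TO '%s' (FORMAT PARQUET)",
  parquet_path
))

# Lazy table: the query is only built here, no rows are loaded into R yet
measurements_lazy <- tbl(con, sql(sprintf(
  "SELECT * FROM read_parquet('%s')", parquet_path
)))

# The filter runs inside DuckDB; only the matching rows reach R on collect()
person_42 <- measurements_lazy %>%
  filter(person_id == 42L) %>%
  collect()

nrow(person_42)  # 10 rows for this sample data

DBI::dbDisconnect(con, shutdown = TRUE)
```

Because the filter is pushed down to DuckDB, only the rows for one person are materialized in R, which is the benefit the paragraph above refers to.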

The data you import must adhere to the OMOP common data model format. For more information, refer to the help pages in the app.

Examples

if (FALSE) {
person <- function() tibble::tibble(
    person_id = 1:100,
    gender_concept_id = sample(c(8507L, 8532L), 100, replace = TRUE),
    year_of_birth = sample(1920:2010, 100, replace = TRUE),
    month_of_birth = sample(1:12, 100, replace = TRUE),
    day_of_birth = sample(1:28, 100, replace = TRUE),
    race_concept_id = NA_integer_,
    ethnicity_concept_id = NA_integer_,
    location_id = sample(1:10, 100, replace = TRUE),
    provider_id = sample(1:10, 100, replace = TRUE),
    care_site_id = sample(1:10, 100, replace = TRUE),
    person_source_value = paste("Source", 1:100),
    gender_source_value = NA_character_,
    gender_source_concept_id = NA_integer_,
    race_source_value = NA_character_,
    race_source_concept_id = NA_integer_,
    ethnicity_source_value = NA_character_,
    ethnicity_source_concept_id = NA_integer_
    ) %>%
    dplyr::mutate(
        birth_datetime = lubridate::ymd_hms(paste0(paste(year_of_birth, month_of_birth, day_of_birth, sep = "-"), " 00:00:00")),
        death_datetime = dplyr::case_when(runif(100) < 2/3 ~ as.POSIXct(NA), TRUE ~ birth_datetime + lubridate::years(sample(30:80, 100, replace = TRUE))),
        .after = "day_of_birth"
   )
    
# Import the person table as is, without saving it locally
import_dataset(
    data = person(), omop_table = "person", omop_version = "6.0",
    read_with = "none", save_as = "none", rewrite = FALSE,
    output = output, ns = ns, i18n = i18n, r = r, d = d, dataset_id = 5L
)

cat("\n")

d$person %>% nrow() # n = 100
}