Import a dataset into the application, using the OMOP Common Data Model

Usage

import_dataset(
  output,
  ns = character(),
  i18n = character(),
  r = shiny::reactiveValues(),
  d = shiny::reactiveValues(),
  dataset_id = integer(),
  data = tibble::tibble(),
  omop_table = "",
  omop_version = "6.0",
  read_with = "none",
  save_as = "none",
  rewrite = FALSE,
  allow_numeric_instead_integer = FALSE,
  allow_dttm_instead_date = FALSE
)

Arguments

output

Shiny output variable

ns

Shiny namespace

i18n

shiny.i18n object for translations

r

A shiny::reactiveValues object, used to communicate between modules

d

A shiny::reactiveValues object, used to communicate between modules

dataset_id

ID of the dataset, used to create a specific dataset folder in the application folders (integer)

data

Data variable (data.frame or tibble)

omop_table

Name of the OMOP table to import (character)

omop_version

OMOP version of the imported data, accepts "5.3", "5.4" and "6.0" (character)

read_with

The library used to read the data. Accepted values: "none", "vroom", "duckdb", "spark", "arrow" (character)

save_as

Save the data locally. Accepted values: "none", "csv", "parquet" (character)

rewrite

If save_as is not "none", whether to overwrite an existing data file (logical)

allow_numeric_instead_integer

Allow columns that should be of type integer to be of type numeric (logical)

allow_dttm_instead_date

Allow columns that should be of type date to be of type datetime (logical)

Details

This function is used within a dataset's code and is invoked each time a user selects that dataset.

For each OMOP table you wish to import, you must create a function that, when called, loads the data from the specified table.

Then use the import_dataset function to load the data returned by that function into the application.
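As a minimal sketch of these two steps, assume the person table lives in a CSV file. The file path and its three columns are hypothetical (a real table needs every column required by the chosen OMOP version), and the import_dataset call is left commented out because it needs the output, ns, i18n, r and d objects that only exist inside the application.

```r
# Hypothetical CSV file standing in for a real OMOP "person" export
person_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "person_id,gender_concept_id,year_of_birth",
  "1,8507,1980",
  "2,8532,1975",
  "3,8507,1990"
), person_csv)

# Step 1: one loader function per OMOP table; nothing is read until it is called
person <- function() vroom::vroom(
  person_csv,
  delim = ",",
  col_types = vroom::cols(
    person_id = vroom::col_integer(),
    gender_concept_id = vroom::col_integer(),
    year_of_birth = vroom::col_integer()
  )
)

# Step 2 (inside the dataset code, where output, ns, i18n, r and d exist):
# import_dataset(
#   data = person(), omop_table = "person", omop_version = "6.0",
#   read_with = "none", save_as = "none", rewrite = FALSE,
#   output = output, ns = ns, i18n = i18n, r = r, d = d, dataset_id = 5
# )
```

Declaring the column types explicitly keeps the ID columns as integers rather than leaving the choice to type guessing.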

Data can be loaded from several sources, including:

  • CSV files

  • Excel files

  • Parquet files

  • Local database connections

  • Remote database connections

Select the R library for reading the file using the read_with argument (options include vroom, duckdb, spark, or arrow). If read_with is set to "none", the data is loaded as is.

Choose the format for saving the data after import using the save_as argument (options are csv or parquet).

When loading data from a database, it is common not to save the data locally, so that the application can load only the data it needs (lazy reading) and perform better.
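To illustrate what lazy reading means in general, here is a small sketch using DBI, duckdb and dplyr (with dbplyr installed). The in-memory database and its person table are stand-ins for a real connection, and the sketch does not show how import_dataset itself handles lazy tables, which is not described on this page.

```r
# In-memory DuckDB database standing in for a real (local or remote) database
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(con, "person", data.frame(
  person_id = 1:5,
  year_of_birth = c(1950L, 1962L, 1985L, 1991L, 2003L)
))

# A lazy reference to the table: no rows are fetched yet
person_lazy <- dplyr::tbl(con, "person")

# The filter is translated to SQL and executed in the database;
# only the matching rows are pulled into R by collect()
born_after_1980 <- dplyr::collect(
  dplyr::filter(person_lazy, year_of_birth > 1980L)
)

nrow(born_after_1980) # 3

DBI::dbDisconnect(con, shutdown = TRUE)
```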

If you wish to modify your data after loading, saving it locally may be beneficial to preserve your changes. In such cases, we recommend using the parquet storage format and loading the data with duckdb for efficient lazy reading.
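The parquet and duckdb combination recommended here can be sketched outside the application as follows. The table contents and file path are hypothetical, and the code only shows the general technique (DuckDB querying a parquet file in place), not what import_dataset does internally.

```r
con <- DBI::dbConnect(duckdb::duckdb())

# Hypothetical modified table that should be kept between sessions
DBI::dbWriteTable(con, "person", data.frame(
  person_id = 1:4,
  year_of_birth = c(1948L, 1970L, 1988L, 2001L)
))

# Save as parquet (forward slashes keep the path valid inside the SQL string)
parquet_path <- gsub("\\", "/", tempfile(fileext = ".parquet"), fixed = TRUE)
DBI::dbExecute(con, sprintf("COPY person TO '%s' (FORMAT PARQUET)", parquet_path))

# Query the file directly: only the requested column and matching rows
# are returned to R
recent <- DBI::dbGetQuery(con, sprintf(
  "SELECT person_id FROM read_parquet('%s') WHERE year_of_birth >= 1980",
  parquet_path
))

sort(recent$person_id)

DBI::dbDisconnect(con, shutdown = TRUE)
```

Parquet stores data by column, which is what lets a query like this avoid reading the columns it does not select.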

The data you import must adhere to the OMOP Common Data Model format. For more information, refer to the help pages in the app.
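A frequent source of type mismatches in R is that whole numbers written without the L suffix are stored as double, not integer. The sketch below uses a hypothetical helper, double_instead_of_integer, to spot such columns before import; it is not part of the package, and the exact checks import_dataset performs are not described on this page.

```r
# Hypothetical helper: among columns expected to be integer,
# return the names of those stored as double
double_instead_of_integer <- function(data, integer_cols) {
  is_double <- vapply(data[integer_cols], is.double, logical(1))
  integer_cols[is_double]
}

person <- data.frame(
  person_id = c(1, 2, 3),                     # double: no L suffix
  gender_concept_id = c(8507L, 8532L, 8507L)  # integer
)

double_instead_of_integer(person, c("person_id", "gender_concept_id"))
# "person_id"

# Converting up front avoids relying on allow_numeric_instead_integer = TRUE
person$person_id <- as.integer(person$person_id)
```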

Examples

if (FALSE) {
# Assumes the %>% pipe is available (for example via library(dplyr))

# Loader for the OMOP "person" table: returns 100 simulated patients
person <- function() tibble::tibble(
    person_id = 1:100,
    gender_concept_id = sample(c(8507L, 8532L), 100, replace = TRUE),
    year_of_birth = sample(1920:2010, 100, replace = TRUE),
    month_of_birth = sample(1:12, 100, replace = TRUE),
    day_of_birth = sample(1:28, 100, replace = TRUE),
    race_concept_id = NA_integer_,
    ethnicity_concept_id = NA_integer_,
    location_id = sample(1:10, 100, replace = TRUE),
    provider_id = sample(1:10, 100, replace = TRUE),
    care_site_id = sample(1:10, 100, replace = TRUE),
    person_source_value = paste("Source", 1:100),
    gender_source_value = NA_character_,
    gender_source_concept_id = NA_integer_,
    race_source_value = NA_character_,
    race_source_concept_id = NA_integer_,
    ethnicity_source_value = NA_character_,
    ethnicity_source_concept_id = NA_integer_
    ) %>%
    dplyr::mutate(
        # Build birth_datetime from the year, month and day columns
        birth_datetime = lubridate::ymd_hms(paste0(paste(year_of_birth, month_of_birth, day_of_birth, sep = "-"), " 00:00:00")),
        # About two thirds of patients have no death date
        death_datetime = dplyr::case_when(runif(100) < 2/3 ~ as.POSIXct(NA), TRUE ~ birth_datetime + lubridate::years(sample(30:80, 100, replace = TRUE))),
        .after = "day_of_birth"
    )

# Import the table; output, ns, i18n, r and d are provided by the application
import_dataset(
    data = person(), omop_table = "person", omop_version = "6.0", read_with = "none", save_as = "none", rewrite = FALSE,
    output = output, ns = ns, i18n = i18n, r = r, d = d, dataset_id = 5
)

cat("\n")

d$person %>% nrow() # n = 100
}