Import a dataset
Import a dataset into the application using the OMOP Common Data Model format.
Usage
import_dataset(
  output,
  ns = character(),
  i18n = character(),
  r = shiny::reactiveValues(),
  d = shiny::reactiveValues(),
  dataset_id = integer(),
  data = tibble::tibble(),
  omop_table = "",
  omop_version = "6.0",
  read_with = "none",
  save_as = "none",
  rewrite = FALSE,
  allow_numeric_instead_integer = FALSE,
  allow_dttm_instead_date = FALSE
)
Arguments
- output
Shiny output variable
- ns
Shiny namespace
- i18n
shiny.i18n object for translations
- r
A shiny::reactiveValues object, used to communicate between modules
- d
A shiny::reactiveValues object, used to communicate between modules
- dataset_id
ID of the dataset, used to create a dataset-specific folder in the application folders (integer)
- data
Data to import (data.frame or tibble)
- omop_table
Name of the OMOP table to import (character)
- omop_version
OMOP version of the imported data, accepts "5.3", "5.4" and "6.0" (character)
- read_with
The library used to read the data. Accepted values: "none", "vroom", "duckdb", "spark", "arrow" (character)
- save_as
Format used to save the data locally. Accepted values: "none" (do not save), "csv", "parquet" (character)
- rewrite
If save_as is not "none", whether to overwrite an existing data file (logical)
- allow_numeric_instead_integer
Allow columns that should be of type integer to be of type numeric (logical)
- allow_dttm_instead_date
Allow columns that should be of type date to be of type datetime (logical)
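As their names suggest, the two allow_* flags relax the column type checks: allow_numeric_instead_integer accepts numeric where integer is expected, and allow_dttm_instead_date accepts datetime where date is expected. The sketch below shows one way such a check could behave. column_type_ok() is a hypothetical helper written for illustration, not part of the package, and the exact rules applied by import_dataset may differ.

```r
# Hypothetical helper (not part of the package): does column `x` satisfy
# the expected OMOP type, given the two leniency flags?
column_type_ok <- function(x, expected,
                           allow_numeric_instead_integer = FALSE,
                           allow_dttm_instead_date = FALSE) {
  switch(expected,
    integer = is.integer(x) ||
      (allow_numeric_instead_integer && is.numeric(x)),
    date = inherits(x, "Date") ||
      (allow_dttm_instead_date && inherits(x, "POSIXct")),
    datetime = inherits(x, "POSIXct"),
    character = is.character(x),
    stop("Unknown expected type: ", expected)
  )
}

ids <- c(1, 2, 3)  # double, not integer
column_type_ok(ids, "integer")                                        # FALSE
column_type_ok(ids, "integer", allow_numeric_instead_integer = TRUE)  # TRUE

visit_start <- as.POSIXct("2020-01-01 08:30:00", tz = "UTC")
column_type_ok(visit_start, "date")                                   # FALSE
column_type_ok(visit_start, "date", allow_dttm_instead_date = TRUE)   # TRUE
```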
Details
This function is called within a dataset's code, which runs each time a user selects a dataset.
For each OMOP table you want to import, write a function that, when called, returns the data of that table.
Then pass this data to import_dataset to load it into the application.
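A dataset's code therefore typically defines one loader function per OMOP table. The sketch below uses base data.frame objects so it runs without extra packages; the rows are made up and each table is trimmed to a few columns, so a real import would need the full set of CDM columns. The import_dataset() calls are wrapped in if (FALSE), as in the Examples section, because output, ns, i18n, r and d only exist inside the application.

```r
# One loader function per OMOP table: each returns the table's data when called.
# Rows are illustrative placeholders, and tables are trimmed to a few columns.
person <- function() data.frame(
  person_id = 1:3,
  gender_concept_id = c(8507L, 8532L, 8507L),  # male, female, male
  year_of_birth = c(1950L, 1972L, 1988L),
  race_concept_id = 0L,                        # 0 = no matching concept
  ethnicity_concept_id = 0L
)

visit_occurrence <- function() data.frame(
  visit_occurrence_id = 1:2,
  person_id = c(1L, 3L),
  visit_concept_id = 9201L,                    # inpatient visit
  visit_start_date = as.Date(c("2021-03-01", "2021-06-15")),
  visit_end_date = as.Date(c("2021-03-05", "2021-06-20"))
)

# Map OMOP table names to their loaders
loaders <- list(person = person, visit_occurrence = visit_occurrence)

if (FALSE) {
  # Inside a dataset's code, the application provides output, ns, i18n, r and d
  for (table in names(loaders)) {
    import_dataset(
      data = loaders[[table]](), omop_table = table, omop_version = "6.0",
      output = output, ns = ns, i18n = i18n, r = r, d = d, dataset_id = 5L
    )
  }
}
```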
Data can be loaded from several sources, including:
- CSV files
- Excel files
- Parquet files
- Local database connections
- Remote database connections
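Whatever the source, the loader only has to return the table. In the sketch below, a temporary CSV file written with base R stands in for a real file. The variants inside if (FALSE) use readxl::read_excel(), arrow::read_parquet() and dplyr::tbl() on a DBI connection (the latter needs dbplyr); the file names, database file and host are placeholders.

```r
# A temporary CSV stands in for the real file path (placeholder)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(person_id = 1:3,
             gender_concept_id = c(8507L, 8532L, 8507L),
             year_of_birth = c(1950L, 1972L, 1988L)),
  csv_path, row.names = FALSE
)

# Loader reading the person table from a CSV file
person <- function() read.csv(csv_path)

if (FALSE) {
  # Alternative loaders for other sources (paths and hosts are placeholders)
  person <- function() readxl::read_excel("person.xlsx")       # Excel file
  person <- function() arrow::read_parquet("person.parquet")   # Parquet file

  # Local database
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "omop.duckdb")
  # Remote database
  con <- DBI::dbConnect(RPostgres::Postgres(), host = "db.example.org", dbname = "omop")
  person <- function() dplyr::tbl(con, "person")               # lazy table
}

person()
```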
Use the read_with argument to choose the R library that reads the data: "vroom", "duckdb", "spark" or "arrow". With read_with = "none", the data is loaded as is.
Use the save_as argument to choose the format in which the data is saved after import: "csv" or "parquet". With save_as = "none" (the default), nothing is saved locally.
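To make the interplay of save_as and rewrite concrete, here is a hypothetical sketch of a saving step. The helper save_table() and the folder layout (one folder per dataset_id, one file per table) are assumptions for illustration, not the package's actual implementation; Parquet output relies on arrow::write_parquet().

```r
# Hypothetical sketch of the saving step; not the package's implementation.
# Assumed layout: <app_folder>/datasets/<dataset_id>/<omop_table>.<ext>
save_table <- function(data, app_folder, dataset_id, omop_table,
                       save_as = c("none", "csv", "parquet"), rewrite = FALSE) {
  save_as <- match.arg(save_as)
  if (save_as == "none") return(invisible(NULL))  # nothing saved locally

  folder <- file.path(app_folder, "datasets", dataset_id)
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(folder, paste0(omop_table, ".", save_as))

  # Keep an existing file unless rewrite = TRUE
  if (file.exists(path) && !rewrite) return(invisible(path))

  if (save_as == "csv") {
    write.csv(data, path, row.names = FALSE)
  } else {
    arrow::write_parquet(data, path)
  }
  invisible(path)
}

# The second call keeps the existing file because rewrite = FALSE
app_folder <- tempfile("app_")
path <- save_table(data.frame(person_id = 1:3), app_folder, 5L, "person", save_as = "csv")
save_table(data.frame(person_id = 1:10), app_folder, 5L, "person", save_as = "csv")
nrow(read.csv(path))  # 3
```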
When loading data from a database, the data is usually not saved locally: keeping it in the database improves application performance, since only the parts that are needed get loaded (lazy reading).
If you want to modify your data after loading it, saving it locally helps preserve your changes. In that case, we recommend the parquet storage format, read with duckdb for efficient lazy reading.
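The Parquet plus duckdb combination lets queries run on the file without first loading it into R. The sketch below writes a small table to Parquet and queries it with duckdb through DBI; it assumes the DBI and duckdb packages are installed, the file is a temporary placeholder, the query is only an example, and it does not call import_dataset.

```r
library(DBI)

con <- dbConnect(duckdb::duckdb())  # in-memory duckdb instance
parquet_path <- normalizePath(tempfile(fileext = ".parquet"),
                              winslash = "/", mustWork = FALSE)

# Write a small person table to Parquet (stands in for locally saved data)
dbWriteTable(con, "person_tmp", data.frame(
  person_id = 1:100,
  gender_concept_id = rep(c(8507L, 8532L), each = 50)
))
dbExecute(con, sprintf("COPY person_tmp TO '%s' (FORMAT PARQUET)", parquet_path))

# duckdb scans the Parquet file directly; only the aggregated result reaches R
counts <- dbGetQuery(con, sprintf(
  "SELECT gender_concept_id, COUNT(*) AS n FROM read_parquet('%s')
   GROUP BY gender_concept_id ORDER BY gender_concept_id", parquet_path))
counts

dbDisconnect(con)
```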
The imported data must follow the OMOP Common Data Model format. For more information, refer to the help pages in the application.
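A quick sanity check before importing is to compare the loaded columns with those the CDM defines for the table. The sketch below hard-codes the columns that OMOP CDM v5.4 marks as required for the person table; check_required_columns() is a hypothetical helper, and the authoritative column list for each version is the CDM specification itself.

```r
# Columns marked as required for the person table in OMOP CDM v5.4
required_person_columns <- c("person_id", "gender_concept_id", "year_of_birth",
                             "race_concept_id", "ethnicity_concept_id")

# Hypothetical helper: returns the required columns missing from `data`
check_required_columns <- function(data, required) {
  setdiff(required, names(data))
}

person <- data.frame(person_id = 1:3, gender_concept_id = 8507L,
                     year_of_birth = 1980L)
check_required_columns(person, required_person_columns)
# returns c("race_concept_id", "ethnicity_concept_id")
```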
Examples
if (FALSE) {
  # Runs within a dataset's code, where the application provides
  # output, ns, i18n, r and d

  # Loader for the OMOP person table: 100 simulated patients
  person <- function() tibble::tibble(
    person_id = 1:100,
    gender_concept_id = sample(c(8507L, 8532L), 100, replace = TRUE),  # male / female
    year_of_birth = sample(1920:2010, 100, replace = TRUE),
    month_of_birth = sample(1:12, 100, replace = TRUE),
    day_of_birth = sample(1:28, 100, replace = TRUE),
    race_concept_id = 0L,       # 0 = no matching concept
    ethnicity_concept_id = 0L,
    location_id = sample(1:10, 100, replace = TRUE),
    provider_id = sample(1:10, 100, replace = TRUE),
    care_site_id = sample(1:10, 100, replace = TRUE),
    person_source_value = paste("Source", 1:100),
    gender_source_value = NA_character_,
    gender_source_concept_id = NA_integer_,
    race_source_value = NA_character_,
    race_source_concept_id = NA_integer_,
    ethnicity_source_value = NA_character_,
    ethnicity_source_concept_id = NA_integer_
  ) %>%
    dplyr::mutate(
      # Midnight on the date of birth
      birth_datetime = lubridate::ymd_hms(paste0(paste(year_of_birth, month_of_birth, day_of_birth, sep = "-"), " 00:00:00")),
      # About two thirds of patients are alive (NA); the others died 30 to 80 years after birth
      death_datetime = dplyr::case_when(
        runif(100) < 2/3 ~ as.POSIXct(NA, tz = "UTC"),
        TRUE ~ birth_datetime + lubridate::years(sample(30:80, 100, replace = TRUE))
      ),
      .after = "day_of_birth"
    )

  import_dataset(
    data = person(), omop_table = "person", omop_version = "6.0",
    read_with = "none", save_as = "none", rewrite = FALSE,
    output = output, ns = ns, i18n = i18n, r = r, d = d, dataset_id = 5L
  )

  d$person %>% nrow() # 100
}