ehrdata.io.omop.setup_interval_variables
- ehrdata.io.omop.setup_interval_variables(edata, *, backend_handle, layer=None, data_tables, data_field_to_keep, time_precision='date', interval_length_number, interval_length_unit, num_intervals, concept_ids='all', aggregation_strategy='last', enrich_var_with_feature_info=False, keep_date='start', instantiate_tensor=True)
Extracts selected tables of a time-span character from the OMOP CDM.
The distinct concept_ids encountered in the selected tables form the variables in the EHRData object. The variables are sorted by concept_id in ascending order within each data_table, and stacked together in the order in which the data_tables are specified. The data_field_to_keep parameter specifies which Field in the selected table is used to read out the value of a variable.
In contrast to setup_variables, tables without unit information can be present here. Hence, this function does not verify that a single unit per feature (=`concept_id`) is used, and it does not write a unit report. Should this be relevant for your work, please open an issue on theislab/ehrdata.
Stores one or more tables named long_person_timestamp_feature_value_<data_table> in long format in the RDBMS. This table is instantiated into edata.r if instantiate_tensor is set to True; otherwise, the table is only stored in the RDBMS for later use.
- Parameters:
- edata
Data object to which the variables should be added.
- backend_handle
DuckDBPyConnection The backend handle to the database.
- layer
str | None (default: None) The layer to store the data in. If not specified, it uses X.
- data_tables
Sequence[Literal['drug_exposure','condition_occurrence','procedure_occurrence','device_exposure','drug_era','dose_era','condition_era','episode']] |Literal['drug_exposure','condition_occurrence','procedure_occurrence','device_exposure','drug_era','dose_era','condition_era','episode'] The tables to be used.
- data_field_to_keep
str | Sequence[str] | dict[str, str | Sequence[str]] The CDM Field in the data tables to be kept. Can be e.g. 'value_as_number' or 'value_as_concept_id'. Importantly, can be 'is_present' to have a one-hot encoding of the presence of the feature in a patient in an interval. Should be a dictionary to specify the data fields to keep per table if multiple data tables are used. For example, if data_tables=['drug_exposure', 'condition_occurrence'], data_field_to_keep={'drug_exposure': 'is_present', 'condition_occurrence': 'is_present'}.
- time_precision
Literal['date', 'datetime'] (default: 'date') The precision of the timestamp used in the table indicated in setup_obs(). If "date", uses the date field (e.g. visit_start_date for "person_visit_occurrence"). If "datetime", uses the datetime field (e.g. visit_start_datetime for "person_visit_occurrence").
- interval_length_number
int Numeric value of the length of one interval.
- interval_length_unit
str Unit of the interval length; needs to be a unit of pandas.Timedelta.
- num_intervals
int Number of intervals.
- concept_ids
Literal['all'] | Sequence[int] (default: 'all') Concept IDs to use from the data tables. If not specified, all concept IDs are used.
- aggregation_strategy
Literal['last','first','mean','median','mode','sum','count','min','max','std'] (default:'last') Strategy to use when aggregating multiple data points within one interval.
- enrich_var_with_feature_info
bool (default: False) Whether to enrich the var table with feature information. If a concept_id is not found in the concept table, its alternate concept_id from the concept_relationship table is retrieved to add the available feature information. Otherwise, the feature information will be NaN.
- keep_date
Literal['start','end','interval'] (default:'start') Whether to keep the start or end date, or the interval span.
- instantiate_tensor
bool(default:True) Whether to instantiate the tensor into the .r field of the EHRData object.
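To make the interplay of interval_length_number, interval_length_unit, num_intervals, aggregation_strategy and the 'is_present' field more concrete, here is a minimal standard-library sketch. It is not the ehrdata implementation: the helpers interval_index and aggregate_per_interval are hypothetical, the sketch assumes half-open intervals that start at a per-person reference date, it uses datetime.timedelta in place of pandas.Timedelta to stay dependency-free, and it covers only a few of the listed strategies.

```python
from datetime import date, timedelta
from statistics import mean


def interval_index(event_day, start_day, interval_length, num_intervals):
    """Return the 0-based interval an event falls into, or None if it lies outside the grid."""
    offset = event_day - start_day
    if offset < timedelta(0):
        return None
    idx = offset // interval_length  # timedelta // timedelta yields an int
    return idx if idx < num_intervals else None


# Hypothetical reducers for a subset of the documented strategies.
REDUCERS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "mean": mean,
    "min": min,
    "max": max,
}


def aggregate_per_interval(events, start_day, interval_length, num_intervals, strategy="last"):
    """Collapse (date, value) events into one value per interval; empty intervals give None."""
    buckets = [[] for _ in range(num_intervals)]
    for day, value in sorted(events, key=lambda event: event[0]):
        idx = interval_index(day, start_day, interval_length, num_intervals)
        if idx is not None:
            buckets[idx].append(value)
    reduce = REDUCERS[strategy]
    return [reduce(bucket) if bucket else None for bucket in buckets]


start = date(2020, 1, 1)
length = timedelta(days=20)  # mirrors interval_length_number=20, interval_length_unit="day"
events = [(date(2020, 1, 3), 1.0), (date(2020, 1, 15), 3.0), (date(2020, 2, 1), 5.0)]

last_values = aggregate_per_interval(events, start, length, num_intervals=3, strategy="last")
mean_values = aggregate_per_interval(events, start, length, num_intervals=3, strategy="mean")
# An 'is_present'-style encoding only records whether anything was observed in the interval.
is_present = [int(value is not None) for value in last_values]
```

In this toy data, the first two events share the first 20-day interval, so the chosen strategy decides which value survives, while the third interval stays empty.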
- Returns:
An EHRData object with fields.
Examples
>>> import ehrdata as ed
>>> import duckdb
>>> con_gi = duckdb.connect(database=":memory:", read_only=False)
>>> ed.dt.gibleed_omop(
...     con_gi,
... )
>>> edata_gi = ed.io.omop.setup_obs(
...     con_gi,
...     observation_table="person_observation_period",
... )
>>> edata_gi = ed.io.omop.setup_interval_variables(
...     edata=edata_gi,
...     backend_handle=con_gi,
...     layer="tem_data",
...     data_tables=["drug_exposure", "condition_occurrence"],
...     data_field_to_keep={"drug_exposure": "is_present", "condition_occurrence": "is_present"},
...     interval_length_number=20,
...     interval_length_unit="day",
...     num_intervals=20,
...     concept_ids="all",
...     aggregation_strategy="last",
...     enrich_var_with_feature_info=True,
... )
>>> edata_gi
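The variable order described at the top of this page (ascending concept_id within each data table, with tables stacked in the order given in data_tables) can be pictured with a short sketch. The concept IDs below are made up, and stack_variables is a hypothetical helper used only for illustration; it is not part of ehrdata.

```python
def stack_variables(concept_ids_by_table, data_tables):
    """Order variables by ascending concept_id per table, stacking tables in the given order."""
    variables = []
    for table in data_tables:
        # Distinct concept IDs of this table, sorted ascending.
        for concept_id in sorted(set(concept_ids_by_table.get(table, []))):
            variables.append((table, concept_id))
    return variables


# Made-up concept IDs as they might be encountered in each table.
encountered = {
    "condition_occurrence": [300, 100, 300],
    "drug_exposure": [20, 10],
}
var_order = stack_variables(encountered, ["drug_exposure", "condition_occurrence"])
```

Because drug_exposure is listed first in data_tables, its variables come first even though the dictionary was written in a different order.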