ehrdata.integrations.torch.OMOPEHRDataset

class ehrdata.integrations.torch.OMOPEHRDataset(con, edata, *, data_tables, target='mortality', datetime=True, idxs=None)

A Dataset built from an OMOP CDM database.

This class wraps the tensor in ehrdata.R as a Dataset, in a format suitable for a DataLoader. The data can then be streamed in batches from the RDBMS, without loading the entire dataset into memory.

Note: Each item in the dataset represents an observation unit (e.g., a visit, observation period, or person), not necessarily a unique patient. A single patient can have multiple observation units.
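
The idea of fetching one observation unit per lookup, rather than loading everything up front, can be sketched as follows. This is an illustrative stand-in, not the implementation of OMOPEHRDataset: it uses the standard-library sqlite3 module instead of DuckDB, and the class name LazyRowDataset, the table layout, and the column names unit_id, t, and value are all hypothetical.

```python
import sqlite3


class LazyRowDataset:
    """Hypothetical map-style dataset: one database query per item."""

    def __init__(self, con, idxs=None):
        self.con = con
        if idxs is None:
            # Only the unit ids are held in memory, never the measurements.
            idxs = [
                row[0]
                for row in con.execute(
                    "SELECT DISTINCT unit_id FROM measurement ORDER BY unit_id"
                )
            ]
        self.idxs = list(idxs)

    def __len__(self):
        return len(self.idxs)

    def __getitem__(self, obs_index):
        # Map the position in the dataset to an observation unit id,
        # then fetch only that unit's rows from the database.
        unit_id = self.idxs[obs_index]
        rows = self.con.execute(
            "SELECT value FROM measurement WHERE unit_id = ? ORDER BY t",
            (unit_id,),
        ).fetchall()
        return [value for (value,) in rows]


# Toy in-memory database standing in for an OMOP CDM database (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measurement (unit_id INTEGER, t INTEGER, value REAL)")
con.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (2, 0, 4.0), (2, 1, 5.0)],
)

dataset = LazyRowDataset(con, idxs=[0, 2])  # restrict to a subset of units
first_item = dataset[0]
second_item = dataset[1]
```

Because each item is queried on demand, a batching loader only ever holds the rows of the current batch in memory.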

Parameters:
con DuckDBPyConnection

The connection to the database.

edata EHRData

Central data object.

data_tables Sequence[Literal['measurement', 'observation', 'specimen']]

The OMOP data tables to extract.

target Literal['mortality'] (default: 'mortality')

The target variable to be used.

datetime bool (default: True)

If True, use datetime; if False, use date.

idxs Sequence[int] | None (default: None)

The indices of the observation units to use. This can restrict the dataset to a subset of the data, e.g. for train-test splits.
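
As a rough sketch of how idxs might be prepared for a train-test split, the snippet below shuffles the observation-unit indices and cuts them 80/20. The value of n_obs is a placeholder for the number of observation units in edata, and the commented-out constructor calls assume that con and edata already exist; only the signature shown above is relied on.

```python
import random

n_obs = 10  # placeholder for the number of observation units in edata

# Shuffle the indices reproducibly, then cut them into 80% train / 20% test.
rng = random.Random(0)
all_idxs = list(range(n_obs))
rng.shuffle(all_idxs)

n_train = int(0.8 * n_obs)
train_idxs = sorted(all_idxs[:n_train])
test_idxs = sorted(all_idxs[n_train:])

# With an existing connection `con` and EHRData object `edata` (assumed):
# train_ds = OMOPEHRDataset(con, edata, data_tables=["measurement"], idxs=train_idxs)
# test_ds = OMOPEHRDataset(con, edata, data_tables=["measurement"], idxs=test_idxs)
```

Since each item is an observation unit rather than a patient, a split over these indices can place units from the same patient in both sets; grouping indices by patient first avoids that.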

Methods table

__getitem__(obs_index)

Methods

OMOPEHRDataset.__getitem__(obs_index)