ehrdata.dt.physionet2012
- ehrdata.dt.physionet2012(data_path=None, *, interval_length_number=1, interval_length_unit='h', num_intervals=48, aggregation_strategy='last', drop_samples=['147514', '142731', '145611', '140501', '155655', '143656', '156254', '150309', '140936', '141264', '150649', '142998'], layer=None)
Loads the dataset of the PhysioNet challenge 2012 (v1.0.0).
This dataset was designed to encourage the development of algorithms for mortality rate prediction using physiological data [SMS+12] [GAG+00].
If interval_length_number is 1, interval_length_unit is "h" (hour), and num_intervals is 48, this is the same as the SAITS preprocessing [DCoteL23]. A sample is truncated if it has more than num_intervals steps and padded if it has fewer than num_intervals steps.
Further, by default the following 12 samples are dropped since they have no time series information at all: 147514, 142731, 145611, 140501, 155655, 143656, 156254, 150309, 140936, 141264, 150649, 142998.
With the defaults for interval_length_number, interval_length_unit, num_intervals, and drop_samples, the tensor stored in .layers[layer_name] of edata is the same as the one produced by the PyPOTS preprocessing [Du23]. The one difference is the axis order: the tensor in ehrdata has shape n_obs x n_vars x n_intervals (with defaults, 11988x37x48), while the tensor in PyPOTS has shape n_obs x n_intervals x n_vars (11988x48x37). The tensor stored in .layers[layer_name] is hence also fully compatible with the PyPOTS package, as the .layers field of EHRData objects generally is.
Note: In the original dataset, some missing values are encoded as -1 in some entries of the variables 'DiasABP', 'NIDiasABP', and 'Weight'. Here, these are replaced with NaNs.
- Parameters:
- data_path
Path | str | None (default: None) Path to the raw data. If the path exists, the data is loaded from there. Otherwise, the data is downloaded.
- interval_length_number
int (default: 1) Numeric value of the length of one interval.
- interval_length_unit
str (default: 'h') Unit belonging to the interval length.
- num_intervals
int (default: 48) Number of intervals.
- aggregation_strategy
str (default: 'last') Aggregation strategy for the time series data when multiple measurements of a person's parameter are available within one time interval. Available are 'first' and 'last', as used in drop_duplicates().
- drop_samples
Iterable[str] | None (default: ['147514', '142731', '145611', '140501', '155655', '143656', '156254', '150309', '140936', '141264', '150649', '142998']) Samples to drop from the dataset, identified by their RecordID.
- layer
str | None (default: None) Name of the layer in the EHRData object that will store the time series data. If not specified, X is used.
- Return type:
- Returns:
The processed physionet2012 dataset. The raw data is also downloaded, stored, and made available under data_path.
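The roles of interval_length_number, interval_length_unit, num_intervals, and aggregation_strategy can be pictured with a small pandas sketch. It is not the loader's actual implementation: the long-format table, its column names (minutes, Parameter, Value), the helper aggregate, and the floor-division binning are assumptions made for illustration. Only the use of drop_duplicates() with keep='first' or keep='last' is taken from the parameter description above.

```python
import pandas as pd

# Hypothetical long-format measurements for one patient and one variable.
# Column names and values are illustrative, not the loader's internal format.
records = pd.DataFrame(
    {
        "minutes": [5, 40, 70, 130, 3000],  # minutes since admission, sorted
        "Parameter": ["HR"] * 5,
        "Value": [80.0, 82.0, 90.0, 85.0, 99.0],
    }
)

interval_minutes = 60  # interval_length_number=1, interval_length_unit="h"
num_intervals = 48

# Assumed binning: floor-divide the timestamp by the interval length.
records["interval"] = records["minutes"] // interval_minutes

# Truncate: discard measurements that fall beyond the last interval.
records = records[records["interval"] < num_intervals]


def aggregate(frame, strategy):
    """Keep one value per (Parameter, interval) and pad to num_intervals."""
    kept = frame.drop_duplicates(subset=["Parameter", "interval"], keep=strategy)
    # Pad: intervals without any measurement become NaN.
    return kept.set_index("interval")["Value"].reindex(range(num_intervals))


trajectory_last = aggregate(records, "last")
trajectory_first = aggregate(records, "first")
```

In this sketch the two measurements in the first hour collapse to a single value (the later one for 'last', the earlier one for 'first'), the measurement at minute 3000 is truncated, and every interval without a measurement is padded with NaN.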
Examples
>>> import ehrdata as ed
>>> edata = ed.dt.physionet2012(layer="tem_data")
>>> edata
EHRData object with n_obs × n_vars × n_t = 11988 × 37 × 48
    obs: 'set', 'Age', 'Gender', 'Height', 'ICUType', 'SAPS-I', 'SOFA', 'Length_of_stay', 'Survival', 'In-hospital_death'
    var: 'Parameter'
    tem: '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47'
    layers: 'tem_data'
    shape of .tem_data: (11988, 37, 48)
Inspect static information
>>> edata.obs.head()
            set   Age  Gender  Height  ICUType  SAPS-I  SOFA  Length_of_stay  Survival  In-hospital_death
RecordID
132539    set-a  54.0     0.0    -1.0      4.0       6     1               5        -1                  0
132540    set-a  76.0     1.0   175.3      2.0      16     8               8        -1                  0
132541    set-a  44.0     0.0    -1.0      3.0      21    11              19        -1                  0
132543    set-a  68.0     1.0   180.3      3.0       7     1               9       575                  0
132545    set-a  88.0     0.0    -1.0      3.0      17     2               4       918                  0
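The Height column in the static table shows -1.0 for several records. This page only states that -1 placeholders are replaced with NaNs for the time series variables 'DiasABP', 'NIDiasABP', and 'Weight'. Treating -1.0 in Height as a missing-value placeholder as well is an assumption here. Under that assumption, a minimal sketch of masking such entries on a hand-built stand-in for the table (not the loaded object) could look like this:

```python
import pandas as pd

# Hand-built stand-in for the first rows of the static table shown above.
obs = pd.DataFrame(
    {"Age": [54.0, 76.0, 44.0], "Height": [-1.0, 175.3, -1.0]},
    index=pd.Index(["132539", "132540", "132541"], name="RecordID"),
)

# Assumption: -1.0 in Height marks a missing value.
# Series.mask replaces entries where the condition is True with NaN.
height = obs["Height"].mask(obs["Height"] == -1)
```

The original column is left untouched, so the placeholder values stay available if the assumption turns out not to hold.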
Inspect the 48-hour trajectory of the variable RespRate:
>>> edata[edata.obs.index == "132539", edata.var_names == "RespRate"].layers["tem_data"]
[[[19., 18., 19., 20., 20., 17., nan, 15., 14., 17., 15., 15.,
   12., 15., 15., 12., 14., 13., 18., 13., 12., 20., 15., 24.,
   nan, 16., 19., 18., nan, 16., nan, 18., nan, 18., nan, 20.,
   nan, 24., 21., 16., 18., 14., 23., 17., 20., 20., 20., 23.]]]
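The description above notes that ehrdata stores the tensor as n_obs x n_vars x n_intervals, whereas PyPOTS uses n_obs x n_intervals x n_vars. A minimal sketch of moving from one layout to the other with NumPy follows. The small placeholder array stands in for the real layer, and n_obs is reduced to 4 purely for illustration.

```python
import numpy as np

# Placeholder tensor in the ehrdata layout: n_obs x n_vars x n_intervals.
n_obs, n_vars, n_intervals = 4, 37, 48
ehrdata_layout = np.full((n_obs, n_vars, n_intervals), np.nan)
ehrdata_layout[0, 5, 10] = 1.0  # sample 0, variable 5, interval 10

# Swap the last two axes to get n_obs x n_intervals x n_vars.
pypots_layout = np.transpose(ehrdata_layout, (0, 2, 1))

print(pypots_layout.shape)
```

Assuming the layer is a NumPy array, the same call could be applied to edata.layers["tem_data"] from the example above.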