ehrdata.io.to_pandas

ehrdata.io.to_pandas(edata, *, layer=None, obs_cols=None, var_col=None, format='wide')

Transform an EHRData object to a DataFrame.

Parameters:

edata EHRData: Central data object.
layer str | None (default: None): The layer to read the values from. If not specified, X is used.
obs_cols Iterable[str] | None (default: None): The columns of obs to add to the dataframe.
var_col str | None (default: None): The column of var from which to take the column names of the created dataframe. If not specified, var_names is used.
format Literal['wide', 'long'] (default: 'wide'): The format of the output dataframe. This is relevant for longitudinal data. If "wide", the output dataframe has one column for each (variable, time) pair, named <variable_name>_t_<tem.index value>. If "long", the output dataframe has one row for each (observation, variable, time) combination, with columns "observation_id", "variable", "time", and "value".

Return type:

DataFrame

Examples

>>> import ehrdata as ed
>>> edata = ed.dt.ehrdata_blobs(n_observations=2, n_variables=2, base_timepoints=3)
>>> edata
EHRData object with n_obs × n_vars × n_t = 2 × 2 × 3
    obs: "cluster"
    tem: '0', '1', '2'
    shape of .X: (2, 2)
    shape of .R: (2, 2, 3)

>>> df_wide = ed.io.to_pandas(edata, format="wide")
>>> df_wide

	feature_0_t_0	feature_0_t_1	feature_0_t_2	feature_1_t_0	feature_1_t_1	feature_1_t_2
0	3.060372	3.827524	4.680650	-1.697623	-1.816282	-2.775774
1	-3.395852	-4.948999	-5.401154	-7.347151	-9.427101	-11.793235

>>> df_long = ed.io.to_pandas(edata, format="long")
>>> df_long

observation_id	variable	time	value
0	feature_0	0	3.060372
0	feature_0	1	3.827524
0	feature_0	2	4.680650
0	feature_1	0	-1.697623
0	feature_1	1	-1.816282
0	feature_1	2	-2.775774
1	feature_0	0	-3.395852
1	feature_0	1	-4.948999
1	feature_0	2	-5.401154
1	feature_1	0	-7.347151
1	feature_1	1	-9.427101
1	feature_1	2	-11.793235
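
The two layouts above hold the same twelve values, arranged differently. As a rough illustration of how the column names and rows relate, the following plain-Python sketch rebuilds both layouts from the numbers printed in the example. The nested list `values` and the names `wide_columns`, `wide_rows`, and `long_rows` are assumptions made for this sketch; it does not use ehrdata and is not meant to describe how `to_pandas` is implemented.

```python
# Values copied from the example output above, arranged as
# values[observation][variable][time]. The nested list is a stand-in for
# the 3D data and is an assumption of this sketch, not the library's storage.
values = [
    [[3.060372, 3.827524, 4.680650], [-1.697623, -1.816282, -2.775774]],
    [[-3.395852, -4.948999, -5.401154], [-7.347151, -9.427101, -11.793235]],
]
var_names = ["feature_0", "feature_1"]
time_labels = ["0", "1", "2"]  # the tem index values shown in the repr

# Wide layout: one column per (variable, time) pair, one row per observation.
wide_columns = [f"{var}_t_{t}" for var in var_names for t in time_labels]
wide_rows = [
    [obs_values[j][k] for j in range(len(var_names)) for k in range(len(time_labels))]
    for obs_values in values
]

# Long layout: one row per (observation, variable, time) combination.
long_columns = ["observation_id", "variable", "time", "value"]
long_rows = [
    (i, var, t, values[i][j][k])
    for i in range(len(values))
    for j, var in enumerate(var_names)
    for k, t in enumerate(time_labels)
]

print(wide_columns)
print(len(wide_rows), "wide rows;", len(long_rows), "long rows")
```

The wide layout keeps one row per observation, which is convenient for tabular models, while the long layout keeps time as an explicit column, which tends to suit grouping and plotting. Whether the "time" column holds strings or integers in the real output is not shown in the example, so the string labels here are an assumption.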
