ehrdata.io.to_pandas

ehrdata.io.to_pandas(edata, *, layer=None, obs_cols=None, var_col=None, format='wide')

Transform an EHRData object to a DataFrame.

Parameters:
edata EHRData

Central data object.

layer str | None (default: None)

The layer whose values are written to the dataframe. If not specified, X is used.

obs_cols Iterable[str] | None (default: None)

The columns of obs to add to the dataframe.

var_col str | None (default: None)

The column of var from which the column names of the output dataframe are taken. If not specified, var_names is used.

format Literal['wide', 'long'] (default: 'wide')

The format of the output dataframe. This is relevant for longitudinal data. If "wide", the output dataframe contains one column for each (variable, time) pair, named <variable_name>_t_<tem.index value>. If "long", the output dataframe contains one row for each (observation, variable, time) combination, with columns "observation_id", "variable", "time", and "value".

Return type:

DataFrame
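
The two layouts described for the format parameter can be sketched in plain Python. This is only an illustration of the naming and row conventions stated above, not the implementation used by ehrdata; the helper names wide_column_names and long_records are hypothetical, and the values are copied from the example output on this page.

```python
from itertools import product


def wide_column_names(var_names, time_index):
    """Build one column name per (variable, time) pair: <variable_name>_t_<time>."""
    return [f"{var}_t_{t}" for var, t in product(var_names, time_index)]


def long_records(values, obs_names, var_names, time_index):
    """Build one record per (observation, variable, time) combination.

    values is a nested list indexed as values[obs][var][time].
    """
    records = []
    for i, obs in enumerate(obs_names):
        for j, var in enumerate(var_names):
            for k, t in enumerate(time_index):
                records.append(
                    {
                        "observation_id": obs,
                        "variable": var,
                        "time": t,
                        "value": values[i][j][k],
                    }
                )
    return records


# Hypothetical stand-in for a 2 x 2 x 3 (n_obs x n_vars x n_t) array.
values = [
    [[3.060372, 3.827524, 4.680650], [-1.697623, -1.816282, -2.775774]],
    [[-3.395852, -4.948999, -5.401154], [-7.347151, -9.427101, -11.793235]],
]
obs_names = ["0", "1"]
var_names = ["feature_0", "feature_1"]
time_index = ["0", "1", "2"]

columns = wide_column_names(var_names, time_index)
records = long_records(values, obs_names, var_names, time_index)

print(columns)
print(len(records), records[0])
```

The wide layout has n_vars × n_t columns and one row per observation, while the long layout has n_obs × n_vars × n_t rows and four fixed columns.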

Examples

>>> import ehrdata as ed
>>> edata = ed.dt.ehrdata_blobs(n_observations=2, n_variables=2, base_timepoints=3)
>>> edata
EHRData object with n_obs × n_vars × n_t = 2 × 2 × 3
    obs: "cluster"
    tem: '0', '1', '2'
    shape of .X: (2, 2)
    shape of .R: (2, 2, 3)
>>> df_wide = ed.io.to_pandas(edata, format="wide")
>>> df_wide

   feature_0_t_0  feature_0_t_1  feature_0_t_2  feature_1_t_0  feature_1_t_1  feature_1_t_2
0       3.060372       3.827524       4.680650      -1.697623      -1.816282      -2.775774
1      -3.395852      -4.948999      -5.401154      -7.347151      -9.427101     -11.793235

>>> df_long = ed.io.to_pandas(edata, format="long")
>>> df_long

observation_id   variable  time       value
             0  feature_0     0    3.060372
             0  feature_0     1    3.827524
             0  feature_0     2    4.680650
             0  feature_1     0   -1.697623
             0  feature_1     1   -1.816282
             0  feature_1     2   -2.775774
             1  feature_0     0   -3.395852
             1  feature_0     1   -4.948999
             1  feature_0     2   -5.401154
             1  feature_1     0   -7.347151
             1  feature_1     1   -9.427101
             1  feature_1     2  -11.793235
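
Going from the long layout back to the wide one can be sketched with plain pandas. This does not call ehrdata: the frame below is typed in by hand from the long output above, and storing observation_id and time as strings is an assumption made for illustration.

```python
import pandas as pd

# Hand-built copy of the long-format output (dtypes are an assumption).
df_long = pd.DataFrame(
    {
        "observation_id": ["0"] * 6 + ["1"] * 6,
        "variable": (["feature_0"] * 3 + ["feature_1"] * 3) * 2,
        "time": ["0", "1", "2"] * 4,
        "value": [
            3.060372, 3.827524, 4.680650,
            -1.697623, -1.816282, -2.775774,
            -3.395852, -4.948999, -5.401154,
            -7.347151, -9.427101, -11.793235,
        ],
    }
)

# One row per observation, one column per (variable, time) pair.
wide = df_long.pivot(index="observation_id", columns=["variable", "time"], values="value")

# Flatten the (variable, time) column index into <variable_name>_t_<time> names.
wide.columns = [f"{var}_t_{t}" for var, t in wide.columns]

print(wide)
```

Passing a list to the columns argument of DataFrame.pivot requires pandas 1.1 or later.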