ehrdata.io.read_csv

ehrdata.io.read_csv(filename, *, layer=None, sep=',', index_column=None, columns_obs_only=None, format='flat', wide_format_time_suffix=None, long_format_keys=None, **kwargs)

Read a comma-separated values (CSV) file into an EHRData object.

The file is first read with pandas.read_csv(), and the resulting DataFrame is then passed to ehrdata.io.from_pandas(). See the documentation of ehrdata.io.from_pandas() for details on the supported table layouts.
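The two-step delegation can be sketched with pandas alone. This is a minimal illustration under stated assumptions: the in-memory CSV text and its column names (`patient_id`, `age`, `glucose`) are hypothetical, and the `from_pandas()` call is left as a comment because it needs ehrdata installed; its keyword names are assumed to mirror the parameters of read_csv.

```python
import io

import pandas as pd

# Hypothetical file contents: a semicolon-separated export with decimal commas.
csv_text = "patient_id;age;glucose\nP1;54;5,6\nP2;61;NA\n"

# Step 1: roughly what read_csv hands to pandas. `sep` is forwarded, and any
# extra keyword arguments (here `decimal` and `na_values`) go through **kwargs.
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", na_values=["NA"])

# Step 2 (not executed here, requires ehrdata): the DataFrame is passed on to
# ehrdata.io.from_pandas() together with the ehrdata-specific parameters.
# The keyword names below are an assumption, mirroring those of read_csv:
# edata = ed.io.from_pandas(df, index_column="patient_id", columns_obs_only=["age"])
```

Because `sep` and `**kwargs` are forwarded unchanged, the same file saved as a hypothetical `labs.csv` could presumably be read directly with `ed.io.read_csv("labs.csv", sep=";", decimal=",", na_values=["NA"])`.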

Parameters:
filename Path | str

Path to the file to read. Delegates to pandas.read_csv().

layer str | None (default: None)

The layer to store the data in. If not specified, the data is stored in .X. Delegates to from_pandas().

sep str (default: ',')

Separator in the file. Delegates to pandas.read_csv().

index_column str | None (default: None)

If specified, this column of the CSV file is used as the index of the .obs dataframe. Delegates to from_pandas().

columns_obs_only Iterable[str] | None (default: None)

These columns will be added to the .obs dataframe only. Delegates to from_pandas().

format Literal['flat', 'wide', 'long'] (default: 'flat')

The layout of the table in the file: format="flat" for non-longitudinal data, format="long" for longitudinal data in long format, and format="wide" for longitudinal data in wide format. Delegates to from_pandas().

wide_format_time_suffix str | None (default: None)

Use only if format="wide". Suffixes in the variable column names that indicate the time of the observation. The collected suffixes are sorted lexicographically, and each variable's values are ordered accordingly along the third (time) axis of the EHRData object. Delegates to from_pandas().

long_format_keys dict[Literal['observation_column', 'variable_column', 'time_column', 'value_column'], str] | None (default: None)

Use only if format="long". The names of the columns in the long-format table that hold each role. The dictionary should have the following structure: {"observation_column": "<name of the column with the observation IDs>", "variable_column": "<name of the column with the variable IDs>", "time_column": "<name of the column with the time points>", "value_column": "<name of the column with the values>"}. Delegates to from_pandas().

**kwargs

Additional keyword arguments passed to pandas.read_csv().
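For wide_format_time_suffix, note that lexicographic sorting compares suffixes as strings, not numbers, so unpadded numeric suffixes can land in an unexpected order along the time axis. The sketch below assumes hypothetical column names of the form `<variable>_t<number>` and splits off the suffix with its own illustrative parsing; it does not reproduce how ehrdata interprets wide_format_time_suffix.

```python
# Hypothetical wide-format column names: "<variable>_t<number>".
columns = ["glucose_t1", "glucose_t2", "glucose_t10", "hr_t1", "hr_t2", "hr_t10"]


def time_suffixes(cols):
    """Collect distinct suffixes after the last underscore, sorted as strings (illustrative parsing only)."""
    return sorted({c.rsplit("_", 1)[1] for c in cols})


def zero_pad(col, width=2):
    """Rewrite '<var>_t<n>' as '<var>_t<n zero-padded to width digits>' (hypothetical naming scheme)."""
    var, n = col.rsplit("_t", 1)
    return f"{var}_t{int(n):0{width}d}"


# Lexicographic order puts "t10" before "t2".
print(time_suffixes(columns))  # ['t1', 't10', 't2']

# Zero-padding the numeric part makes lexicographic order match time order.
print(time_suffixes([zero_pad(c) for c in columns]))  # ['t01', 't02', 't10']
```

Zero-padding the suffixes in the file (or renaming the columns before export) is one simple way to keep string order and chronological order aligned.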

Return type:

EHRData

Examples

>>> import ehrdata as ed
>>> edata = ed.io.read_csv("myfile.csv")
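A long-format file has one row per observation, variable, and time point. The sketch below uses hypothetical column names: it builds such a table with pandas, defines the matching long_format_keys dictionary, and counts the distinct observations, variables, and time points. Assuming the third axis holds time, as described for the wide format, these counts would presumably correspond to the dimensions of the resulting object; this is an illustration, not output of ehrdata.

```python
import io

import pandas as pd

# Hypothetical long-format CSV: one row per (patient, variable, time) measurement.
csv_text = """patient_id,variable,visit_day,value
P1,glucose,0,5.6
P1,glucose,7,6.1
P1,heart_rate,0,72
P2,glucose,0,7.1
P2,heart_rate,7,80
"""

# Mapping from the roles read_csv expects to this file's column names.
long_format_keys = {
    "observation_column": "patient_id",
    "variable_column": "variable",
    "time_column": "visit_day",
    "value_column": "value",
}

df = pd.read_csv(io.StringIO(csv_text))

# Every mapped column must exist in the file.
missing = set(long_format_keys.values()) - set(df.columns)
assert not missing, f"columns not found: {missing}"

# Distinct entries per role; combinations absent from the file
# (e.g. P1 / heart_rate at day 7) simply have no row here.
n_obs = df[long_format_keys["observation_column"]].nunique()
n_vars = df[long_format_keys["variable_column"]].nunique()
n_times = df[long_format_keys["time_column"]].nunique()
print((n_obs, n_vars, n_times))  # (2, 2, 2)
```

With the table saved as a hypothetical `visits.csv`, the documented call would be `ed.io.read_csv("visits.csv", format="long", long_format_keys=long_format_keys)`.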