ehrdata.infer_feature_types

ehrdata.infer_feature_types(edata, *, layer=None, output='tree', binary_as='categorical', verbose=True)

Infer feature types of an EHRData object.

For each feature in edata.var_names, the function infers one of the following types: 'date', 'categorical', or 'numeric'. The inferred types are stored in edata.var['feature_type']. Check the inferred types and correct them where necessary, either with edata.var.loc['feature1', 'feature_type'] = 'corrected_type' or with replace_feature_types(). Not every feature stored numerically is of 'numeric' type, because categorical features may be stored in a numerically encoded format. For example, a feature with values [0, 1, 2] might be a categorical feature with three categories. The function accounts for this, but the inferred types should still be verified.
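To illustrate why numerically stored features deserve a manual check, here is a minimal sketch of a heuristic that flags integer-coded values with few distinct levels. The helper looks_categorical and its max_categories threshold are hypothetical and are not the rule ehrdata applies internally; they only show the kind of sanity check one might run on a feature that was inferred as 'numeric'.

```python
from collections.abc import Sequence


def looks_categorical(values: Sequence, max_categories: int = 10) -> bool:
    """Hypothetical heuristic: flag integer-coded columns with few distinct values.

    This is NOT the rule ehrdata uses; it only illustrates a manual check
    for features that were inferred as 'numeric'.
    """
    # Drop missing entries: None, and NaN (the only value where v != v).
    observed = [v for v in values if v is not None and v == v]
    if not observed:
        return False
    # Integer codes such as 0.0, 1.0, 2.0 count as integer-valued.
    all_integer_valued = all(float(v).is_integer() for v in observed)
    return all_integer_valued and len(set(observed)) <= max_categories


print(looks_categorical([0, 1, 2, 1, 0, 2]))        # True: three integer codes
print(looks_categorical([36.6, 37.2, 38.1, 36.9]))  # False: continuous measurements
```

A feature flagged this way is only a candidate for review; whether it is truly categorical depends on what the values mean clinically.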

Parameters:
edata EHRData

Data object.

layer str | None (default: None)

The layer to use from the EHRData object. If None, the X field is used.

output Literal['tree', 'dataframe'] | None (default: 'tree')

The output format: 'tree', 'dataframe', or None. If 'tree', the feature types are printed to the console in a tree format. If 'dataframe', a DataFrame with the feature types is returned. If None, nothing is returned.

binary_as Literal['categorical', 'numeric'] (default: 'categorical')

How to classify binary features with values 0 and 1. If 'categorical' (default), binary features are classified as categorical. If 'numeric', binary features are classified as numeric.

verbose bool (default: True)

Whether to print warnings for uncertain feature types.

Return type:

DataFrame | None

Examples

>>> import ehrdata as ed
>>> edata = ed.dt.mimic_2()
>>> ed.infer_feature_types(edata)
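
The call above prints the tree view. As a complementary sketch, the wrapper below requests the result as a DataFrame, using only the parameters documented in the signature. The function name infer_types_as_dataframe is a hypothetical helper for this example, and ehrdata is imported lazily inside it, so calling it requires ehrdata to be installed. The layout of the returned DataFrame is not described here, so inspect it before relying on specific columns.

```python
def infer_types_as_dataframe(edata, binary_as="numeric"):
    """Return the inferred feature types as a DataFrame instead of printing a tree.

    Hypothetical convenience wrapper around ehrdata.infer_feature_types.
    """
    # Lazy import: the helper can be defined without ehrdata installed,
    # but calling it requires the package.
    import ehrdata as ed

    return ed.infer_feature_types(
        edata,
        output="dataframe",   # return a DataFrame rather than printing the tree
        binary_as=binary_as,  # 'categorical' or 'numeric' for 0/1 features
        verbose=False,        # suppress warnings about uncertain feature types
    )
```

With the dataset from the example above, this could be used as types = infer_types_as_dataframe(ed.dt.mimic_2()), followed by print(types) to review the result.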