ehrdata.infer_feature_types

ehrdata.infer_feature_types(edata, *, layer=None, output='tree', binary_as='categorical', verbose=True)

Infer feature types of an EHRData object.

For each feature in edata.var_names, the function infers one of the following types: 'date', 'categorical', or 'numeric'. The inferred types are stored in edata.var['feature_type']. Check the inferred types and adjust them if necessary, either with edata.var['feature_type']['feature1']='corrected_type' or with replace_feature_types(). Not every feature stored as numbers is of 'numeric' type, because categorical features may be stored in a numerically encoded format. For example, a feature with values [0, 1, 2] might be a categorical feature with three categories. The function accounts for this, but checking the inferred types is still recommended.
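To illustrate why the inferred types need a manual check, here is a minimal, standard-library sketch of a type-guessing heuristic. It is not ehrdata's actual implementation: the function guess_feature_type, its max_categories threshold, and the feature names are all hypothetical, and a plain dict stands in for edata.var['feature_type'].

```python
from datetime import date, datetime


def guess_feature_type(values, binary_as="categorical", max_categories=10):
    """Toy heuristic (NOT ehrdata's rule): return 'date', 'categorical', or 'numeric'."""
    observed = [v for v in values if v is not None]  # ignore missing entries
    if not observed:
        return "categorical"  # arbitrary fallback for an all-missing feature
    if all(isinstance(v, (date, datetime)) for v in observed):
        return "date"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed):
        return "categorical"
    unique = set(observed)
    if unique <= {0, 1}:
        # Only the values 0 and/or 1 occur: the caller decides how to treat them.
        return binary_as
    # Few distinct whole numbers look like numerically encoded categories.
    if all(float(v).is_integer() for v in unique) and len(unique) <= max_categories:
        return "categorical"
    return "numeric"


# Hypothetical features; the dict plays the role of edata.var['feature_type'].
feature_types = {
    "ethnicity_code": guess_feature_type([0, 1, 2, 1, 0]),
    "heart_rate": guess_feature_type([71.5, 88.0, 64.2]),
    "is_smoker": guess_feature_type([0, 1, 1, 0], binary_as="numeric"),
    "num_prior_visits": guess_feature_type([0, 1, 2, 3, 2]),
}

# The heuristic cannot tell a small count from an encoded category,
# so this guess is corrected by hand after review.
feature_types["num_prior_visits"] = "numeric"
```

In this sketch, num_prior_visits is first guessed as 'categorical' because it holds only a few distinct whole numbers, which is exactly the kind of ambiguous case where the inferred type should be reviewed and overridden.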

Parameters:

edata EHRData: Data object.
layer str | None (default: None): The layer to use from the EHRData object. If None, the X field is used.
output Literal['tree', 'dataframe'] | None (default: 'tree'): The output format. Choose between 'tree', 'dataframe', or None. If 'tree', the feature types are printed to the console in a tree format and nothing is returned. If 'dataframe', a DataFrame with the feature types is returned. If None, nothing is printed or returned; the inferred types are still stored in edata.var['feature_type'].
binary_as Literal['categorical', 'numeric'] (default: 'categorical'): How to classify binary features with values 0 and 1. If 'categorical' (default), binary features are classified as categorical. If 'numeric', binary features are classified as numeric.
verbose bool (default: True): Whether to print warnings for uncertain feature types.

Return type:

DataFrame | None

Examples

>>> import ehrdata as ed
>>> edata = ed.dt.mimic_2()
>>> ed.infer_feature_types(edata)
