python - How to write a custom aggregation function for strings?
I have a dataframe with millions of records, and I'm trying to group the whole dataframe by one column, 'npaciente', which I've done. There are 63 columns that I need to aggregate as strings based on a specific match. For example, if a series contains "si" along with other strings, I want to return "si" as the result of the aggregation.
So I need to define my own aggregation that finds a string in the series and returns it. Here I'm posting the data for one group, with the columns truncated.
    data.groupby('npaciente')['asistencia'].apply(lambda x: if x.str.find("si"): return "si")

The above is invalid. Any suggestions?
You can use apply directly on the groupby object. In your custom function, return a pd.Series so that pandas knows which columns the values belong to:
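    import pandas as pd

    def agg_func(group):
        """group is a DataFrame containing the relevant rows."""
        result = {}
        # str.find returns -1 when there is no match (which is truthy),
        # so test for the match with str.contains instead.
        if group["asistencia"].str.contains("si", na=False).any():
            result["asistencia"] = "si"
        return pd.Series(result)

    data.groupby('npaciente').apply(agg_func)

Of course, you will need to add more logic to agg_func to do everything you want.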
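Since the question mentions 63 columns, it may be easier to pass a single function to agg, which pandas calls once per column and group. The following is only a sketch under assumptions: the sample data, the 'consulta' column, and the helper name si_if_any are hypothetical, and returning None when there is no "si" is a placeholder choice because the question does not say what the fallback should be.

```python
import pandas as pd

# Hypothetical sample data; only 'npaciente' and 'asistencia' come from the question.
data = pd.DataFrame({
    "npaciente": [1, 1, 2, 2, 3],
    "asistencia": ["no", "si", "no", None, "si"],
    "consulta": ["si", "no", "no", "no", None],  # hypothetical second column
})

def si_if_any(series):
    """Return "si" if any value in the group equals "si", otherwise None (assumed fallback)."""
    return "si" if series.eq("si").any() else None

# agg applies the function to every non-key column, group by group.
result = data.groupby("npaciente").agg(si_if_any)
print(result)

# Vectorized variant: build boolean flags first, then reduce per group.
# This avoids calling a Python function for every group and column.
flags = data.drop(columns="npaciente").eq("si").groupby(data["npaciente"]).any()
print(flags)
```

With millions of rows the vectorized variant is likely to be noticeably faster, at the cost of producing True/False flags rather than the string "si". Note that eq("si") matches the exact value only. If "si" can appear inside longer strings, str.contains would be the closer match to the original attempt.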
