python - How to write a custom aggregation function for strings?
I have a dataframe with millions of records, and I'm trying to group the whole dataframe by one column, 'npaciente', which is done. There are 63 columns that I need to aggregate as strings based on a specific match. For example, if a series contains "si" along with other strings, I want to return "si" as the result of the aggregation.
So I need to define my own aggregation that finds a string in a series and returns it. Here I'm posting the data for one group, with truncated columns:
data.groupby('npaciente')['asistencia'].apply(lambda x: if x.str.find("si"): return "si")
The above is invalid. Any suggestions?
You can use apply directly on the groupby object. In the custom function, return a pd.Series so that pandas knows which columns the values belong to:
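import pandas as pd

def agg_func(group):
    """group is a DataFrame containing the relevant rows"""
    result = {}
    # str.contains yields booleans; str.find returns -1 (truthy) when there is no match
    if group["asistencia"].str.contains("si", na=False).any():
        result["asistencia"] = "si"
    return pd.Series(result)

data.groupby('npaciente').apply(agg_func)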
Of course, you need to add more logic to agg_func in order to do what you want.
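As a rough sketch of what that extra logic might look like for many columns, the function below loops over a list of column names and applies the same check to each one. The sample data, the column name "otra", the `columns` and `target` parameters, and the choice to return None when nothing matches are all assumptions made for illustration; they are not from the question.

```python
import pandas as pd

def agg_func(group, columns, target="si"):
    """Return `target` for each column whose values contain it, else None.

    `columns` and `target` are hypothetical parameters added for this sketch.
    """
    result = {}
    for col in columns:
        # Literal substring match; missing values count as non-matches
        matched = group[col].str.contains(target, regex=False, na=False).any()
        # Returning None for "no match" is an assumption, pick whatever fits
        result[col] = target if matched else None
    return pd.Series(result)

# Made-up sample data standing in for the real dataframe
data = pd.DataFrame({
    "npaciente": [1, 1, 2, 2],
    "asistencia": ["no", "si", "no", "no"],
    "otra": ["no", "no", "si", None],
})

cols = ["asistencia", "otra"]
# Selecting the columns first keeps the grouping key out of each group
result = data.groupby("npaciente")[cols].apply(agg_func, columns=cols)
```

This assumes every column in `cols` holds strings (or missing values); numeric columns would need converting first, for example with astype(str). Note that the match is a substring match, so a value like "casi" would also count.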