python - How to write a custom aggregation function for strings?


I have a DataFrame with millions of records, and I'm trying to group the whole DataFrame by one column, 'npaciente', which is done. There are 63 columns that I need to aggregate as strings based on a specific match. For example, if a series contains "si" along with other strings, I want to return "si" as the result of the aggregation.

So I need to define my own aggregation that finds a string in a series and returns it. The data for one group, with truncated columns, was posted as an image (not reproduced here). My attempt:

data.groupby('npaciente')['asistencia'].apply(lambda x: if x.str.find("si"): return "si")  

The above is invalid. Any suggestions?

You can use apply directly on the groupby object. In the custom function, return a pd.Series so that pandas knows which columns the values refer to:

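import pandas as pd

def agg_func(group):
    """group is a DataFrame containing the relevant rows"""
    result = {}
    # str.contains gives a boolean Series; str.find returns -1 (truthy) when nothing matches
    if group["asistencia"].str.contains("si", na=False).any():
        result["asistencia"] = "si"
    return pd.Series(result)

data.groupby('npaciente').apply(agg_func)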

Of course, you need to add more logic to agg_func to do what you want.
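Since the question mentions 63 columns, one way to avoid writing a branch per column is to pass a single function to agg, which pandas calls once per column per group. The following is only a sketch under assumptions: the helper name si_if_present, the sample data, and the second column 'control' are made up for illustration, and the fallback (return the first non-null value when "si" is absent) is a guess, since the question does not say what should be returned in that case. It matches "si" as a substring, as the code above does.

```python
import pandas as pd

def si_if_present(values: pd.Series):
    """Return "si" if any entry contains "si" (hypothetical helper).

    Assumption: when "si" is absent, fall back to the first non-null
    value, or None if the group has no values at all.
    """
    text = values.dropna().astype(str)
    if text.str.contains("si", regex=False).any():
        return "si"
    return text.iloc[0] if len(text) else None

# Made-up sample data; only 'npaciente' and 'asistencia' come from the question.
data = pd.DataFrame({
    "npaciente": [1, 1, 2, 2],
    "asistencia": ["no", "si", "no", "no"],
    "control": ["si", "no", None, "no"],
})

# agg applies si_if_present to every non-grouping column of every group.
result = data.groupby("npaciente").agg(si_if_present)
print(result)
```

If only some columns need this rule, a dict such as {"asistencia": si_if_present} can be passed to agg instead, with other aggregations for the remaining columns.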

