python - Apply multiple functions at one time to a Pandas groupby object
Variations of this question have been asked (see this question), but I haven't found a solution for what seems to be a common use case of groupby in pandas.

Say I have the dataframe lasts, and I group it by user:
import pandas as pd

lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'running_time': [30000, 20000, 30000, 15000],
    'num_cores': [7, 8, 9, 4],
})
groupby_obj = lasts.groupby('user')
And I have these functions that I want to apply to groupby_obj (what the functions do isn't important, I made them up; just know that they require multiple columns from the dataframe):
def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)
I could apply each of these functions separately to the dataframe and then merge the resulting dataframes, but that seems inefficient and inelegant, and I imagine there has to be a one-line solution.
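For reference, the "apply separately, then merge" approach described above might look like the sketch below. The names cols, col1, col2 and separate are made up for illustration, and selecting the numeric columns before apply is my own choice so that the grouping column is not passed to the functions.

```python
import pandas as pd

lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'running_time': [30000, 20000, 30000, 15000],
    'num_cores': [7, 8, 9, 4],
})

def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)

# Assumption: only these columns are handed to the functions
cols = ['elapsed_time', 'running_time', 'num_cores']
grouped = lasts.groupby('user')[cols]

# One pass over the groups per function, each giving a Series indexed by user
col1 = grouped.apply(custom_func).rename('custom_column_1')
col2 = grouped.apply(custom_func2).rename('custom_column_2')

# Glue the two Series together side by side on the shared user index
separate = pd.concat([col1, col2], axis=1)
print(separate)
```

This works, but it walks over the groups once per function and needs the extra concat step, which is the overhead I would like to avoid.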
I haven't found one, although this blog post (search for "create function stats of group" towards the bottom of the page) suggested wrapping the functions into one function that returns a dictionary, thusly:
def get_stats(group):
    return {'custom_column_1': custom_func(group),
            'custom_column_2': custom_func2(group)}
However, when I run the code groupby_obj.apply(get_stats), instead of columns I get a single column of dictionary results:
user
a    {'custom_column_1': 29993.0, 'custom_column_2'...
d    {'custom_column_1': 22493.5, 'custom_column_2'...
s    {'custom_column_1': 19992.0, 'custom_column_2'...
dtype: object
When in reality I would like to use one line of code to get something closer to this dataframe:
user  custom_column_1  custom_column_2
a             29993.0            10000
d             22493.5            75000
s             19992.0            30000
Any suggestions on improving this workflow?
If you modify the get_stats function so that it returns a pd.Series instead of a dictionary:
def get_stats(group):
    return pd.Series({'custom_column_1': custom_func(group),
                      'custom_column_2': custom_func2(group)})
you can now do this:
In [202]: lasts.groupby('user').apply(get_stats).reset_index()
Out[202]:
  user  custom_column_1  custom_column_2
0    a          29993.0          10000.0
1    d          22493.5          75000.0
2    s          19992.0          30000.0
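If you end up with more than two functions, the same idea can be driven from a mapping of column name to function. This is only a sketch: stat_funcs and cols are names I made up here, and picking the numeric columns before apply is an assumption on my part to keep the grouping column out of the groups passed to the functions.

```python
import pandas as pd

lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'running_time': [30000, 20000, 30000, 15000],
    'num_cores': [7, 8, 9, 4],
})

def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)

# Hypothetical mapping: output column name -> function of the group
stat_funcs = {
    'custom_column_1': custom_func,
    'custom_column_2': custom_func2,
}

def get_stats(group):
    # Returning a Series makes apply spread the keys out into columns
    return pd.Series({name: func(group) for name, func in stat_funcs.items()})

cols = ['elapsed_time', 'running_time', 'num_cores']
result = lasts.groupby('user')[cols].apply(get_stats).reset_index()
print(result)
```

Adding another statistic is then just one more entry in stat_funcs, and get_stats itself stays untouched.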
An alternative (a bit ugly) approach, which uses your functions unchanged:
In [188]: pd.DataFrame(lasts.groupby('user')
                            .apply(get_stats).to_dict()) \
               .T \
               .rename_axis('user') \
               .reset_index()
Out[188]:
  user  custom_column_1  custom_column_2
0    a          29993.0          10000.0
1    d          22493.5          75000.0
2    s          19992.0          30000.0
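Another way to keep the dictionary-returning get_stats is to skip apply and loop over the groups yourself, then build the frame with pd.DataFrame.from_dict and orient='index'. Treat this as a sketch rather than the canonical way; stats and table are names I picked for the example.

```python
import pandas as pd

lasts = pd.DataFrame({
    'user': ['a', 's', 'd', 'd'],
    'elapsed_time': [40000, 50000, 60000, 90000],
    'running_time': [30000, 20000, 30000, 15000],
    'num_cores': [7, 8, 9, 4],
})

def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)

def get_stats(group):
    # Original dictionary-returning version, unchanged
    return {'custom_column_1': custom_func(group),
            'custom_column_2': custom_func2(group)}

# Iterating a groupby object yields (key, sub-dataframe) pairs
stats = {user: get_stats(group) for user, group in lasts.groupby('user')}

# orient='index' turns the outer keys (users) into rows
table = (pd.DataFrame.from_dict(stats, orient='index')
           .rename_axis('user')
           .reset_index())
print(table)
```

This avoids the transpose, since the users end up as rows straight away.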