python - Apply multiple functions at one time to Pandas groupby object


Variations of this question have been asked (see this question), but I still haven't found a good solution for what seems to be a common use case of groupby in pandas.

Say I have the dataframe lasts, and I group it by user:

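import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})
# Group by user; this is the groupby_obj referred to below.
groupby_obj = lasts.groupby('user')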

And I have these functions that I want to apply to groupby_obj (what the functions do isn't important, I made them up; just know that they require multiple columns from the dataframe):

def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)

I could apply each of these functions separately to the dataframe and then merge the resulting dataframes, but that seems inefficient and inelegant, and I imagine there has to be a one-line solution.
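For concreteness, the separate-apply-and-combine approach might look like the sketch below. The column names custom_column_1 and custom_column_2 match the ones used later in this post; joining the two results with pd.concat (rather than an explicit merge) and the variable names col1, col2 and separate are assumptions made for illustration.

```python
import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})

def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)

groupby_obj = lasts.groupby('user')

# One apply per function: each call yields a Series indexed by user.
col1 = groupby_obj.apply(custom_func).rename('custom_column_1')
col2 = groupby_obj.apply(custom_func2).rename('custom_column_2')

# Align the two Series on the shared user index to form one dataframe.
separate = pd.concat([col1, col2], axis=1)
print(separate)
```

This works, but every additional function means another pass over the groups and another piece to join, which is the overhead I would like to avoid.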

I haven't really found one, although this blog post (search for "create function stats of group" towards the bottom of the page) suggested wrapping the functions into one function that returns a dictionary, thusly:

def get_stats(group):
    return {'custom_column_1': custom_func(group),
            'custom_column_2': custom_func2(group)}

However, when I run groupby_obj.apply(get_stats), instead of columns I get a single column of dictionary results:

user
a    {'custom_column_1': 29993.0, 'custom_column_2'...
d    {'custom_column_1': 22493.5, 'custom_column_2'...
s    {'custom_column_1': 19992.0, 'custom_column_2'...
dtype: object

When in reality I would like to use a single line of code to get something closer to this dataframe:

user  custom_column_1  custom_column_2
a     29993.0          10000
d     22493.5          75000
s     19992.0          30000

Any suggestions on improving this workflow?

If you slightly modify your get_stats function so that it returns a pd.Series instead of a dictionary:

def get_stats(group):
    return pd.Series({'custom_column_1': custom_func(group),
                      'custom_column_2': custom_func2(group)})

Now you can simply do this:

In [202]: lasts.groupby('user').apply(get_stats).reset_index()
Out[202]:
  user  custom_column_1  custom_column_2
0    a          29993.0          10000.0
1    d          22493.5          75000.0
2    s          19992.0          30000.0
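If you have more than a couple of functions, the same pd.Series trick can be wrapped in a small helper. The sketch below is only one way to do it: apply_named and its funcs argument (a mapping from output column name to a function of the group) are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})

def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)

def apply_named(grouped, funcs):
    """Hypothetical helper: apply every function in funcs to each group.

    funcs maps an output column name to a function that takes the
    group's dataframe and returns a scalar.
    """
    def combined(group):
        # Returning a Series makes apply() spread the values into columns.
        return pd.Series({name: func(group) for name, func in funcs.items()})
    return grouped.apply(combined)

stats = apply_named(lasts.groupby('user'),
                    {'custom_column_1': custom_func,
                     'custom_column_2': custom_func2}).reset_index()
print(stats)
```

Because each group's results are packed into one Series, both columns end up with a float dtype, as in the output above.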

An alternative (a bit ugly) approach, which uses your functions unchanged, including the original get_stats that returns a dictionary:

In [188]: pd.DataFrame(lasts.groupby('user')
                            .apply(get_stats).to_dict()) \
            .T \
            .rename_axis('user') \
            .reset_index()
Out[188]:
  user  custom_column_1  custom_column_2
0    a          29993.0          10000.0
1    d          22493.5          75000.0
2    s          19992.0          30000.0
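Another way to expand the column of dictionaries, which avoids the transpose, is to build the dataframe from the list of dictionaries and reuse the group index. This is a sketch of a swapped-in variant rather than the approach shown above; the variable names dict_series and expanded are made up for illustration.

```python
import pandas as pd

lasts = pd.DataFrame({'user': ['a', 's', 'd', 'd'],
                      'elapsed_time': [40000, 50000, 60000, 90000],
                      'running_time': [30000, 20000, 30000, 15000],
                      'num_cores': [7, 8, 9, 4]})

def custom_func(group):
    return group.running_time.median() - group.num_cores.mean()

def custom_func2(group):
    return max(group.elapsed_time) - min(group.running_time)

def get_stats(group):
    # Original, dictionary-returning version from the question.
    return {'custom_column_1': custom_func(group),
            'custom_column_2': custom_func2(group)}

# apply() yields a Series whose values are dictionaries, indexed by user.
dict_series = lasts.groupby('user').apply(get_stats)

# Each dictionary becomes one row; the user index is carried over as-is.
expanded = pd.DataFrame(dict_series.tolist(),
                        index=dict_series.index).reset_index()
print(expanded)
```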
