python - Pandas/Numpy: Fastest way to create a ladder?


I have a pandas DataFrame like:

      color  cost  temp
    0  blue  12.0  80.4
    1   red   8.1  81.2
    2  pink  24.5  83.5

I want to create a "ladder" or "range" of costs for every row at 50-cent increments, from $0.50 below the current cost to $0.50 above the current cost. My current code is similar to the following:

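    import numpy
    import pandas

    incremented_prices = []
    df['original_idx'] = df.index  # to know its original label

    for _, row in df.iterrows():
        current_price = row['cost']
        # 0.51 keeps the upper end (current_price + 0.5) inside the half-open range
        more_costs = numpy.arange(current_price - 0.5, current_price + 0.51, step=0.5)

        for cost in more_costs:
            row_c = row.copy()
            row_c['cost'] = cost
            incremented_prices.append(row_c)

    # build one frame from the list of row Series
    df_incremented = pandas.DataFrame(incremented_prices).reset_index(drop=True)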

This code produces a DataFrame like:

      color  cost  temp  original_idx
    0  blue  11.5  80.4             0
    1  blue  12.0  80.4             0
    2  blue  12.5  80.4             0
    3   red   7.6  81.2             1
    4   red   8.1  81.2             1
    5   red   8.6  81.2             1
    6  pink  24.0  83.5             2
    7  pink  24.5  83.5             2
    8  pink  25.0  83.5             2

In my real problem, I will make ranges from -$50.00 to $50.00, and I find this slow. Is there a faster, vectorized way?

You can try to recreate the data frame with numpy.repeat:

    import numpy as np
    import pandas as pd

    cost_steps = np.arange(-0.5, 0.51, 0.5)
    repeats = cost_steps.size

    pd.DataFrame(dict(
        color = np.repeat(df.color.values, repeats),
        # vectorized way to calculate the costs: the steps are added by broadcasting
        cost = (df.cost.values[:, None] + cost_steps).ravel(),
        temp = np.repeat(df.temp.values, repeats),
        original_idx = np.repeat(df.index.values, repeats)
    ))
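To see why the broadcasting step lines up with `numpy.repeat`, it may help to look at it in isolation. The following is a minimal sketch using only the three costs from the question; the names `costs`, `ladder` and `flat` are illustrative and not part of the answer.

```python
import numpy as np

# Costs taken from the question's example frame.
costs = np.array([12.0, 8.1, 24.5])
# Offsets of -0.5, 0.0 and +0.5; the 0.51 stop only makes the upper end inclusive.
cost_steps = np.arange(-0.5, 0.51, 0.5)

# costs[:, None] has shape (3, 1); adding a shape-(3,) array broadcasts to (3, 3),
# with one row per original cost and one column per offset.
ladder = costs[:, None] + cost_steps

# ravel() flattens row by row, so each cost's ladder stays contiguous and matches
# the order produced by np.repeat(..., 3) on the other columns.
flat = ladder.ravel()

print(ladder.shape)
print(flat)
```

Because `ravel()` walks the array row by row, the i-th group of three values in `flat` belongs to the i-th original row, which is the same grouping `np.repeat` produces for `color`, `temp` and the index.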


Update for more columns:

    df1 = df.rename_axis("original_idx").reset_index()
    cost_steps = np.arange(-0.5, 0.51, 0.5)
    repeats = cost_steps.size

    # repeat every non-cost column row-wise, then append the broadcast cost ladder
    pd.DataFrame(np.hstack((np.repeat(df1.drop(columns="cost").values, repeats, axis=0),
                            (df1.cost.values[:, None] + cost_steps).reshape(-1, 1))),
                 columns=df1.columns.drop("cost").tolist() + ["cost"])
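One thing to keep in mind with the `hstack` approach is that stacking string and numeric columns into a single array gives an object-dtype result. A possible alternative, not the answer's method, is to repeat the rows through the index and overwrite `cost` afterwards. The sketch below assumes the example frame from the question and uses the -$50.00 to $50.00 range the question mentions; `half_width`, `step` and `out` are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "color": ["blue", "red", "pink"],
    "cost": [12.0, 8.1, 24.5],
    "temp": [80.4, 81.2, 83.5],
})

half_width = 50.0  # hypothetical: how far the ladder reaches on each side
step = 0.5
n_steps = int(round(half_width / step))
# Build the offsets from integer multiples of the step, so the endpoints are
# included without relying on a fudged stop value.
cost_steps = np.arange(-n_steps, n_steps + 1) * step

# Keep the original label as a column, then repeat every row once per offset.
out = df.rename_axis("original_idx").reset_index()
out = out.loc[out.index.repeat(cost_steps.size)].reset_index(drop=True)

# np.tile repeats the whole offset vector once per original row, which matches
# the row order produced by Index.repeat.
out["cost"] = out["cost"].to_numpy() + np.tile(cost_steps, len(df))

print(out.head())
print(out.dtypes)
```

Since the rows are selected with `.loc` rather than passed through a single NumPy array, each column should keep its original dtype, and any number of extra columns is carried along without being listed explicitly.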


