python - Pandas/Numpy: Fastest way to create a ladder?
I have a pandas DataFrame like:
       color  cost  temp
    0   blue  12.0  80.4
    1    red   8.1  81.2
    2   pink  24.5  83.5
I want to create a "ladder" or "range" of costs for every row at 50-cent increments, from $0.50 below the current cost to $0.50 above the current cost. My current code is similar to the following:
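    import numpy
    import pandas

    incremented_prices = []
    df['original_idx'] = df.index  # remember each row's original label

    # iterrows() yields (index, row) pairs, so unpack both
    for _, row in df.iterrows():
        current_price = row['cost']
        # the stop value is exclusive, so go slightly past +0.5 to include it
        more_costs = numpy.arange(current_price - 0.5, current_price + 0.51, step=0.5)
        for cost in more_costs:
            row_c = row.copy()
            row_c['cost'] = cost
            incremented_prices.append(row_c)

    # one output row per copied Series
    df_incremented = pandas.DataFrame(incremented_prices).reset_index(drop=True)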
This code produces a DataFrame like:
       color  cost  temp  original_idx
    0   blue  11.5  80.4             0
    1   blue  12.0  80.4             0
    2   blue  12.5  80.4             0
    3    red   7.6  81.2             1
    4    red   8.1  81.2             1
    5    red   8.6  81.2             1
    6   pink  24.0  83.5             2
    7   pink  24.5  83.5             2
    8   pink  25.0  83.5             2
In my real problem, I make the ranges from -$50.00 to $50.00, and I find this very slow. Is there a faster, vectorized way?
You can try recreating the data frame with numpy.repeat:
    import numpy as np
    import pandas as pd

    # pd.np has been removed from recent pandas versions, so use numpy directly
    cost_steps = np.arange(-0.5, 0.51, 0.5)
    repeats = cost_steps.size

    pd.DataFrame(dict(
        color=np.repeat(df.color.values, repeats),
        # vectorized way to calculate the costs: the steps are added via broadcasting
        cost=(df.cost.values[:, None] + cost_steps).ravel(),
        temp=np.repeat(df.temp.values, repeats),
        original_idx=np.repeat(df.index.values, repeats),
    ))
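To see what the broadcasting step does on its own, here is a minimal sketch using the three sample costs from the question. The names `costs`, `ladder` and `flat` are only for illustration.

```python
import numpy as np

costs = np.array([12.0, 8.1, 24.5])        # sample costs from the question
cost_steps = np.arange(-0.5, 0.51, 0.5)    # offsets: -0.5, 0.0, 0.5

# costs[:, None] has shape (3, 1); adding a length-3 array broadcasts
# to shape (3, 3), one row of laddered costs per original row.
ladder = costs[:, None] + cost_steps

# ravel() flattens row by row, so the order lines up with np.repeat,
# which repeats each element consecutively.
flat = ladder.ravel()
```

Because `ravel()` walks the array row by row, all the costs for the first original row come first, then all for the second, and so on, which is the same order `np.repeat` uses for the other columns.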
Update for more columns:
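    import numpy as np
    import pandas as pd

    # move the index into a regular column named original_idx
    df1 = df.rename_axis("original_idx").reset_index()
    cost_steps = np.arange(-0.5, 0.51, 0.5)
    repeats = cost_steps.size

    pd.DataFrame(
        np.hstack((
            # repeat every non-cost column once per step
            np.repeat(df1.drop(columns="cost").values, repeats, axis=0),
            # laddered costs as a single column
            (df1.cost.values[:, None] + cost_steps).reshape(-1, 1),
        )),
        columns=df1.columns.drop("cost").tolist() + ["cost"],
    )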
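One thing to watch with the `hstack` version: stacking text and numeric columns into a single array makes every resulting column object dtype. If that matters, a possible alternative is to repeat whole rows by position and then overwrite `cost`. The sketch below is only one way to do it; `make_ladder` and its parameters `low`, `high` and `step` are hypothetical names, not part of pandas or numpy.

```python
import numpy as np
import pandas as pd

def make_ladder(df, low=-0.5, high=0.5, step=0.5):
    """Hypothetical helper: repeat each row once per cost offset."""
    # Build the offsets from an integer count so the endpoint is included
    # without relying on a padded stop value.
    n_steps = int(round((high - low) / step)) + 1
    cost_steps = low + step * np.arange(n_steps)

    # Repeat whole rows by position; each column keeps its own dtype.
    out = df.iloc[np.repeat(np.arange(len(df)), n_steps)].copy()
    out["original_idx"] = out.index  # repeated labels of the source rows
    out["cost"] = (df["cost"].to_numpy()[:, None] + cost_steps).ravel()
    return out.reset_index(drop=True)

df = pd.DataFrame({
    "color": ["blue", "red", "pink"],
    "cost": [12.0, 8.1, 24.5],
    "temp": [80.4, 81.2, 83.5],
})

small = make_ladder(df)                                  # 3 offsets per row
wide = make_ladder(df, low=-50.0, high=50.0, step=0.5)   # 201 offsets per row
```

With the question's real range of -$50.00 to $50.00 in 50-cent steps, each original row expands to 201 rows, so the output grows quickly; nothing here has been timed, so it is worth measuring on your own data.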