sorting - sort_array ordered by a different column in Hive
I have two columns, one of products and one of the dates they were bought. I am able to order the dates by applying the sort_array(dates) function, but I want to be able to sort_array(products) by purchase date. Is there a way to do that in Hive?
The table, tablename, is:
clientid  product      date
100       shampoo      2016-01-02
101       book         2016-02-04
100       conditioner  2015-12-31
101       bookmark     2016-07-10
100       cream        2016-02-12
101       book2        2016-01-03
Then, getting one row per customer:
select clientid, collect_list(product) as prod_list, sort_array(collect_list(date)) as date_order from tablename group by clientid;
The result is:
clientid  prod_list                          date_order
100       ["shampoo","conditioner","cream"]  ["2015-12-31","2016-01-02","2016-02-12"]
101       ["book","bookmark","book2"]        ["2016-01-03","2016-02-04","2016-07-10"]
But I want the order of the products to follow the correct chronological order of the purchases.
It is possible using only built-in functions, but it is not a pretty sight :-)
select  clientid
       ,split(regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',cast(date as string),product)))),'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
       ,sort_array(collect_list(date)) as date_order
from    tablename
group by clientid
;
+----------+-----------------------------------+------------------------------------------+
| clientid | prod_list                         | date_order                               |
+----------+-----------------------------------+------------------------------------------+
| 100      | ["conditioner","shampoo","cream"] | ["2015-12-31","2016-01-02","2016-02-12"] |
| 101      | ["book2","book","bookmark"]       | ["2016-01-03","2016-02-04","2016-07-10"] |
+----------+-----------------------------------+------------------------------------------+
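To make the one-liner easier to follow, here is a sketch that unpacks the same idea into two steps. It is not part of the original answer: it assumes the same tablename and columns, and the CTE name tagged and the column name tagged_list are made up for illustration. The date column is backquoted here because DATE is a reserved keyword in newer Hive versions; the idea is otherwise unchanged.

```sql
-- Step 1: prefix every product with its purchase date ("2016-01-02:shampoo"),
-- collect the prefixed strings per client, and sort them. Because the dates
-- are yyyy-MM-dd strings, sorting the strings sorts them chronologically.
with tagged as (
    select  clientid
           ,sort_array(collect_list(concat_ws(':', cast(`date` as string), product))) as tagged_list
    from    tablename
    group by clientid
)
-- Step 2: join the sorted array into one comma-separated string, strip each
-- "date:" prefix with regexp_replace (everything up to the first colon of an
-- element is dropped, the product and its trailing comma are kept via $1),
-- and split the cleaned string back into an array.
select  clientid
       ,split(
            regexp_replace(concat_ws(',', tagged_list), '[^:]*:([^,]*(,|$))', '$1')
           ,','
        ) as prod_list
from    tagged
;
```

Two things this relies on: the dates must be in a format whose string order matches chronological order (yyyy-MM-dd does), and product names must not contain a comma, since the comma is used as the element separator when the array is joined and split again.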