sorting - sort_array order by a different column, Hive


I have two columns, one of products and one of the dates they were bought. I am able to order the dates by applying the sort_array(dates) function, but I want to be able to sort_array(products) by the purchase date. Is there a way to do that in Hive?

The table tablename is:

clientid    product        date
100         shampoo        2016-01-02
101         book           2016-02-04
100         conditioner    2015-12-31
101         bookmark       2016-07-10
100         cream          2016-02-12
101         book2          2016-01-03

Then, getting one row per customer:

select clientid, collect_list(product) as prod_list, sort_array(collect_list(date)) as date_order
from tablename
group by 1;

gives:

clientid    prod_list                            date_order
100         ["shampoo","conditioner","cream"]    ["2015-12-31","2016-01-02","2016-02-12"]
101         ["book","bookmark","book2"]          ["2016-01-03","2016-02-04","2016-07-10"]

But I want the order of the products to be tied to the correct chronological order of the purchases.

It is possible using built-in functions, but it is not a pretty sight :-)

select      clientid
           ,split(regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',cast(date as string),product)))),'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
           ,sort_array(collect_list(date)) as date_order
from        tablename
group by    clientid
;

+----------+-----------------------------------+------------------------------------------+
| clientid |             prod_list             |                date_order                |
+----------+-----------------------------------+------------------------------------------+
|      100 | ["conditioner","shampoo","cream"] | ["2015-12-31","2016-01-02","2016-02-12"] |
|      101 | ["book2","book","bookmark"]       | ["2016-01-03","2016-02-04","2016-07-10"] |
+----------+-----------------------------------+------------------------------------------+
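The nested expression is easier to follow when traced one step at a time. The following is a rough Python sketch that mimics the same pipeline on the question's sample rows: prefix each product with its date, sort, join, strip the date prefixes with the same regular expression, and split. It is not Hive code; the helper name products_by_date and the ROWS list are made up for illustration, and the comments note which Hive function each step is meant to stand in for.

```python
import re

# Sample rows from the question: (clientid, product, date).
ROWS = [
    (100, "shampoo", "2016-01-02"),
    (101, "book", "2016-02-04"),
    (100, "conditioner", "2015-12-31"),
    (101, "bookmark", "2016-07-10"),
    (100, "cream", "2016-02-12"),
    (101, "book2", "2016-01-03"),
]


def products_by_date(rows):
    """Hypothetical helper that mirrors the Hive expression step by step."""
    tagged = {}
    for clientid, product, date in rows:
        # concat_ws(':', cast(date as string), product), gathered per client
        # the way collect_list does under "group by clientid".
        tagged.setdefault(clientid, []).append(f"{date}:{product}")

    result = {}
    for clientid, items in tagged.items():
        # sort_array: yyyy-MM-dd strings sort chronologically as plain text.
        # concat_ws(',', ...): flatten the sorted array into one string.
        joined = ",".join(sorted(items))
        # regexp_replace: remove the "date:" prefix of every element,
        # keeping the product and its trailing comma (or end of string).
        stripped = re.sub(r"[^:]*:([^,]*(,|$))", r"\1", joined)
        # split(..., ','): turn the string back into an array.
        result[clientid] = stripped.split(",")
    return result


if __name__ == "__main__":
    for clientid, products in sorted(products_by_date(ROWS).items()):
        print(clientid, products)
```

Because the array is flattened to a comma-separated string and split again, a product name that itself contains a comma would end up as two array elements in this sketch.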
