hiveql - Hive multiple count (with and without DISTINCT) generate bad output -


i tried hive query

select id,count(distinct case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)  db.table2 group id limit 10; 

and gives me smthg like:

111007001007633 1       1 111007001029793 1       1 111007001000521 1       11 111007001000794 1       1 111007001000273 3       13 111007001001032 1       1 111007001025874 1       4 111007001001792 1       7 111007001029181 1       1 111007001000141 16      96 

but when add other count:

 select id,count(distinct case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(distinct case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),15) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),15) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)   db.table2 group id limit 10; 

it returns bad this:

 111007001010439 0       0       1       0  111007001026963 0       0       1       0  111007001028001 0       0       1       0  111007001032987 0       0       1       0  111007001048710 0       0       1       0  111007001052415 0       0       1       0  111007002008374 0       0       1       0  111007003000644 0       0       1       0  111007003002210 0       0       1       0 

i work on hadoop cluster , don't if it's caused map reduce.

thanks

[edit]

as answered @pashaz comment, first problem results 2 same queries (with , without distinct) give 1 distinct , 0 non distinct.

the second problem results between 2 distinct queries , 2 non-distinct queries. if check timestamps you'll see first queries contains seconds because 2 firsts count occurrences between "2017-02-01" , 60 days before, 2nds count occurrences between "2017-02-01" , 15 days before.

[update]

if put clause works

 id="111007001007633" or id="271011604404359" or id="122213250512607" or id="111007001033217"   111007001033217 0       0       0       0       0       0 122213250512607 1       3       8       14      0       0 271011604404359 12      21      26      42      5       9 111007001007633 14      19      24      34      5       5 

the limit clause seems problem.

nothing bad in provided results. in both queries present "limit 10". there no garantee same id returned.

in first query results present "111007001007633" absent in second query.


Comments

Popular posts from this blog

php - Permission denied. Laravel linux server -

google bigquery - Delta between query execution time and Java query call to finish -

python - Pandas two dataframes multiplication? -