hiveql - Hive multiple count (with and without DISTINCT) generate bad output -

i tried hive query

select id,count(distinct case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)  db.table2 group id limit 10;

and gives me smthg like:

111007001007633 1       1 111007001029793 1       1 111007001000521 1       11 111007001000794 1       1 111007001000273 3       13 111007001001032 1       1 111007001025874 1       4 111007001001792 1       7 111007001029181 1       1 111007001000141 16      96

but when add other count:

 select id,count(distinct case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),60) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(distinct case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),15) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)           ,count(case when unix_timestamp(m_date) between unix_timestamp(cast(date_sub(cast('2017-02-01' date),15) date)) , unix_timestamp(cast('2017-02-01' date)) m_date else 0 end)   db.table2 group id limit 10;

it returns bad this:

 111007001010439 0       0       1       0  111007001026963 0       0       1       0  111007001028001 0       0       1       0  111007001032987 0       0       1       0  111007001048710 0       0       1       0  111007001052415 0       0       1       0  111007002008374 0       0       1       0  111007003000644 0       0       1       0  111007003002210 0       0       1       0

i work on hadoop cluster , don't if it's caused map reduce.

thanks

[edit]

as answered @pashaz comment, first problem results 2 same queries (with , without distinct) give 1 distinct , 0 non distinct.

the second problem results between 2 distinct queries , 2 non-distinct queries. if check timestamps you'll see first queries contains seconds because 2 firsts count occurrences between "2017-02-01" , 60 days before, 2nds count occurrences between "2017-02-01" , 15 days before.

[update]

if put clause works

 id="111007001007633" or id="271011604404359" or id="122213250512607" or id="111007001033217"   111007001033217 0       0       0       0       0       0 122213250512607 1       3       8       14      0       0 271011604404359 12      21      26      42      5       9 111007001007633 14      19      24      34      5       5

the limit clause seems problem.

nothing bad in provided results. in both queries present "limit 10". there no garantee same id returned.

in first query results present "111007001007633" absent in second query.

Search This Blog

New Generation Education

hiveql - Hive multiple count (with and without DISTINCT) generate bad output -

Comments

Post a Comment

Popular posts from this blog

cookies - Yii2 Advanced - Share session between frontend and mainsite (duplicate of frontend for www) -

angular - password and confirm password field validation angular2 reactive forms -

javascript - Angular2 intelliJ config error.. Cannot find module '@angular/core' -