Why is the HBase Java client slow compared to REST/Thrift?
I am running performance tests on the HBase Java client, Thrift, and REST interfaces. I have a table called "airline" that has 500k rows. I am fetching all 500k rows from the table through 4 different Java programs (using the Java client, Thrift, Thrift2, and REST).
The following are the performance numbers for various fetch sizes. For all of these, the batch size is set to 100000.
[Table showing the performance numbers; times are in ms][1]
I see that there is a performance improvement as I increase the fetch size in the case of REST, Thrift, and Thrift2.
But with the Java API, I am seeing consistent performance irrespective of the fetch size. Why is the fetch size not having any impact in the Java client?
Here is a snippet of my Java program:
    Table table = conn.getTable(TableName.valueOf("airline"));
    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan);
    // Pull up to fetchSize rows per call until the scanner is exhausted
    for (Result[] result = scanner.next(fetchSize); result.length != 0; result = scanner.next(fetchSize)) {
        // process rows
    }
Can anyone help me with this? Am I using the wrong methods/classes for fetching data through the Java client?
Your scanner is not set up to fetch the number of rows you want in a timely manner. In other words, you're tuning the ResultScanner, not the thing that actually does the scan: the Scan object.
I believe the functions you want are, at least in part, the following:

    scan.setCaching
    scan.setCacheBlocks
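To illustrate why the argument to `next()` might make no difference, here is a toy model, not HBase code. It assumes that the client keeps a local buffer, that each simulated round trip to the server brings back up to `caching` rows, and that `next(n)` only drains that buffer. `ToyScanner`, `roundTripsFor`, and the caching values are hypothetical names and numbers chosen for illustration; only the 500k row count comes from the question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ScannerCachingModel {

    /** Toy stand-in for a client-side scanner; this is NOT the HBase implementation. */
    static final class ToyScanner {
        private final int totalRows;  // rows held by the pretend server
        private final int caching;    // rows shipped per simulated round trip
        private final Deque<Integer> buffer = new ArrayDeque<>();
        private int nextServerRow = 0;
        private int roundTrips = 0;

        ToyScanner(int totalRows, int caching) {
            this.totalRows = totalRows;
            this.caching = caching;
        }

        // One simulated round trip: pull up to `caching` rows into the local buffer.
        private void refill() {
            roundTrips++;
            int end = Math.min(totalRows, nextServerRow + caching);
            while (nextServerRow < end) {
                buffer.add(nextServerRow++);
            }
        }

        // Return up to n rows, refilling the buffer only when it runs empty.
        List<Integer> next(int n) {
            List<Integer> out = new ArrayList<>();
            while (out.size() < n) {
                if (buffer.isEmpty()) {
                    if (nextServerRow >= totalRows) {
                        break; // server exhausted
                    }
                    refill();
                }
                out.add(buffer.poll());
            }
            return out;
        }

        int roundTrips() {
            return roundTrips;
        }
    }

    // Scan every row with next(fetchSize) and report the simulated round trips used.
    static int roundTripsFor(int totalRows, int caching, int fetchSize) {
        ToyScanner scanner = new ToyScanner(totalRows, caching);
        while (!scanner.next(fetchSize).isEmpty()) {
            // rows would be processed here
        }
        return scanner.roundTrips();
    }

    public static void main(String[] args) {
        int rows = 500_000;
        for (int caching : new int[] {100, 10_000}) {
            for (int fetchSize : new int[] {100, 1_000, 10_000}) {
                System.out.println("caching=" + caching + " fetchSize=" + fetchSize
                        + " roundTrips=" + roundTripsFor(rows, caching, fetchSize));
            }
        }
    }
}
```

In this model the round-trip count changes only with `caching`, never with `fetchSize`, which matches the flat timings reported for the Java client. The real client has more moving parts (for example, size limits on each response), so treat this as intuition rather than a description of its internals.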
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/scan.html
You call these functions before the loop...
Source: Pig's HBaseStorage#initScan function
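As a rough sketch of that advice applied to the snippet from the question, the scan could be configured before the loop as shown here. It assumes an HBase 1.x or later client on the classpath and an `hbase-site.xml` that points at the cluster; the class name `AirlineScan`, the `fetchSize` value, and the choice to pass `false` to `setCacheBlocks` are illustrative assumptions, not something stated in the question.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class AirlineScan {
    public static void main(String[] args) throws IOException {
        int fetchSize = 1000; // assumed value; use whichever fetch size is under test

        // Reads hbase-site.xml from the classpath (assumed to be present).
        Configuration conf = HBaseConfiguration.create();

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("airline"))) {

            Scan scan = new Scan();
            // Tune the Scan itself, before the scanner is opened:
            scan.setCaching(fetchSize);  // rows requested from the server per round trip
            scan.setCacheBlocks(false);  // assumption: a one-off full scan need not fill the block cache

            long rows = 0;
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result[] batch = scanner.next(fetchSize);
                     batch.length != 0;
                     batch = scanner.next(fetchSize)) {
                    rows += batch.length; // process rows here
                }
            }
            System.out.println("rows read: " + rows);
        }
    }
}
```

Whether this changes the timings depends on the client version and its defaults, so it is worth re-running the same fetch-size comparison after the change.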