Google BigQuery - Delta between query execution time and Java query call finishing


Context

  • Our container cluster is located at us-east1-c.
  • We are using the following Java library: google-cloud-bigquery, 0.9.2-beta.
  • Our dataset has around 26M rows, which represents ~10 GB.
  • All of our queries return fewer than 100 rows, since we are grouping on a specific column.

Question

We analyzed the last 100 queries executed in BigQuery, and these all executed in about 2-3 seconds (we measured this by calling bq --format=prettyjson show -j jobid and computing end time - creation time).
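For reference, the job JSON printed by bq show -j reports statistics.creationTime, statistics.startTime and statistics.endTime as strings holding milliseconds since the epoch. A minimal sketch of the subtraction, which also separates time spent queued from time spent executing, could look like the following. The class name JobTimings and its methods are made up for illustration, and it assumes the three values have already been copied out of the JSON.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not part of any Google library): splits a job's
 * server-side lifetime into queueing time and execution time.
 */
final class JobTimings {
    private final long creationMillis;
    private final long startMillis;
    private final long endMillis;

    JobTimings(long creationMillis, long startMillis, long endMillis) {
        if (startMillis < creationMillis || endMillis < startMillis) {
            throw new IllegalArgumentException("timestamps must be non-decreasing");
        }
        this.creationMillis = creationMillis;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    /** Builds the timings from the string values as they appear in the job JSON. */
    static JobTimings fromEpochMillisStrings(String creation, String start, String end) {
        return new JobTimings(
                Long.parseLong(creation), Long.parseLong(start), Long.parseLong(end));
    }

    /** Time between job creation and the job actually starting to run. */
    Duration queued() {
        return Duration.ofMillis(startMillis - creationMillis);
    }

    /** Time between the job starting and the job ending. */
    Duration executing() {
        return Duration.ofMillis(endMillis - startMillis);
    }

    /** End time minus creation time, the figure used in the question. */
    Duration total() {
        return Duration.ofMillis(endMillis - creationMillis);
    }
}
```

Comparing total() with the wall-clock time logged on the Java side for the same job ID shows how much of the gap happens outside the job itself.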

In our Java logs, though, the calls to bigquery.query are blocking for 5-6 seconds (and 10 seconds is not out of the ordinary). What could explain the systematic gap between the query finishing in the BigQuery cluster and the results being available in Java? I know 5-6 seconds isn't astronomical, but I am curious to see whether this is normal behaviour when using the Java BigQuery cloud library.

I didn't dig to the point where I analyzed the outbound call using Wireshark. All our tests were executed in our container cluster (Kubernetes).

Code

    QueryRequest request = QueryRequest.newBuilder(sql)
            .setMaxWaitTime(30000L)
            .setUseLegacySql(false)
            .setUseQueryCache(false)
            .build();

    QueryResponse response = bigquery.query(request);
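To put a client-side number next to the server-side one, the blocking call can be wrapped in a small timer. This is only a sketch: TimedCall is a hypothetical helper written for this post, not something the BigQuery library provides, and it uses System.nanoTime because that clock is monotonic.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs a blocking call and records how long the
 * caller was blocked, in milliseconds.
 */
final class TimedCall<T> {
    final T result;
    final long elapsedMillis;

    private TimedCall(T result, long elapsedMillis) {
        this.result = result;
        this.elapsedMillis = elapsedMillis;
    }

    static <T> TimedCall<T> measure(Supplier<T> call) {
        long start = System.nanoTime();          // monotonic clock, safe for intervals
        T result = call.get();                   // blocks until the call returns
        long elapsedNanos = System.nanoTime() - start;
        return new TimedCall<>(result, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }
}

// Possible usage with the snippet above (assumes bigquery and request exist):
//   TimedCall<QueryResponse> timed = TimedCall.measure(() -> bigquery.query(request));
//   log the job ID together with timed.elapsedMillis
```

Logging the job ID alongside the elapsed time makes it possible to match each Java measurement with the corresponding bq show -j output.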

Thank you.

Just looking at the code briefly here: https://github.com/googlecloudplatform/google-cloud-java/blob/master/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/bigqueryimpl.java

It appears that there are multiple potential sources of delay:

  • Getting the query results
  • Restarting (there are automatic restarts in there that can explain the delay spikes)
  • The frequency of checking for new results
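The last point can be illustrated with a bit of arithmetic. The sketch below is a simplified model, not the library's actual polling logic: it assumes a client that polls at a fixed interval, with the first poll one interval after submission, and it ignores network round trips and the time needed to fetch rows. The class name and the interval values are invented for the example.

```java
/**
 * Simplified, hypothetical model of a fixed-interval poller. It only
 * shows how the polling interval alone can add latency on top of the
 * job's own duration.
 */
final class PollingDelay {
    private PollingDelay() {}

    /** Time (ms after submission) of the first poll that sees the finished job. */
    static long observedAtMillis(long jobDurationMillis, long pollIntervalMillis) {
        if (jobDurationMillis < 0 || pollIntervalMillis <= 0) {
            throw new IllegalArgumentException("duration must be >= 0 and interval > 0");
        }
        // Number of polls needed: round the duration up to a whole interval,
        // with at least one poll even for a job that finishes immediately.
        long polls = Math.max(1L,
                (jobDurationMillis + pollIntervalMillis - 1) / pollIntervalMillis);
        return polls * pollIntervalMillis;
    }

    /** Extra wait caused purely by the polling schedule. */
    static long addedLatencyMillis(long jobDurationMillis, long pollIntervalMillis) {
        return observedAtMillis(jobDurationMillis, pollIntervalMillis) - jobDurationMillis;
    }
}
```

Under this model, a job that takes 2500 ms polled every 2000 ms is only noticed at 4000 ms, so 1500 ms of the gap would come from polling alone. Restarts and result fetching would add to that.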

It sounds like looking at Wireshark would give you a precise answer as to what is happening.

