Rows go missing on streaming insert into BigQuery in Pubsub consumer #876

Closed
mattdeav opened this issue Jan 29, 2018 · 9 comments

@mattdeav

I'm having an issue in a Pubsub consumer where rows written to BigQuery using Put don't return an error but never make it to the table. Here's the gist of the test I'm doing. The code sets up everything it needs except the BigQuery dataset, which you'll need to create manually and point to by changing the variable on line 55. This is what the code does:

  1. Creates Pubsub topic
  2. Creates BigQuery table (deletes if exists)
  3. Publishes numMessages number of messages to Pubsub topic
  4. Pulls from the topic, writes to BigQuery (the message is used as both the insertID and the row), and Acks if successful; stops pulling after no messages have come in for a while (see the sketch after this list)
  5. Reads from BigQuery table
  6. Tie-out to count records sent and received from each service
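
For reference, the receive-and-insert step (step 4) is roughly along these lines. This is a trimmed sketch rather than the exact code from the gist; the identifiers and the single "value" column are placeholders:

```go
// Trimmed sketch of step 4; identifiers and the single "value" column are
// placeholders, not the exact code from the gist.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
	"cloud.google.com/go/pubsub"
)

func consume(ctx context.Context, projectID, subID, datasetID, tableID string) error {
	psClient, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return err
	}
	bqClient, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return err
	}
	uploader := bqClient.Dataset(datasetID).Table(tableID).Uploader()

	return psClient.Subscription(subID).Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		val := string(m.Data)
		// The message payload is used as both the insert ID and the row value.
		row := &bigquery.ValuesSaver{
			Schema:   bigquery.Schema{{Name: "value", Type: bigquery.StringFieldType}},
			InsertID: val,
			Row:      []bigquery.Value{val},
		}
		if err := uploader.Put(ctx, row); err != nil {
			log.Printf("Put failed for %q: %v", val, err)
			m.Nack()
			return
		}
		// Ack only after the streaming insert reports success.
		m.Ack()
	})
}

func main() {
	ctx := context.Background()
	// Placeholder identifiers; substitute real project/subscription/table names.
	if err := consume(ctx, "my-project", "my-sub", "my_dataset", "my_table"); err != nil {
		log.Fatal(err)
	}
}
```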

Occasionally it works as intended (if it does, run it a few more times), but most of the time the BigQuery table is missing some (if not all) rows, and the Pubsub topic has no unacknowledged messages left over either. The problem seems worst right after Receive starts: a higher numMessages still loses rows, particularly on the initial inserts, but lower loads lose a higher percentage of messages. Here's an example of output:

# RESULTS #
Values sent to Pubsub:  [0 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 4 5 6 7 8 9]
Values read from PS:    [0 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 4 5 6 7 8 9]
Values sent to BQ:      [0 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 26 27 28 29 3 4 5 6 7 8 9]
Values read from BQ:    [25]
30 values sent to PS
30 values read from PS
30 values sent to BQ
1 values read from BQ
missing 0 values from Pubsub read
missing 0 values from BQ send
missing 29 values from BQ read

I've tried adding delays between steps, limits on ReceiveSettings, and delays inside the Receive func, but no dice. Let me know if there is any more information I can provide.
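
For context, the kind of ReceiveSettings limits I mean look roughly like this (the field values are illustrative, not the exact ones from my runs):

```go
// Illustrative ReceiveSettings limits; the values below are placeholders,
// not the exact ones from my runs.
sub := psClient.Subscription(subID)
sub.ReceiveSettings.MaxOutstandingMessages = 10    // cap in-flight messages
sub.ReceiveSettings.MaxExtension = 2 * time.Minute // cap ack-deadline extension
```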

@BlueMonday
Contributor

BlueMonday commented Jan 29, 2018

How much of a delay have you tried between steps?

I believe there is a delay between your streaming insert request succeeding and the rows arriving in the table's streaming buffer. Rows are only queryable once they have arrived in the streaming buffer.

There is this one sneaky sentence in the docs that alludes to this: https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataavailability

Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table.

In the past I have generally found that the few seconds they refer to can end up being 1-5 minutes. I am not sure if it is better or worse nowadays. I would try it again, but this time with a 5 minute sleep before attempting to read the rows, and see if you retrieve all of them. I would only recommend the sleep as a way to determine whether the delay is the issue, not as a permanent solution, since it does not guarantee that the rows have reached the streaming buffer if Big Query decides to have a bad day.
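
To be concrete, this is roughly what I mean (polling a count with a deadline rather than a single fixed sleep), as a diagnostic only. The dataset/table identifiers and the expected count are placeholders, and it assumes the cloud.google.com/go/bigquery client with standard SQL, which is its default:

```go
// Helper to drop into the test program (assumes the imports "context",
// "fmt", "time", and "cloud.google.com/go/bigquery" are available).
// waitForRows polls COUNT(*) until `want` rows are visible or ~5 minutes pass.
func waitForRows(ctx context.Context, client *bigquery.Client, datasetID, tableID string, want int64) error {
	deadline := time.Now().Add(5 * time.Minute)
	for time.Now().Before(deadline) {
		q := client.Query(fmt.Sprintf("SELECT COUNT(*) FROM `%s.%s`", datasetID, tableID))
		it, err := q.Read(ctx)
		if err != nil {
			return err
		}
		var row []bigquery.Value
		if err := it.Next(&row); err != nil {
			return err
		}
		if n, ok := row[0].(int64); ok && n >= want {
			return nil // all expected rows are visible
		}
		time.Sleep(10 * time.Second)
	}
	return fmt.Errorf("still missing rows after 5 minutes")
}
```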

Out of curiosity why do you need to read the rows immediately after inserting them?

@mattdeav
Author

Out of curiosity why do you need to read the rows immediately after inserting them?

Good question, and I should have addressed that. On several occasions I've waited hours and the rows still never appeared. We aren't intending to query the data immediately; the test only does it out of convenience.

@BlueMonday
Contributor

BlueMonday commented Jan 30, 2018

On several occasions I've waited hours and the rows still never appeared.

How are you checking that the rows haven't appeared?

For example, the row count in the table metadata tends to be out of date. In the past two or so months the row count has been outright broken for me: several of my tables have had millions of rows for weeks and still displayed a count of 0, so I generally just ignore the row count at this point. The preview pane also tends to have quite a large delay (I can't remember how large, but I think it's in the hours range).

Querying the table (1-5 minutes after insertion) is generally the only sure way to determine if the table contains the data.

@mattdeav
Author

Good point, although I'm not relying on the table metadata or preview here; I do a SELECT * or SELECT COUNT(*) to tie out. And as mentioned before, waiting has had no effect on this that I've seen. For example, a test I ran yesterday, about 16 hours ago, showed only 7/10 rows in BigQuery at the time, and the table still only shows 7 rows in a SELECT *. Here's the publisher-test output:

# RESULTS #
Values sent to BQ:      [0 1 2 3 4 5 6 7 8 9]
Values read from BQ:    [2 4 5 6 7 8 9]
10 values sent to BQ
7 values read from BQ
missing 3 values from BQ read

For completeness' sake, here's the response from each TabledataInsertAllCall, showing each of the ten rows receiving a 200/success response and no error from the BigQuery v2 API:

RESP: &{200 OK 200 HTTP/2.0 2 0 map[Server:[ESF] Cache-Control:[private] X-Xss-Protection:[1; mode=block] Content-Type:[application/json; charset=UTF-8] Date:[Mon, 29 Jan 2018 22:50:07 GMT] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Vary:[Origin X-Origin Referer] X-Frame-Options:[SAMEORIGIN]] 0xc420360000 -1 [] false true map[] 0xc4204aeb00 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Cache-Control:[private] X-Frame-Options:[SAMEORIGIN] Content-Type:[application/json; charset=UTF-8] Vary:[Origin X-Origin Referer] Server:[ESF] X-Xss-Protection:[1; mode=block] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Date:[Mon, 29 Jan 2018 22:50:07 GMT]] 0xc420340000 -1 [] false true map[] 0xc420174400 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Vary:[Origin X-Origin Referer] Date:[Mon, 29 Jan 2018 22:50:07 GMT] Server:[ESF] X-Xss-Protection:[1; mode=block] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Content-Type:[application/json; charset=UTF-8] Cache-Control:[private] X-Frame-Options:[SAMEORIGIN]] 0xc42051a0a0 -1 [] false true map[] 0xc4204ae500 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Date:[Mon, 29 Jan 2018 22:50:08 GMT] Cache-Control:[private] X-Xss-Protection:[1; mode=block] Content-Type:[application/json; charset=UTF-8] Vary:[Origin X-Origin Referer] Server:[ESF] X-Frame-Options:[SAMEORIGIN] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"]] 0xc4203404a0 -1 [] false true map[] 0xc42050c400 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Date:[Mon, 29 Jan 2018 22:50:08 GMT] X-Frame-Options:[SAMEORIGIN] Vary:[Origin X-Origin Referer] Server:[ESF] Cache-Control:[private] X-Xss-Protection:[1; mode=block] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Content-Type:[application/json; charset=UTF-8]] 0xc420340600 -1 [] false true map[] 0xc420174900 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Vary:[Origin X-Origin Referer] Server:[ESF] Cache-Control:[private] X-Xss-Protection:[1; mode=block] X-Frame-Options:[SAMEORIGIN] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Content-Type:[application/json; charset=UTF-8] Date:[Mon, 29 Jan 2018 22:50:08 GMT]] 0xc4203409e0 -1 [] false true map[] 0xc4204aed00 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Server:[ESF] X-Xss-Protection:[1; mode=block] X-Frame-Options:[SAMEORIGIN] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Content-Type:[application/json; charset=UTF-8] Vary:[Origin X-Origin Referer] Date:[Mon, 29 Jan 2018 22:50:08 GMT] Cache-Control:[private]] 0xc420340da0 -1 [] false true map[] 0xc4204af200 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Date:[Mon, 29 Jan 2018 22:50:08 GMT] X-Frame-Options:[SAMEORIGIN] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Content-Type:[application/json; charset=UTF-8] Vary:[Origin X-Origin Referer] Server:[ESF] Cache-Control:[private] X-Xss-Protection:[1; mode=block]] 0xc4203565e0 -1 [] false true map[] 0xc4204af700 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Date:[Mon, 29 Jan 2018 22:50:08 GMT] Cache-Control:[private] X-Xss-Protection:[1; mode=block] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Content-Type:[application/json; charset=UTF-8] Vary:[Origin X-Origin Referer] Server:[ESF] X-Frame-Options:[SAMEORIGIN]] 0xc4203569a0 -1 [] false true map[] 0xc420174e00 0xc4200a6fd0} ERR:
RESP: &{200 OK 200 HTTP/2.0 2 0 map[Vary:[Origin X-Origin Referer] Cache-Control:[private] Date:[Mon, 29 Jan 2018 22:50:08 GMT] Server:[ESF] X-Xss-Protection:[1; mode=block] X-Frame-Options:[SAMEORIGIN] Alt-Svc:[hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338; quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"] Content-Type:[application/json; charset=UTF-8]] 0xc420356d80 -1 [] false true map[] 0xc420175400 0xc4200a6fd0} ERR:

And here's the Query Editor console showing the same results now, 16 hours later, in a SELECT *:

[screenshot: BigQuery Query Editor showing the same 7 rows returned by SELECT *]

@jba
Contributor

jba commented Jan 30, 2018

Can we isolate whether this is pubsub or bigquery? Can you log or otherwise verify that you're actually inserting all the rows?

@BlueMonday
Contributor

BlueMonday commented Jan 30, 2018

I think I might know what the issue is...

I believe the issue is caused by the table name being reused. Try changing the table name in your test program and see if you retrieve all the results (I believe you will!).

If I recall correctly, it has to do with the consistency model used in parts of Big Query. I don't know the exact details, but I believe some of the rows are going to the old table, since the deletion of the old table and the creation of the new table with the same name haven't propagated through the whole system yet.

I believe the Big Query team currently recommends waiting 10 minutes before streaming to a table you have recreated after deleting it. Alternatively, you could just use a different table name every time?

Source: https://issuetracker.google.com/issues/64329577#comment3
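
A sketch of the "different table name every time" option, continuing from the setup in your test program; this assumes a client version where Table.Create takes a *TableMetadata, and bqClient, datasetID, and schema are placeholders:

```go
// Use a fresh table name per run so streaming never races a recently
// deleted table with the same name. bqClient, datasetID, and schema are
// placeholders from the test program's setup.
tableID := fmt.Sprintf("pubsub_test_%d", time.Now().UnixNano())
table := bqClient.Dataset(datasetID).Table(tableID)
if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
	log.Fatalf("creating table %s: %v", tableID, err)
}
// Stream into `table` as before; since no table with this name was ever
// deleted, inserts can't be routed to a stale incarnation of it.
```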

@mattdeav
Author

mattdeav commented Jan 30, 2018

Yep, I was just noticing that it works every time on a new table and breaks on the second run. I know it's a niche case, but could the API reply with an error if you try to stream to a table like this? We don't plan on dropping and re-creating tables (this was just for testing), but it sounds like a known constraint that the API should acknowledge.

@BlueMonday
Contributor

You can try opening an issue on the Big Query issue tracker and asking: https://issuetracker.google.com/issues/new?component=187149&template=0

I wouldn't get my hopes up but you never know.

@jba
Contributor

jba commented Jan 30, 2018

I do remember running into this problem when writing integration tests. Thanks @BlueMonday for thinking of it.
