BigQuery: Streaming insert returns successful but does not insert data after table update #7433
Comments
@tswast Does this ring any bells for you?
@bgbraga Does this issue still occur if you wait 2 minutes? I believe the buffer issue might affect table updates as well as recreating a table, especially if the table update adds a new column to the schema. If this is the case, I'd expect the error in the first call to insert_rows. /cc @shollyman, who is working on additional streaming docs.
If you specify a manual row ID, BigQuery will de-duplicate the writes.
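One way to apply this advice is to derive a deterministic row ID from the row's content, so that a retried insert carries the same ID and BigQuery's best-effort de-duplication drops the duplicate. A minimal sketch; the `client` and `table` names in the commented line are illustrative, while `row_ids` is the real parameter of the Python client's `insert_rows_json`:

```python
import hashlib
import json

def row_id(row):
    """Deterministic insertId: BigQuery de-duplicates streaming rows that
    arrive with the same insertId within a short, best-effort window."""
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

rows = [{"name": "alice", "phone": "555-0100"}]
ids = [row_id(r) for r in rows]

# With the real client (names here are placeholders, not from this thread):
# errors = client.insert_rows_json(table, rows, row_ids=ids)
```

Because the ID depends only on the row content (`sort_keys=True` makes key order irrelevant), retrying the same row always reuses the same ID.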
@tswast sorry, I still have to test what occurs if I wait 2 minutes. Two minutes could be the time it takes to process/update the cache area. For me it is very strange. If there is a way to force the cache update, it should be done before the insert call, to fix the problem without losing any data.
What does the response of the first call to insert_rows look like?
@tswast you are right about the first insert_rows call: that is the call that returns the error. But the second insert_rows call, made immediately afterwards, always works.
Apparently the first call forces the cache-area update or something like that. So I made two extra tests:
1) Result: false. The first command continues to fail.
2) Add a sleep (2 minutes) before the first insert_rows call. Result: false. The first command continues to fail even if I wait 2 minutes before calling insert_rows.
Conclusion: do you have any guess about what is happening? In my tests I am calling Google remotely from my machine, and my code is using these Google API versions: My files (data1.json is the original file, data2.json is the new file with the phone field, and schema.txt is the new schema I used to update the table)
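Given the symptom above (the first insert_rows after a schema update fails while the second succeeds), a bounded client-side retry is one pragmatic workaround, ideally combined with manual row IDs so a retry cannot duplicate data. A hedged sketch, where `insert_fn` stands in for a real call such as `lambda rows: client.insert_rows(table, rows)`:

```python
import time

def insert_with_retry(insert_fn, rows, attempts=3, delay=2.0):
    """Retry a streaming insert whose first call may fail right after a
    schema update. insert_fn(rows) must return a list of per-row errors,
    where an empty list means success, mirroring insert_rows."""
    errors = insert_fn(rows)
    for _ in range(attempts - 1):
        if not errors:
            break  # success: stop retrying
        time.sleep(delay)
        errors = insert_fn(rows)
    return errors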
Are there any updates on this issue?
I have the same problem using the Go library ;/
I have this problem with Python Apache Beam when trying to update a table schema as well. Note that this doesn't happen with the Java SDK.
This is a known issue with the backend, caused by the same problem as deleting and recreating a table: https://issuetracker.google.com/64329577#comment3. A workaround is to update the table by using a load job with the "ALLOW_FIELD_ADDITION" schema update option set, as shown in google-cloud-python/bigquery/docs/snippets.py, lines 1425 to 1431 at 7ba0220.
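The referenced snippet boils down to configuring a load job with schema update options instead of streaming the rows. A sketch assuming the google-cloud-bigquery client; the table ID is a placeholder, and data2.json is the file name from this thread:

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    # Allow the load to add the new column (e.g. "phone") to the schema.
    schema_update_options=[
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
    ],
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

with open("data2.json", "rb") as source_file:
    job = client.load_table_from_file(
        source_file, "project.dataset.table", job_config=job_config
    )
job.result()  # wait for the load job to finish
```

Unlike streaming inserts, load jobs are not subject to the streaming buffer's stale-metadata window, which is why this sidesteps the bug.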
This is closed? How is this an acceptable issue? Data gets sprayed into the void. |
@lordnynex This repo only holds issues for bugs in the client. Since this is a backend issue, it needs to go to the API team, either by going through support (https://cloud.google.com/support/) or by filing an issue here: https://issuetracker.google.com/issues/new?component=187149&template=1162659
I know there is an existing issue about inserting data after deleting and recreating a table:
#3822
The workaround there is to wait 2 minutes, as we can see in the comments.
Here I had the same problem inserting data, but it seems unrelated to time.
After updating a table, I always have a problem inserting data using the insert_rows method.
So I made a test.
It always writes the payload only once after a table update.
So the first call never works; only the second call does.
I think this shows that this case is not related to the buffer-cache timing. The table issue seems to be resolved after the first insert_rows operation.
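The observed behavior can be modeled with a tiny fake backend, purely for illustration and not real client code, where a schema update makes exactly the next insert fail while later inserts succeed:

```python
class FakeStreamingBackend:
    """Simulates the reported symptom: right after a schema update, the
    first streaming insert errors and its rows are dropped, while the
    second insert of the same payload succeeds and stores it once."""

    def __init__(self):
        self.rows = []
        self._stale = False  # does the insert path still see the old schema?

    def update_schema(self):
        self._stale = True

    def insert(self, rows):
        if self._stale:
            self._stale = False  # the failed call "warms" the metadata cache
            return [{"index": 0, "errors": ["no such field"]}]
        self.rows.extend(rows)
        return []  # empty error list means success, as with insert_rows
```

This matches the test in the issue: one insert_rows call after the update fails, the retry succeeds, and the payload lands exactly once.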
I can't leave that workaround in the code, because whenever I haven't just updated the table, the code duplicates the payload in my table. It is also strange to keep such a workaround in the code.