
google.cloud.bigquery.table.Table.insert_data fails silently after deleting and recreating table #3822

Closed
inglesp opened this issue Aug 16, 2017 · 13 comments

inglesp commented Aug 16, 2017

Summary

For about 12 seconds after deleting and recreating a table, table.fetch_data() returns nothing, even though table.insert_data() has been called without error. Between about 12 and 60 seconds, calling table.insert_data() raises ServiceUnavailable. After that, table.insert_data() succeeds and the inserted data can be retrieved.

Details

Here's some code that demonstrates the problem:

import itertools
import time

from google.cloud import bigquery
from google.cloud.exceptions import ServiceUnavailable

# project_name, dataset_name, and table_name are placeholders.
client = bigquery.Client(project=project_name)
dataset = client.dataset(dataset_name)
dataset.create()

schema = [
    bigquery.SchemaField('a', 'INTEGER'),
]

table = dataset.table(table_name, schema)
table.create()

table.insert_data([[1]])

rows = list(table.fetch_data())
print('Got %d rows' % len(rows))

# Delete the table, then immediately recreate it with the same name.
table.delete()

table = dataset.table(table_name, schema)
table.create()

t0 = time.time()

# Insert one row per second until a fetch actually returns data.
for i in itertools.count():
    time.sleep(1)
    print(int(time.time() - t0))
    try:
        table.insert_data([[i]])
    except ServiceUnavailable:
        print('ServiceUnavailable')
        continue

    rows = list(table.fetch_data())
    if rows:
        break

print('Got data after %d seconds' % (time.time() - t0))
print('Got %d rows' % len(rows))
print(rows)

It produces output similar to the following:

Got 1 rows
1
3
4
6
7
9
11
12
ServiceUnavailable
13
ServiceUnavailable
...
56
ServiceUnavailable
58
ServiceUnavailable
59
Got data after 61 seconds
Got 1 rows
[(51,)]
@inglesp changed the title from "google.cloud.bigquery.table.Table.insert_data can fail silently" to "google.cloud.bigquery.table.Table.insert_data fails silently after deleting and recreating table" Aug 16, 2017
inglesp commented Aug 16, 2017

I've updated the issue description. Originally I'd only observed this after deleting and recreating both a dataset and a table; I can now reproduce it after deleting and recreating just the table.

lukesneeringer commented Aug 16, 2017
@inglesp,
Thanks for reporting.

Help me understand better. Is this an eventual consistency problem (e.g. it works a few minutes later)? Or does it never work for some reason?

@lukesneeringer added the priority: p2 and type: bug labels Aug 16, 2017
inglesp commented Aug 16, 2017

By my understanding of eventual consistency, this is not an eventual consistency problem: writes to the table appear to succeed but the data is lost, rather than showing up later.

I think I can come up with a better code example, so I'll update the description shortly.

lukesneeringer commented Aug 16, 2017
@inglesp Go ahead and make a new comment; it will be easier for me to grok. Thanks. :-)

inglesp commented Aug 16, 2017

I updated the description an hour ago with an updated script that should demonstrate the problem.

@dhermes added the api: bigquery label Aug 16, 2017
WillianFuks commented:
I wonder if this is related to this SO thread. As Hoffa pointed out there, this is expected BigQuery behavior, so I imagine the Python API might not have much to do here other than raise a ServiceUnavailable error.

tswast commented Dec 12, 2017

Yeah, this is a known backend issue. Basically, it can take 2 minutes for the streaming buffer's cache of table locations to update, so writes issued within that 2-minute window after a delete-and-recreate can end up going to the deleted table's files.

Related: #4553. For adding data to a brand-new table, it's recommended to use a load job instead (see the sketch just below). We can make doing this easier with the client library.
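For illustration, here is a minimal sketch of that load-job approach using the current google-cloud-bigquery API rather than the 2017-era API in the report above; the project, dataset, and table names are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # hypothetical table ID

rows = [{"a": 1}, {"a": 2}]

# A load job writes rows as regular table files, bypassing the streaming
# buffer whose stale cache causes the silent-loss window described above.
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("a", "INTEGER")],
)
load_job = client.load_table_from_json(rows, table_id, job_config=job_config)
load_job.result()  # block until the load job completes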

@tswast closed this as completed Dec 12, 2017
szxyks commented Sep 6, 2018

Has this been fixed?

tswast commented Sep 6, 2018

This is still a known issue with the BigQuery API: https://issuetracker.google.com/issues/64329577#comment3

Note: for many use cases you can recreate the table with the new rows by using a load job and the WRITE_TRUNCATE write disposition.

Luttik commented May 27, 2020

@tswast could you explain how to do that? WRITE_TRUNCATE is not present at all in the Python documentation for bigquery.

tswast commented Jun 2, 2020

CC @shollyman @plamut

plamut commented Jun 6, 2020

@Luttik The WriteDisposition class is documented here; it's listed under the Job-related types section.

That documentation is more of an API reference, though; more complete samples can be found in the how-to guides in the BigQuery product documentation. For example, for the JSON format there is a Python sample that shows the effect of the WRITE_TRUNCATE option.

Does that help?
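For reference, a minimal sketch of the WRITE_TRUNCATE approach described above (the table ID and schema are hypothetical):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # hypothetical table ID

new_rows = [{"a": 1}]

# WRITE_TRUNCATE atomically replaces the table's contents with the loaded
# rows, so there is no separate delete/recreate step to race against.
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("a", "INTEGER")],
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.load_table_from_json(new_rows, table_id, job_config=job_config).result()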

Luttik commented Jul 21, 2020

@plamut thanks for your kind reaction, even though I was wrong about the documentation (I guess my quick search was a fluke). I've since switched to a Postgres DB, so I don't know if this would have worked, but I'll definitely keep it in mind.
