Bigtable python raises InvalidChunk: possible loss of microsecond timestamp precision? #2397
I tried to make this a little more minimal, but can't get this to break:

```python
import datetime

from gcloud import bigtable
from gcloud.bigtable import row_filters

instance_id = 'bigtabletesting'
table_id = 'Hello-Bigtable'

client = bigtable.Client(admin=True)
instance = client.instance(instance_id, 'us-central1-c')
table = instance.table(table_id)
client.start()  # open the gRPC connection, as in the full script below

column_family_id = 'cf1'
cf1 = table.column_family(column_family_id)
table.create(column_families=[cf1])

timestamp = datetime.datetime(1970, 1, 1, microsecond=200)
row = table.row(b'test')
row.set_cell(
    column_family_id,
    b'greeting', b'Hello World!',
    timestamp=timestamp)
row.commit()

column_id = b'greeting'  # qualifier to match in the filter below
col_filter = row_filters.ColumnQualifierRegexFilter(column_id)
family_filter = row_filters.FamilyNameRegexFilter(column_family_id)
row_filter = row_filters.RowFilterUnion(
    filters=[col_filter, family_filter])

# BEGIN: Thing that breaks
partial_data = table.read_row(row._row_key, filter_=row_filter)
# END: Thing that breaks
```

Now I will run the entire script you linked.
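For what it's worth, reading the row back without the filter union (a debugging sketch using the same objects as above, not something from the original report) helps isolate whether the filter path is the trigger:

```python
# If this read also raises InvalidChunk, the RowFilterUnion is not the
# culprit and the problem is in the chunk parsing itself.
partial_data = table.read_row(row._row_key)
print(partial_data.cells)
```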
OK, I was able to reproduce. Digging in now to see why this occurs.
The culprit is a chunk with no timestamp. This is the offending response:

```python
from google.cloud.bigtable._generated import bigtable_pb2

chunk0 = bigtable_pb2.ReadRowsResponse.CellChunk(
    row_key='greeting0',
    timestamp_micros=1474593939415000,
    value='Hello World!',
)
chunk0.family_name.value = 'cf1'
chunk0.qualifier.value = 'greeting'

chunk1 = bigtable_pb2.ReadRowsResponse.CellChunk(
    timestamp_micros=1474593939415000,
    value='Hello World!',
)
chunk2 = bigtable_pb2.ReadRowsResponse.CellChunk(
    value='Hello World!',
)
chunk3 = bigtable_pb2.ReadRowsResponse.CellChunk(
    value='Hello World!',
    commit_row=True,
)

response = bigtable_pb2.ReadRowsResponse(
    chunks=[chunk0, chunk1, chunk2, chunk3])
```
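Note that in proto3 a scalar field that was never set reads back as its default value, so a chunk that carries no timestamp is indistinguishable, by truthiness, from a cell legitimately stamped at the epoch. A minimal sketch, assuming the same generated `bigtable_pb2` module as above:

```python
from google.cloud.bigtable._generated import bigtable_pb2

# A chunk with no timestamp set at all...
bare = bigtable_pb2.ReadRowsResponse.CellChunk(value=b'Hello World!')

# ...and a chunk whose cell is explicitly stamped at the epoch...
epoch = bigtable_pb2.ReadRowsResponse.CellChunk(
    value=b'Hello World!',
    timestamp_micros=0,
)

# ...read back identically: proto3 scalars have no presence bit,
# so both are 0 and both are falsy.
assert bare.timestamp_micros == epoch.timestamp_micros == 0
assert not bare.timestamp_micros and not epoch.timestamp_micros
```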
There's definitely a loss-of-precision problem here. The timestamp I set is not the timestamp I get back on read.

```python
import datetime

import pytz

from gcloud import bigtable

instance_id = 'bigtabletesting'
table_id = 'Hello-Bigtable'

client = bigtable.Client(admin=True)
instance = client.instance(instance_id, 'us-central1-c')
table = instance.table(table_id)
client.start()

column_family_id = 'cf1'
cf1 = table.column_family(column_family_id)
table.create(column_families=[cf1])

timestamp = datetime.datetime(1970, 1, 1, microsecond=200, tzinfo=pytz.utc)
row = table.row(b'test')
row.set_cell(
    column_family_id,
    b'greeting', b'Hello World!',
    timestamp=timestamp)
row.commit()

partial_data = table.read_row(row._row_key)
assert len(partial_data.cells["cf1"]["greeting"]) == 1

# These should be equal, but they are not because microseconds are lost.
print("Read: %s" % partial_data.cells["cf1"]["greeting"][0].timestamp.isoformat())
print("Set: %s" % timestamp.isoformat())
```
I wondered if this was just a problem around the epoch, but it's not; 1 Dec behaves the same as 1 Jan.

Bigtable does support microsecond precision, right?
The Cloud Bigtable API does not support microseconds at this point.
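Given that, one way to keep tests honest is to round timestamps to millisecond granularity on the client before writing, so the value read back matches the value written. A minimal sketch (`round_to_millis` is a hypothetical helper, not part of the library), and note it would not avoid the InvalidChunk failure above, since a rounded near-epoch timestamp still becomes exactly 0:

```python
import datetime

def round_to_millis(dt):
    # Hypothetical helper: drop sub-millisecond precision so that what
    # is written is exactly what Bigtable can store and return.
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)

timestamp = round_to_millis(
    datetime.datetime(1970, 1, 1, microsecond=200))
print(timestamp.microsecond)  # 0 -- the 200µs cannot survive
```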
Yep, I eventually realized that; see #2569. There's still a problem with timestamps here that's independent of microsecond support.
Also filed a feature request for microsecond granularity here:
Just FYI, I upgraded to google-cloud-bigtable==0.22.0 and verified the error is still there. I have a few integration tests I have to skip at the moment because of this bug.
Hello,

As part of trying to get things under control (as well as to empower us to provide better customer service in the future), I am declaring a "bankruptcy" of sorts on many of the old issues, especially those likely to have been addressed or made obsolete by more recent updates. My goal is to close stale issues whose relevance or solution is no longer immediately evident, and which appear to be of lower importance. I believe in good faith that this is one of those issues, but I am scanning quickly and may occasionally be wrong. If this is an issue of high importance, please comment here and we will reconsider. If this is an issue whose solution is trivial, please consider providing a pull request. Thank you!
The smallest reproducible test case I could get to is here (a small diff off hello world):

destijl/python-docs-samples@cc074c1

I write a new value to the same row with an older timestamp that is 200µs after the epoch, and now I can't read the row anymore. My current best guess is that we end up storing a 0 timestamp because microseconds are lost somewhere, and there is code that treats

`not chunk.timestamp_micros`

as an error, as you can see in the backtrace. The only reason I came across this is that I'm implementing a Bigtable datastore, and our unit tests do weird stuff like this to exercise the timestamp handling of the various databases.
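A minimal sketch of that suspected failure mode (a hypothetical illustration: the real check lives in the library's chunk-parsing code, and `InvalidChunk` below is a stand-in for the library's exception):

```python
from google.cloud.bigtable._generated import bigtable_pb2

class InvalidChunk(Exception):
    """Stand-in for the library's InvalidChunk exception."""

# A cell legitimately stored at the epoch comes back with
# timestamp_micros == 0, which is falsy...
chunk = bigtable_pb2.ReadRowsResponse.CellChunk(
    value=b'Hello World!',
    timestamp_micros=0,
)

# ...so a presence check written with truthiness cannot tell
# "no timestamp" apart from "timestamp == epoch" and rejects it.
try:
    if not chunk.timestamp_micros:
        raise InvalidChunk('chunk has no timestamp')
except InvalidChunk as exc:
    print(exc)  # a valid epoch timestamp is rejected
```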
backtrace: