Bigtable python raises InvalidChunk: possible loss of microsecond timestamp precision? #2397

Closed
destijl opened this issue Sep 23, 2016 · 10 comments


destijl commented Sep 23, 2016

The smallest reproducible test case I could get it down to is here (a small diff off the hello-world sample):
destijl/python-docs-samples@cc074c1

I write a new value to the same row with an older timestamp, 200µs after the epoch, and after that I can't read the row any more. My current best guess is that we end up storing a timestamp of 0 because the microseconds are lost somewhere, and there is client code that treats not chunk.timestamp_micros as an error, as you can see in the backtrace.
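
If that guess is right, the failure can be sketched without touching the service at all. A minimal illustration of the suspected truncation (my sketch, not the library's code):

# Suspected failure mode: a timestamp 200us after the epoch, truncated to
# millisecond granularity, becomes 0, and a falsy timestamp_micros is then
# indistinguishable from a missing one.
written_micros = 200
stored_micros = (written_micros // 1000) * 1000   # -> 0 after truncation

if not stored_micros:  # same shape as _raise_if(not chunk.timestamp_micros ...)
    print 'looks like a missing timestamp -> InvalidChunk'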

The only reason I came across this is that I'm implementing a Bigtable data store, and our unit tests do weird things like this to exercise timestamp handling across the various databases.

backtrace:

$ python main.py grr-test-demo bigtabletesting
Creating the Hello-Bigtable table.
Writing some greetings to the table.
Getting a single greeting by row key.
Traceback (most recent call last):
  File "main.py", line 136, in <module>
    main(args.project_id, args.instance_id, args.table)
  File "main.py", line 98, in main
    row = table.read_row(key.encode('utf-8'), filter_=row_filter)
  File "/usr/local/google/home/gcastle/VE/release/lib/python2.7/site-packages/gcloud/bigtable/table.py", line 236, in read_row
    rows_data.consume_all()
  File "/usr/local/google/home/gcastle/VE/release/lib/python2.7/site-packages/gcloud/bigtable/row_data.py", line 324, in consume_all
    self.consume_next()
  File "/usr/local/google/home/gcastle/VE/release/lib/python2.7/site-packages/gcloud/bigtable/row_data.py", line 276, in consume_next
    self._validate_chunk(chunk)
  File "/usr/local/google/home/gcastle/VE/release/lib/python2.7/site-packages/gcloud/bigtable/row_data.py", line 391, in _validate_chunk
    self._validate_chunk_row_in_progress(chunk)
  File "/usr/local/google/home/gcastle/VE/release/lib/python2.7/site-packages/gcloud/bigtable/row_data.py", line 371, in _validate_chunk_row_in_progress
    _raise_if(not chunk.timestamp_micros or not chunk.value)
  File "/usr/local/google/home/gcastle/VE/release/lib/python2.7/site-packages/gcloud/bigtable/row_data.py", line 442, in _raise_if
    raise InvalidChunk(*args)
dhermes added the type: bug and api: bigtable labels Sep 23, 2016
dhermes self-assigned this Sep 23, 2016
dhermes (Contributor) commented Sep 23, 2016

I tried to make this a little more minimal, but can't get this to break:

import datetime

from gcloud import bigtable
from gcloud.bigtable import row_filters


instance_id = 'bigtabletesting'
table_id = 'Hello-Bigtable'

client = bigtable.Client(admin=True)
instance = client.instance(instance_id, 'us-central1-c')
table = instance.table(table_id)

column_family_id = 'cf1'
cf1 = table.column_family(column_family_id)
table.create(column_families=[cf1])

timestamp = datetime.datetime(1970, 1, 1, microsecond=200)
row = table.row(b'test')
row.set_cell(
    column_family_id,
    b'greeting', b'Hello World!',
    timestamp=timestamp)
row.commit()

column_id = b'greeting'  # the qualifier written by set_cell above
col_filter = row_filters.ColumnQualifierRegexFilter(column_id)
family_filter = row_filters.FamilyNameRegexFilter(column_family_id)
row_filter = row_filters.RowFilterUnion(
    filters=[col_filter, family_filter])
# BEGIN: Thing that breaks
partial_data = table.read_row(row._row_key, filter_=row_filter)
#   END: Thing that breaks

Now I will run the entire script you linked.

dhermes (Contributor) commented Sep 23, 2016

OK, I was able to reproduce. Digging in now to see why this occurs.

dhermes (Contributor) commented Sep 23, 2016

The culprit is a chunk with no timestamp. This is the offending response:

from google.cloud.bigtable._generated import bigtable_pb2

chunk0 = bigtable_pb2.ReadRowsResponse.CellChunk(
  row_key='greeting0',
  timestamp_micros=1474593939415000,
  value='Hello World!',
)
chunk0.family_name.value = 'cf1'
chunk0.qualifier.value = 'greeting'
chunk1 = bigtable_pb2.ReadRowsResponse.CellChunk(
  timestamp_micros=1474593939415000,
  value='Hello World!',
)
chunk2 = bigtable_pb2.ReadRowsResponse.CellChunk(
  value='Hello World!',
)
chunk3 = bigtable_pb2.ReadRowsResponse.CellChunk(
  value='Hello World!',
  commit_row=True,
)
response = bigtable_pb2.ReadRowsResponse(
    chunks=[chunk0, chunk1, chunk2, chunk3])
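
For reference, here's a rough way to replay that response against the chunk parser (a sketch assuming the PartialRowsData(response_iterator) constructor from this release; both it and InvalidChunk live in row_data, per the traceback in the report):

from google.cloud.bigtable.row_data import InvalidChunk, PartialRowsData

# The chunk carrying a value but no timestamp_micros should trip the same
# validation that raised in the original traceback.
rows_data = PartialRowsData(iter([response]))
try:
    rows_data.consume_all()
except InvalidChunk:
    print 'InvalidChunk raised, matching the report'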

destijl (Author) commented Oct 19, 2016

There's definitely a loss-of-precision problem here: the timestamp I set is not the timestamp I get back on read.

$ python ~/temp.py
Read: 1970-01-01T00:00:00+00:00
Set: 1970-01-01T00:00:00.000200+00:00

import datetime
import pytz

from gcloud import bigtable
instance_id = 'bigtabletesting'
table_id = 'Hello-Bigtable'

client = bigtable.Client(admin=True)
instance = client.instance(instance_id, 'us-central1-c')
table = instance.table(table_id)
client.start()

column_family_id = 'cf1'
cf1 = table.column_family(column_family_id)
table.create(column_families=[cf1])

timestamp = datetime.datetime(1970, 1, 1, microsecond=200, tzinfo=pytz.utc)
row = table.row(b'test')
row.set_cell(
    column_family_id,
    b'greeting', b'Hello World!',
    timestamp=timestamp)
row.commit()

partial_data = table.read_row(row._row_key)
assert(len(partial_data.cells["cf1"]["greeting"]) == 1)
# These should be equal, but they are not because microseconds are lost.
print "Read: %s" % partial_data.cells["cf1"]["greeting"][0].timestamp.isoformat()
print "Set: %s" % timestamp.isoformat()

destijl (Author) commented Oct 19, 2016

I wondered if this was just a problem around the epoch, but it's not: December 1 behaves the same as January 1.

$ python ~/temp.py
Read: 1970-12-01T00:00:00+00:00
Set: 1970-12-01T00:00:00.000200+00:00

Bigtable does support microsecond precision, right?

sduskis (Contributor) commented Oct 26, 2016

The Cloud Bigtable API does not support microseconds at this point.
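
Since the read-back values above suggest the service keeps millisecond granularity, one hedged client-side workaround (a sketch, not an official recommendation; floor_to_millis is a hypothetical helper) is to round timestamps down before writing, so the value read back compares equal to the value written:

import datetime
import pytz

def floor_to_millis(ts):
    # Drop sub-millisecond precision so the stored timestamp round-trips intact.
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)

ts = datetime.datetime(1970, 1, 1, microsecond=200, tzinfo=pytz.utc)
print floor_to_millis(ts).isoformat()  # -> 1970-01-01T00:00:00+00:00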

destijl (Author) commented Oct 26, 2016

Yep, I eventually realized; see #2569. There's still a problem with timestamps here that's independent of that.

destijl (Author) commented Oct 26, 2016

Also filed a feature request for microsecond granularity here:
#2626

destijl (Author) commented Feb 13, 2017

Just FYI, I upgraded to google-cloud-bigtable==0.22.0 and verified the error is still there. I have a few integration tests that I have to skip at the moment because of this bug.

lukesneeringer added the priority: p2 label Apr 19, 2017
lukesneeringer (Contributor) commented

Hello,
One of the challenges of maintaining a large open source project is that sometimes, you can bite off more than you can chew. As the lead maintainer of google-cloud-python, I can definitely say that I have let the issues here pile up.

As part of trying to get things under control (as well as to empower us to provide better customer service in the future), I am declaring a "bankruptcy" of sorts on many of the old issues, especially those likely to have been addressed or made obsolete by more recent updates.

My goal is to close stale issues whose relevance or solution is no longer immediately evident, and which appear to be of lower importance. I believe in good faith that this is one of those issues, but I am scanning quickly and may occasionally be wrong. If this is an issue of high importance, please comment here and we will reconsider. If this is an issue whose solution is trivial, please consider providing a pull request.

Thank you!
