Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Bigtable v2 API #1850

Closed
tseaver opened this issue Jun 10, 2016 · 28 comments
Closed

Implement Bigtable v2 API #1850

tseaver opened this issue Jun 10, 2016 · 28 comments
Assignees
Labels
api: bigtable Issues related to the Bigtable API.

Comments

@tseaver
Copy link
Contributor

tseaver commented Jun 10, 2016

@dhermes if you can do a braindump here of what you've learned, and then assign to me, that would be great.

@tseaver tseaver added the api: bigtable Issues related to the Bigtable API. label Jun 10, 2016
@dhermes
Copy link
Contributor

dhermes commented Jun 10, 2016

  • Only the data API is in v2, the table and cluster admins are still v1
  • The majority of the changes are in ReadRowsResponse.
  • google/bigtable/v2/data.proto is essentially equivalent to google/bigtable/v1/bigtable_data.proto
  • google/bigtable/v2/bigtable.proto is essentially equivalent to google/bigtable/v1/bigtable_service.proto and google/bigtable/v1/bigtable_service_messages.proto combined
  • Some of the inclusive / exclusive fields have changed, but that's somewhat easy to track
  • There is a new method in the v1 Table Admin API that we should implement

@garye
Copy link
Contributor

garye commented Jun 21, 2016

Some additional information:

@sduskis
Copy link
Contributor

sduskis commented Jun 21, 2016

I'd guess that MutateRows, bulk delete, and "bulk read" (a set of random row keys) are not in the current python implementation, since they were not there from day one and we added them along the way. They are useful, but not required functionality.

@sduskis
Copy link
Contributor

sduskis commented Jun 21, 2016

We have a java implementation which processes v2 ReadRowsResponse objects. Hopefully, it can help give some insight into how to use the new API. There's also a json file we use to drive automated tests; we can explain more about it offline.

@garye
Copy link
Contributor

garye commented Jun 21, 2016

If we implement happybase's Table.batch then MutateRows would be helpful to leverage, but I agree with Solomon that it's not required.

@sduskis
Copy link
Contributor

sduskis commented Jun 21, 2016

Yeah, the new MutateRows is pretty awesome. It has a significant performance improvement over MutateRow.

Anything related to v2 comes first due to time sensitivity. We have a few performance improvement (MutateRows, multiple grpc channels) and reliability (retries) improvements that aren't as time sensitive.

@garye
Copy link
Contributor

garye commented Jun 21, 2016

Here's a link to the go read rows implementation.
https://gist.github.com/garye/4b2ce32f8b6245e1024c7ce0a37b681d

@tseaver
Copy link
Contributor Author

tseaver commented Jun 23, 2016

I've set up a feature branch for this project: I intend to merge PRs to that branch, and then merge that branch to master.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 23, 2016

Reviewing open gcloud-python issues related to Bigtable:

@sduskis
Copy link
Contributor

sduskis commented Jun 23, 2016

RenameTable was removed in v2. While it existed in v1, it was never actually implemented. Good catch.

@lesv
Copy link

lesv commented Jun 23, 2016

The protos should be at #1895

@garye
Copy link
Contributor

garye commented Jun 23, 2016

#1895 is bogus because it's targeted at master and breaks everything anyways, but at least look at the Makefile change. I fixed the makefile to find grpc_python_plugin wherever it is in your path but there's also a python script that runs and assumes a location, so I ended up copying grpc_python_plugin to the gcloud-python dir before running make generate.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 23, 2016

Setup:

Port modules to V2 protos:

System testing

FInalize

Open issues:

@lesv
Copy link

lesv commented Jun 23, 2016

@tswast FYI

@tswast
Copy link
Contributor

tswast commented Jun 23, 2016

Question for the eng team CCed on this thread. I see there are some more streaming APIs. Would it be possible to implement these APIs in the client as iterators (and stream the underlying results)?

#1812

Right now for PartialRowsData, the client accumulates the results in a semi-hidden list.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 24, 2016

@sduskis

There's also a json file we use to drive automated tests; we can explain more about it offline.

I'm trying to parse the chunks entries in that file via ReadRowsResponse.CellChunk.FromString, but that raises an exception: am I holding it wrong?

@garye
Copy link
Contributor

garye commented Jun 24, 2016

Sounds like it should work - what was the exception?

On Fri, Jun 24, 2016, 6:14 PM Tres Seaver notifications@github.com wrote:

@sduskis https://github.com/sduskis

There's also a json file we use to drive automated tests; we can explain
more about it offline.

I'm trying to parse the chunks entries in that file via
ReadRowsResponse.CellChunk.FromString, but that raises an exception: am I
holding it wrong?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AACTr5Fyy2LtFgy9gl2KbQnhynLIukRxks5qPFbJgaJpZM4IzHp2
.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 24, 2016

This session is based on the new generation-from-protos stuff in PR #1903.

>>> import json
>>> from gcloud.bigtable._generated_v2.bigtable_pb2 import ReadRowsResponse
>>> with open('gcloud/bigtable/read-rows-acceptance-test.json') as f:
...     test_json = json.load(f)
... 
>>> chunk_pb = test_json['tests'][0]['chunks'][0]
>>> ReadRowsResponse.CellChunk.FromString(chunk_pb)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/.tox/py27/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 780, in FromString
    message.MergeFromString(s)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/.tox/py27/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 1080, in MergeFromString
    if self._InternalParse(serialized, 0, length) != length:
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/.tox/py27/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 1106, in InternalParse
    new_pos = local_SkipField(buffer, new_pos, end, tag_bytes)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/.tox/py27/lib/python2.7/site-packages/google/protobuf/internal/decoder.py", line 850, in SkipField
    return WIRETYPE_TO_SKIPPER[wire_type](buffer, pos, end)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/.tox/py27/lib/python2.7/site-packages/google/protobuf/internal/decoder.py", line 799, in _SkipGroup
    new_pos = SkipField(buffer, pos, end, tag_bytes)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/.tox/py27/lib/python2.7/site-packages/google/protobuf/internal/decoder.py", line 850, in SkipField
    return WIRETYPE_TO_SKIPPER[wire_type](buffer, pos, end)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/.tox/py27/lib/python2.7/site-packages/google/protobuf/internal/decoder.py", line 820, in _RaiseInvalidWireType
    raise _DecodeError('Tag had invalid wire type.')
google.protobuf.message.DecodeError: Tag had invalid wire type.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 24, 2016

FWIW chunk_pb looks like a normal protobuf to me:

>>> chunk_pb
u'row_key: "RK"\nfamily_name: <\n  value: "A"\n>\nqualifier: <\n  value: "C"\n>\ntimestamp_micros: 100\nvalue: "value-VAL"\ncommit_row: false\n'

@garye
Copy link
Contributor

garye commented Jun 24, 2016

I don't know the python protobuf APIs, but it looks like there are other
things like MergeFromString and
https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.text_format-module
that
might be an alternative?

On Fri, Jun 24, 2016 at 6:47 PM Tres Seaver notifications@github.com
wrote:

FWIW chunk_pb looks like a normal protobuf to me:

chunk_pbu'row_key: "RK"\nfamily_name: <\n value: "A"\n>\nqualifier: <\n value: "C"\n>\ntimestamp_micros: 100\nvalue: "value-VAL"\ncommit_row: false\n'


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AACTrxVOI57u76RL9I8H0HuvO76ETezmks5qPF6JgaJpZM4IzHp2
.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 24, 2016

I missed that those chunks are rendered using text_format. The following works to parse one:

>>> from google.protobuf.text_format import Merge
>>> Merge(chunk_pb, chunk)

@garye
Copy link
Contributor

garye commented Jun 25, 2016

Ah, sorry we should have mentioned. Glad it's working.

On Fri, Jun 24, 2016, 7:42 PM Tres Seaver notifications@github.com wrote:

I missed that those chunks are rendered using text_format. The following
works to parse one:

from google.protobuf.text_format import Merge>>> Merge(chunk-pb, chunk)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1850 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AACTr_aJj9nYMKbaxSozdpfXCoqTXtZnks5qPGtdgaJpZM4IzHp2
.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 27, 2016

@garye Am I meant to neglect the google.bigtable.v2.instance.Instance.cluster and related messages?

@garye
Copy link
Contributor

garye commented Jun 27, 2016

@tseaver You mean for cluster administration?

@tseaver
Copy link
Contributor Author

tseaver commented Jun 27, 2016

Yes -- am I supposed to be exposing the methods to manipulate clusters within an instance?

@sduskis
Copy link
Contributor

sduskis commented Jun 27, 2016

We don't do that in go or Java. I would think Happybase doesn't have a way to manage clusters in v1 or instances in v2, since HBase doesn't have any similar concepts. If there isn't currently a way to do cluster management, we should not add new functionality.

@tseaver
Copy link
Contributor Author

tseaver commented Jun 27, 2016

@sduskis I was just asking because gcloud.python exposed cluster managment APIs for V1: if they aren't relevant for V2, cool (they are defined right next to the V2 instance management APIs, which is which is what made me ask).

@sduskis
Copy link
Contributor

sduskis commented Jun 27, 2016

Thanks for clarifying. I'd guess we need to expose operations that are similar to the way the GCP console operates. There is a ClusterService for some operations like resizing a cluster or changing the display name. I don't know enough about that. I'll ping the group to find out more about this.

The bottom line is that we need parity with the existing functionality in gcloud.python through the new APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the Bigtable API.
Projects
None yet
Development

No branches or pull requests

6 participants