Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement row and cell model classes #753

Merged
merged 16 commits into from
Apr 5, 2023

Conversation

daniel-sanche
Copy link
Contributor

@daniel-sanche daniel-sanche commented Mar 16, 2023

This PR implements the RowResponse and CellResponse classes, used to represent the data returned by read_rows queries

RowResponse implements Sequence, allowing it to be treated like a list of Cells (len, iteration, indexing, etc)

It also implements some properties of Mapping (allowing it to be indexed by family and (family, qualifier) keys, retrieving the list of keys, values, items, etc)


Note: This is merging into a new v3 branch, not main. I plan on making all PRs related to this rewrite to v3 and then merging v3 into main when development is complete

@daniel-sanche daniel-sanche requested review from a team as code owners March 16, 2023 17:01
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigtable Issues related to the googleapis/python-bigtable API. labels Mar 16, 2023
) -> list[CellResponse]:
"""
Returns cells sorted in Bigtable native order:
- Family lexicographically ascending
- Qualifier lexicographically ascending
- Qualifier ascending
Copy link
Contributor Author

@daniel-sanche daniel-sanche Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had in my notes that Qualifier should be sorted lexicographicaly, but I assume this is a mistake, since qualifier is bytes?
I implemented this by directly comparing the bytes values instead

Copy link
Contributor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so yeah it looks like it is comparing them as unsigned byte arrays, which I believe is the same operation as Python's byte comparison. I'll have to add some tests to make sure they are consistent though

@daniel-sanche daniel-sanche added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2023
) -> list[CellResponse]:
"""
Returns cells sorted in Bigtable native order:
- Family lexicographically ascending
- Qualifier lexicographically ascending
- Qualifier ascending
Copy link
Contributor

google/cloud/bigtable/row_response.py Outdated Show resolved Hide resolved
"""
raise NotImplementedError
Returns a list of (family, qualifier) pairs associated with the cells
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect keys() to return the row key of the response, maybe there's a better name for this? Or maybe I misunderstand something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so for convenience, I made RowResponse conform to the Sequence base type, so it can be treated as a standard sequence:

  • len(row_response)
  • for cell in row_response:,
  • first = row_response[0]
  • sample = row_response[0:5]
  • etc

I also wanted to make it partially compatible with the Mapping base type, to make indexing convenient. In this context, the row_response can be treated as a key/value dictionary, where the keys are families or family/qualifier pairs, and the values are the associated cells:

  • cells = row_response["my_family"]
  • cells = row_response["my-family, "my-qualifier"]

I added also implementations for Mapping's keys() values() and items() functions to make iteration easier:

for family,qualifier in row_response.keys():
   cells = row_response[family,qualifier]

for cells in row_response.values():
   print(cells)

for (family,qualifier), cells in row_response.items():
   print(cells)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So TL;DR: keys here refers to dictionary.keys(), to make iteration over stored data easier. But I can see how that may be confusing, and I'm ok with making changes here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks for the explanation! maybe we can leave the implementations for Mapping's keys() values() and items() functions part out for now? I can see how it's convenient, but maybe we can keep it simple at the first iteration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sounds good. I changed keys() to get_column_components, and removed the others

google/cloud/bigtable/row_response.py Outdated Show resolved Hide resolved
for cell in sorted(cells):
if cell.row_key != self.row_key:
raise ValueError(
f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary?

Copy link
Contributor Author

@daniel-sanche daniel-sanche Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our backend should never create rows that hit this, but end users could attempt to

And it doesn't hurt to include it as a sanity check either way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, endusers should never construct these

Co-authored-by: Mattie Fu <mattiefu@google.com>
google/cloud/bigtable/row_response.py Outdated Show resolved Hide resolved
Comment on lines 43 to 44
cells: list[CellResponse]
| dict[tuple[family_id, qualifier], list[dict[str, Any]]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should only be one way to represent data in a response

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make this internal as well

Comment on lines 53 to 54
self._cells_map: dict[
family_id, dict[qualifier, list[CellResponse]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better data structure would be: OrderedDict[family, list[Cell]) and use bsearch to find a qualifier in the cell list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me the motivation for this?

for cell in sorted(cells):
if cell.row_key != self.row_key:
raise ValueError(
f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, endusers should never construct these

this_ordering = (
self.family,
self.column_qualifier,
-self.timestamp_micros,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please dont fight native ordering

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that newer cells should come first? By default, older timestamps would come first.

That said, comparison is less relevant if we just use cells as they come from the backend though. If we don't have a way to determine proper ordering from the client side, maybe we should remove these comparison implementations entirely?

@daniel-sanche daniel-sanche merged commit c55099f into googleapis:v3 Apr 5, 2023
daniel-sanche added a commit that referenced this pull request Feb 5, 2024
* feat: add new v3.0.0 API skeleton (#745)

* feat: improve rows filters (#751)

* feat: read rows query model class (#752)

* feat: implement row and cell model classes (#753)

* feat: add pooled grpc transport (#748)

* feat: implement read_rows (#762)

* feat: implement mutate rows (#769)

* feat: literal value filter (#767)

* feat: row_exists and read_row (#778)

* feat: read_modify_write and check_and_mutate_row (#780)

* feat: sharded read rows (#766)

* feat: ping and warm with metadata (#810)

* feat: mutate rows batching (#770)

* chore: restructure module paths (#816)

* feat: improve timeout structure (#819)

* fix: api errors apply to all bulk mutations

* chore: reduce public api surface (#820)

* feat: improve error group tracebacks on < py11 (#825)

* feat: optimize read_rows (#852)

* chore: add user agent suffix (#842)

* feat: optimize retries (#854)

* feat: add test proxy (#836)

* chore(tests): add conformance tests to CI for v3 (#870)

* chore(tests): turn off fast fail for conformance tets (#882)

* feat: add TABLE_DEFAULTS enum for table method arguments (#880)

* fix: pass None for retry in gapic calls (#881)

* feat: replace internal dictionaries with protos in gapic calls (#875)

* chore: optimize gapic calls (#863)

* feat: expose retryable error codes to users (#879)

* chore: update api_core submodule (#897)

* chore: merge main into experimental_v3 (#900)

* chore: pin conformance tests to v0.0.2 (#903)

* fix: bulk mutation eventual success (#909)

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants