-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement row and cell model classes #753
Conversation
) -> list[CellResponse]: | ||
""" | ||
Returns cells sorted in Bigtable native order: | ||
- Family lexicographically ascending | ||
- Qualifier lexicographically ascending | ||
- Qualifier ascending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had in my notes that Qualifier should be sorted lexicographicaly, but I assume this is a mistake, since qualifier is bytes?
I implemented this by directly comparing the bytes values instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so yeah it looks like it is comparing them as unsigned byte arrays, which I believe is the same operation as Python's byte comparison. I'll have to add some tests to make sure they are consistent though
) -> list[CellResponse]: | ||
""" | ||
Returns cells sorted in Bigtable native order: | ||
- Family lexicographically ascending | ||
- Qualifier lexicographically ascending | ||
- Qualifier ascending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | ||
raise NotImplementedError | ||
Returns a list of (family, qualifier) pairs associated with the cells |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect keys() to return the row key of the response, maybe there's a better name for this? Or maybe I misunderstand something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so for convenience, I made RowResponse conform to the Sequence base type, so it can be treated as a standard sequence:
len(row_response)
for cell in row_response:
,first = row_response[0]
sample = row_response[0:5]
- etc
I also wanted to make it partially compatible with the Mapping base type, to make indexing convenient. In this context, the row_response can be treated as a key/value dictionary, where the keys are families or family/qualifier pairs, and the values are the associated cells:
cells = row_response["my_family"]
cells = row_response["my-family, "my-qualifier"]
I added also implementations for Mapping's keys()
values()
and items()
functions to make iteration easier:
for family,qualifier in row_response.keys():
cells = row_response[family,qualifier]
for cells in row_response.values():
print(cells)
for (family,qualifier), cells in row_response.items():
print(cells)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So TL;DR: keys here refers to dictionary.keys(), to make iteration over stored data easier. But I can see how that may be confusing, and I'm ok with making changes here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, thanks for the explanation! maybe we can leave the implementations for Mapping's keys() values() and items() functions
part out for now? I can see how it's convenient, but maybe we can keep it simple at the first iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sounds good. I changed keys()
to get_column_components
, and removed the others
for cell in sorted(cells): | ||
if cell.row_key != self.row_key: | ||
raise ValueError( | ||
f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our backend should never create rows that hit this, but end users could attempt to
And it doesn't hurt to include it as a sanity check either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, endusers should never construct these
Co-authored-by: Mattie Fu <mattiefu@google.com>
cells: list[CellResponse] | ||
| dict[tuple[family_id, qualifier], list[dict[str, Any]]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should only be one way to represent data in a response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make this internal as well
self._cells_map: dict[ | ||
family_id, dict[qualifier, list[CellResponse]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better data structure would be: OrderedDict[family, list[Cell]) and use bsearch to find a qualifier in the cell list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remind me the motivation for this?
for cell in sorted(cells): | ||
if cell.row_key != self.row_key: | ||
raise ValueError( | ||
f"CellResponse row_key ({cell.row_key!r}) does not match RowResponse key ({self.row_key!r})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed, endusers should never construct these
this_ordering = ( | ||
self.family, | ||
self.column_qualifier, | ||
-self.timestamp_micros, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please dont fight native ordering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that newer cells should come first? By default, older timestamps would come first.
That said, comparison is less relevant if we just use cells as they come from the backend though. If we don't have a way to determine proper ordering from the client side, maybe we should remove these comparison implementations entirely?
* feat: add new v3.0.0 API skeleton (#745) * feat: improve rows filters (#751) * feat: read rows query model class (#752) * feat: implement row and cell model classes (#753) * feat: add pooled grpc transport (#748) * feat: implement read_rows (#762) * feat: implement mutate rows (#769) * feat: literal value filter (#767) * feat: row_exists and read_row (#778) * feat: read_modify_write and check_and_mutate_row (#780) * feat: sharded read rows (#766) * feat: ping and warm with metadata (#810) * feat: mutate rows batching (#770) * chore: restructure module paths (#816) * feat: improve timeout structure (#819) * fix: api errors apply to all bulk mutations * chore: reduce public api surface (#820) * feat: improve error group tracebacks on < py11 (#825) * feat: optimize read_rows (#852) * chore: add user agent suffix (#842) * feat: optimize retries (#854) * feat: add test proxy (#836) * chore(tests): add conformance tests to CI for v3 (#870) * chore(tests): turn off fast fail for conformance tets (#882) * feat: add TABLE_DEFAULTS enum for table method arguments (#880) * fix: pass None for retry in gapic calls (#881) * feat: replace internal dictionaries with protos in gapic calls (#875) * chore: optimize gapic calls (#863) * feat: expose retryable error codes to users (#879) * chore: update api_core submodule (#897) * chore: merge main into experimental_v3 (#900) * chore: pin conformance tests to v0.0.2 (#903) * fix: bulk mutation eventual success (#909) --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
This PR implements the RowResponse and CellResponse classes, used to represent the data returned by read_rows queries
RowResponse implements Sequence, allowing it to be treated like a list of Cells (len, iteration, indexing, etc)
It also implements some properties of Mapping (allowing it to be indexed by
family
and(family, qualifier)
keys, retrieving the list of keys, values, items, etc)Note: This is merging into a new v3 branch, not main. I plan on making all PRs related to this rewrite to v3 and then merging v3 into main when development is complete