Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow row_range to be treated as a clause #864

Merged
merged 25 commits into from
Nov 3, 2023

Conversation

jjerphan
Copy link
Collaborator

Reference Issues/PRs

Fixes #747.

What does this implement/fix? How does it work (high level)? Highlight notable design decisions.

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings and documentation?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@jjerphan jjerphan force-pushed the feat/read-row_range-clause branch 2 times, most recently from 40aedec to b83701b Compare September 15, 2023 08:15
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
This is equivalent but this has the benefit of allowing
efining defaults value in Python 3.6, which aren't
available via the `namedtuple` factory
(only have been added in Python 3.7).

See: https://docs.python.org/3/library/typing.html#typing.NamedTuple
See: https://docs.python.org/3/library/collections.html#collections.namedtuple

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
This is suboptimal, but for now slicing on columns
makes the composition with other clauses (like
FilterClause) impossible.

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
cpp/arcticdb/processing/clause.cpp Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Outdated Show resolved Hide resolved
Comment on lines +272 to +276
class PythonRowRangeClause(NamedTuple):
row_range_type: _RowRangeType = None
n: int = None
start: int = None
end: int = None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: using the NamedTuple class allows specifying default values to use either:

PythonRowRangeClause(row_range_type=_RowRangeType.HEAD, n=n))
PythonRowRangeClause(row_range_type=_RowRangeType.TAIL, n=n))
PythonRowRangeClause(start=start, end=end))

@@ -1498,7 +1520,8 @@ def _get_read_query(self, date_range: Optional[DateRangeInput], row_range, colum
read_query.row_filter = _normalize_dt_range(date_range)

if row_range is not None:
read_query.row_range = row_range
total_n_rows = self.get_num_rows(symbol=symbol, as_of=as_of, **kwargs)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: get_description is an alternative, but this is unreachable in this scope.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason the logic in _normalize_row_range was in the C++ layer was because methods like get_num_rows involve reading the index key, so it will be read twice with this implementation. Can we please refactor back to the state where it is only read once?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did this change was to get ride of calculate_row_filter and SignedRowRange in cpp/arcticdb/pipeline/query.hpp.

I first tried to have this logic in the C++ side but I do not know what is the best way to get the total number of lines as soon as possible unfortunately.

Copy link
Collaborator Author

@jjerphan jjerphan Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be read twice with this implementation

How costly is this? Is there a way to get the total number of rows without this cost?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted back with e2b97e9.

@jjerphan jjerphan marked this pull request as ready for review October 5, 2023 08:35
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
python/tests/unit/arcticdb/version_store/test_row_range.py Outdated Show resolved Hide resolved
python/arcticdb/version_store/_store.py Outdated Show resolved Hide resolved
python/arcticdb/version_store/_store.py Outdated Show resolved Hide resolved
python/arcticdb/version_store/processing.py Show resolved Hide resolved
cpp/arcticdb/processing/clause.hpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Outdated Show resolved Hide resolved
cpp/arcticdb/processing/clause.cpp Outdated Show resolved Hide resolved
jjerphan and others added 2 commits October 19, 2023 16:46
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>

Co-authored-by: Alex Owens <alex.owens@man.com>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan jjerphan added the enhancement New feature or request label Nov 3, 2023
@jjerphan jjerphan merged commit bd078ff into master Nov 3, 2023
98 checks passed
@jjerphan jjerphan deleted the feat/read-row_range-clause branch November 3, 2023 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow row_range argument to read to be treated as a clause in the same way as date_range
2 participants