Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix 1683: Add validate_index arg to staged data finalization in both V1 and V2 APIs #1694

Merged

Conversation

alexowens90
Copy link
Collaborator

@alexowens90 alexowens90 commented Jul 16, 2024

Reference Issues/PRs

Fixes #1683

What does this implement or fix?

V2 API finalize_staged_data continues to validate that index values are sorted and non-overlapping by default, but now checking can be disabled by passing validate_index=False in line with other writing methods.
V1 API compact_incomplete now allows unsorted timeseries index data by default, and checking can be enabled by passing validate_index=True in line with other writing methods.

Also means that when writing incomplete segments, the validate_index argument is used to check sortedness or not, previously it was assumed that incompletes had to be ASCENDING.

@poodlewars poodlewars added the bug Something isn't working label Jul 23, 2024
Copy link
Collaborator

@poodlewars poodlewars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge conflicts

prune_previous_versions: bool, default=False
Removes previous (non-snapshotted) versions from the database.
metadata : Any, default=None
Optional metadata to persist along with the symbol.
validate_index: bool, default=True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to describe why anyone would want to set this to False. Is it just for a performance boost?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is for if you want to allow unordered data. It is following the same pattern as the docstrings for write etc, but the phrasing has to be a bit more complex because you're creating the new version from incompletes.

lib.write(sym, df_1, staged=True)
if validate_index is None:
# Test default behaviour when arg isn't provided
with pytest.raises(SortingException):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need tests for cases where the validation should pass

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I'll add it to test_parralel.py though since the v2 API just passes through anyway

cpp/arcticdb/version/version_core.hpp Show resolved Hide resolved
// This is correct because incomplete segments aren't column sliced
sorting::check<ErrorCode::E_UNSORTED_DATA>(
inserted,
"Cannot finalize staged data as incomplete segments overlap one another");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be possible to make the error message describe the overlap?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, and also made all index timestamps human readable

@alexowens90
Copy link
Collaborator Author

Merge conflicts

Rebased

@alexowens90 alexowens90 force-pushed the bugfix/1637/support-validate-index-arg-with-staged-data branch from 4b2cdc3 to f1b35ed Compare July 23, 2024 11:00
@alexowens90 alexowens90 merged commit 34f12d9 into master Jul 23, 2024
114 checks passed
@alexowens90 alexowens90 deleted the bugfix/1637/support-validate-index-arg-with-staged-data branch July 23, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add validate_index argument to compact_incomplete and finalize_staged_data
2 participants