Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort and finalize throws when staged columns are of different but promotable types (dynamic schema) #1807

Closed
Tracked by #1679
vasil-pashov opened this issue Sep 2, 2024 · 0 comments · Fixed by #1799
Assignees
Labels
bug Something isn't working

Comments

@vasil-pashov
Copy link
Collaborator

vasil-pashov commented Sep 2, 2024

Describe the bug

If one column is of type int64 and the same column is in another segment but of type int32 sort and finalize will throw. However the stream descriptor will be of the largest type because merge descriptors uses valid_common_type.

Steps/Code to Reproduce

import pandas as pd
import numpy as np
import arcticdb as adb
from arcticdb.version_store.library import StagedDataFinalizeMethod
ac = adb.Arctic("lmdb://test")
opts = adb.LibraryOptions(dynamic_schema=True)
lib = ac.get_library("test", create_if_missing=True, library_options=opts)
symbol = "test"
lib.write("sym", pd.DataFrame({"a": np.array([1], dtype="int32")}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 1)])), staged=True)
lib.write("sym", pd.DataFrame({"a": np.array([1], dtype="int64")}, index=pd.DatetimeIndex([pd.Timestamp(2024, 1, 2)])), staged=True)
lib.sort_and_finalize_staged_data("sym", mode=StagedDataFinalizeMethod.WRITE)

Throws:

InternalException: E_ASSERTION_FAILURE Type mismatch in set_scalar, expected 4

Note the exception is thrown after all descriptors were resolved, during the merge phase. The exception is thrown when setting a value in a column.

finalize_staged_data does not allow having staged of different (but compatible types). The difference is due to this

Expected Results

Two options:

  • Promote the "smaller" type and succeed
  • Act as finalize_staged_segments and require exactly the same types in all staged segments

OS, Python Version and ArcticDB Version

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22631-SP0
ArcticDB: 4.5.0

Backend storage used

No response

Additional Context

No response

@vasil-pashov vasil-pashov added the bug Something isn't working label Sep 2, 2024
@vasil-pashov vasil-pashov self-assigned this Sep 2, 2024
@vasil-pashov vasil-pashov changed the title Sort and finalize throws when staged columns are of different but promotable types Sort and finalize throws when staged columns are of different but promotable types (dynamic schema) Sep 5, 2024
vasil-pashov added a commit that referenced this issue Sep 30, 2024
<!--Example: Fixes #1234. See also #3456.-->
Fixes #1738
Fixes #1781
Fixes #1466
Fixes #1795
Fixes #1797
Fixes #1807
Fixes #1828

A notable change is that staged writes no longer validate the index is
sorted. The validation is done at the moment
compact_incompletes/finalize_staged_data/sort_and_finalize_staged_data
is called. This is because sort_and_finalize_staged_data does not
require the segments to be sorted, but the call for adding a staged
segment is the same. We should add a separate call for that.

Note also that all incomplete keys for a symbol are deleted if any of
the finalize calls fail. The other option is to leave the segments. In
that case the user will have the responsibility of calling
`delete_staged_data`.

<details>
  <summary>
   Checklist for code changes...
  </summary>

- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->

---------

Co-authored-by: Vasil Pashov <vasil.pashov@man.com>
vasil-pashov added a commit that referenced this issue Sep 30, 2024
<!--Example: Fixes #1234. See also #3456.-->
Fixes #1738
Fixes #1781
Fixes #1466
Fixes #1795
Fixes #1797
Fixes #1807
Fixes #1828

A notable change is that staged writes no longer validate the index is
sorted. The validation is done at the moment
compact_incompletes/finalize_staged_data/sort_and_finalize_staged_data
is called. This is because sort_and_finalize_staged_data does not
require the segments to be sorted, but the call for adding a staged
segment is the same. We should add a separate call for that.

Note also that all incomplete keys for a symbol are deleted if any of
the finalize calls fail. The other option is to leave the segments. In
that case the user will have the responsibility of calling
`delete_staged_data`.

<details>
  <summary>
   Checklist for code changes...
  </summary>

- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
 - [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>

<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->

---------

Co-authored-by: Vasil Pashov <vasil.pashov@man.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant