-
Notifications
You must be signed in to change notification settings - Fork 199
chore: Add global order min/max per tile to the fragment metadata #5646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…metdata-global-order-bounds
…to load default profile in ordinary execution
…rlapping single-tile fragment section
|
Before merging this PR, estimate impact of it on this other issue. Will we make that problem worse? |
…metdata-global-order-bounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have APIs that let users access other tile metadata, and I'm not sure we should add them, mainly because they exist only in new fragment metadata versions. I would be open to changing my mind, if there is an end-to-end scenario for these APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These run alongside the various mbr APIs, e.g. tiledb_fragment_info_get_mbr_from_index.
The end-to-end scenario which I am interested in is enabling these to be queried using tiledb-tables as a potential support tool.
| [[nodiscard]] constexpr inline const void* content() const { | ||
| return datum_content_; | ||
| } | ||
| [[nodiscard]] inline size_t size() const { | ||
| [[nodiscard]] constexpr inline size_t size() const { | ||
| return datum_size_; | ||
| } | ||
| template <class T> | ||
| [[nodiscard]] inline const T& value_as() const { | ||
| [[nodiscard]] constexpr inline const T& value_as() const { | ||
| return *static_cast<const T*>(datum_content_); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the [[nodiscard]]s look a bit excessive if you asked me, but I'm OK with keeping them.
format_spec/fragment.md
Outdated
| | Tile global order min coordinates offset for dimension 1 | `uint64_t` | _New in version 23_ For sparse arrays, he offset to the generic tile storing the tile global order mins for dimension 1 | ||
| | … | … | … | | ||
| | Tile global order min coordinates offset for dimension N | `uint64_t` | _New in version 23_ For sparse arrays, the offset to the generic tile storing the tile global order mins for dimension N | ||
| | Tile global order max coordinates offset for dimension 1 | `uint64_t` | _New in version 23_ For sparse arrays, the offset to the generic tile storing the tile global order maxes for dimension 1 | ||
| | … | … | … | | ||
| | Tile global order max coordinates offset for dimension N | `uint64_t` | _New in version 23_ For sparse arrays, the offset to the generic tile storing the tile global order maxes for dimension N |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I would prefer phrasing it like
The offset to the […] for dimension N. This field exists only on sparse arrays.. - We have yet to decide whether this is how we want to store the tiles, or use the extensible footer design described in CORE-345. If we do the latter, and don't want to change this PR's implementation, one proposed solution was to do it in a subsequent PR that will be merged before the next release.
- The fact that these fields are only written in the footer for sparse arrays, is another reason to pursue the extensible footer design.
- If we stick with this design, a better and simpler idea would be to always the offsets, and write zeroes for dense arrays.
- The fact that these fields are only written in the footer for sparse arrays, is another reason to pursue the extensible footer design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have yet to decide whether this is how we want to store the tiles, or use the extensible footer design described in CORE-345
Each time this has been brought up in the meeting the overall consensus has been (1) yes, we do want to implement a more extensible footer design; (2) no, we do not want to hold up existing work so that we can do that. It is settled.
The fact that these fields are only written in the footer for sparse arrays, is another reason to pursue the extensible footer design
I agree with this, it is rather clunky. However see above.
If we stick with this design, a better and simpler idea would be to always the offsets, and write zeroes for dense arrays
It removes a few ifs. Theoretically it might also for some other developer who wanted to read the storage format. I don't really have a preference.
) This pull request does some refactoring and enhancing of our test data structures `struct Fragment1D` and `struct Fragment2D` which are used as containers for inputs to write queries and outputs of read queries. 1) we move the methods which generically create arrays using these and write fragments using these to a header file instead of `sparse_global_order_reader.cc` 2) we refactor functionality into a `template <typename DimensionTuple, typename AttributeTuple> struct Fragment` which is a common base class for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with an empty dimension tuple for dense fragments 3) we implement various fixes to enable variable-length dimensions 4) some other incidentals This pull request has been refactored out of #5646 for two reasons: 1) it is all enhancements to test code which may be distracting from the functional components of #5646 ; 2) I anticipate using it to add additional tests to #5655 . Specifically, using the `Fragment` with an empty dimension tuple will be very nice for writing tests on dense arrays. There's no direct testing of this, but since the code was factored out of #5646 all of it was used in testing there. Many components are exercised due to the refactor of the existing tests. --- TYPE: NO_HISTORY DESC: refactors and enhancements to test data structure `struct Fragment`
…metdata-global-order-bounds
) This pull request does some refactoring and enhancing of our test data structures `struct Fragment1D` and `struct Fragment2D` which are used as containers for inputs to write queries and outputs of read queries. 1) we move the methods which generically create arrays using these and write fragments using these to a header file instead of `sparse_global_order_reader.cc` 2) we refactor functionality into a `template <typename DimensionTuple, typename AttributeTuple> struct Fragment` which is a common base class for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with an empty dimension tuple for dense fragments 3) we implement various fixes to enable variable-length dimensions 4) some other incidentals This pull request has been refactored out of #5646 for two reasons: 1) it is all enhancements to test code which may be distracting from the functional components of #5646 ; 2) I anticipate using it to add additional tests to #5655 . Specifically, using the `Fragment` with an empty dimension tuple will be very nice for writing tests on dense arrays. There's no direct testing of this, but since the code was factored out of #5646 all of it was used in testing there. Many components are exercised due to the refactor of the existing tests. --- TYPE: NO_HISTORY DESC: refactors and enhancements to test data structure `struct Fragment`
…metdata-global-order-bounds
) This pull request does some refactoring and enhancing of our test data structures `struct Fragment1D` and `struct Fragment2D` which are used as containers for inputs to write queries and outputs of read queries. 1) we move the methods which generically create arrays using these and write fragments using these to a header file instead of `sparse_global_order_reader.cc` 2) we refactor functionality into a `template <typename DimensionTuple, typename AttributeTuple> struct Fragment` which is a common base class for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with an empty dimension tuple for dense fragments 3) we implement various fixes to enable variable-length dimensions 4) some other incidentals This pull request has been refactored out of #5646 for two reasons: 1) it is all enhancements to test code which may be distracting from the functional components of #5646 ; 2) I anticipate using it to add additional tests to #5655 . Specifically, using the `Fragment` with an empty dimension tuple will be very nice for writing tests on dense arrays. There's no direct testing of this, but since the code was factored out of #5646 all of it was used in testing there. Many components are exercised due to the refactor of the existing tests. --- TYPE: NO_HISTORY DESC: refactors and enhancements to test data structure `struct Fragment`
) This pull request does some refactoring and enhancing of our test data structures `struct Fragment1D` and `struct Fragment2D` which are used as containers for inputs to write queries and outputs of read queries. 1) we move the methods which generically create arrays using these and write fragments using these to a header file instead of `sparse_global_order_reader.cc` 2) we refactor functionality into a `template <typename DimensionTuple, typename AttributeTuple> struct Fragment` which is a common base class for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with an empty dimension tuple for dense fragments 3) we implement various fixes to enable variable-length dimensions 4) some other incidentals This pull request has been refactored out of #5646 for two reasons: 1) it is all enhancements to test code which may be distracting from the functional components of #5646 ; 2) I anticipate using it to add additional tests to #5655 . Specifically, using the `Fragment` with an empty dimension tuple will be very nice for writing tests on dense arrays. There's no direct testing of this, but since the code was factored out of #5646 all of it was used in testing there. Many components are exercised due to the refactor of the existing tests. --- TYPE: NO_HISTORY DESC: refactors and enhancements to test data structure `struct Fragment`
Resolves CORE-321.
Today's fragment metadata contains the minimum bounding rectangle of each tile. This is very useful for determining whether tiles can satisfy spatial queries, but much less useful for determining how different tiles from different fragments may or may not interleave (knowledge of which can be used to implement several possible optimizations). This is because the bounding rectangle can under-estimate the true lower bound of the global order value in a tile, potentially by a lot.
We would like for our queries to start to leverage the interleaving of tiles from different fragments in the ways described in that linked document. First we must extend the fragment metadata to include the tile global order bounds. This pull request implements that:
TYPE: C_API | CPP_API | FORMAT
DESC: Add per-tile global order bounds to fragment metadata