Skip to content

Conversation

@rroelke
Copy link
Member

@rroelke rroelke commented Sep 22, 2025

Resolves CORE-321.

Today's fragment metadata contains the minimum bounding rectangle of each tile. This is very useful for determining whether tiles can satisfy spatial queries, but much less useful for determining how different tiles from different fragments may or may not interleave (knowledge of which can be used to implement several possible optimizations). This is because the bounding rectangle can under-estimate the true lower bound of the global order value in a tile, potentially by a lot.

We would like for our queries to start to leverage the interleaving of tiles from different fragments in the ways described in that linked document. First we must extend the fragment metadata to include the tile global order bounds. This pull request implements that:

  • we update the fragment metadata format version and specification to include a section with the global order lower and upper bounds for each tile in a fragment.
  • we add C and C++ APIs to query those bounds.
  • we update the writers to populate the correct values in fragment metadata and test this for the unordered writer, global order writer, and fragment consoliator.

TYPE: C_API | CPP_API | FORMAT
DESC: Add per-tile global order bounds to fragment metadata

rroelke added 30 commits July 21, 2025 13:55
…to load default profile in ordinary execution
@rroelke rroelke marked this pull request as draft September 22, 2025 18:06
@rroelke
Copy link
Member Author

rroelke commented Sep 22, 2025

RE: CORE-383

Before merging this PR, estimate impact of it on this other issue. Will we make that problem worse?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have APIs that let users access other tile metadata, and I'm not sure we should add them, mainly because they exist only in new fragment metadata versions. I would be open to changing my mind, if there is an end-to-end scenario for these APIs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These run alongside the various mbr APIs, e.g. tiledb_fragment_info_get_mbr_from_index.

The end-to-end scenario which I am interested in is enabling these to be queried using tiledb-tables as a potential support tool.

Comment on lines 53 to 62
[[nodiscard]] constexpr inline const void* content() const {
return datum_content_;
}
[[nodiscard]] inline size_t size() const {
[[nodiscard]] constexpr inline size_t size() const {
return datum_size_;
}
template <class T>
[[nodiscard]] inline const T& value_as() const {
[[nodiscard]] constexpr inline const T& value_as() const {
return *static_cast<const T*>(datum_content_);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the [[nodiscard]]s look a bit excessive if you asked me, but I'm OK with keeping them.

Comment on lines 285 to 290
| Tile global order min coordinates offset for dimension 1 | `uint64_t` | _New in version 23_ For sparse arrays, he offset to the generic tile storing the tile global order mins for dimension 1
||||
| Tile global order min coordinates offset for dimension N | `uint64_t` | _New in version 23_ For sparse arrays, the offset to the generic tile storing the tile global order mins for dimension N
| Tile global order max coordinates offset for dimension 1 | `uint64_t` | _New in version 23_ For sparse arrays, the offset to the generic tile storing the tile global order maxes for dimension 1
||||
| Tile global order max coordinates offset for dimension N | `uint64_t` | _New in version 23_ For sparse arrays, the offset to the generic tile storing the tile global order maxes for dimension N
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I would prefer phrasing it like The offset to the […] for dimension N. This field exists only on sparse arrays..
  • We have yet to decide whether this is how we want to store the tiles, or use the extensible footer design described in CORE-345. If we do the latter, and don't want to change this PR's implementation, one proposed solution was to do it in a subsequent PR that will be merged before the next release.
    • The fact that these fields are only written in the footer for sparse arrays, is another reason to pursue the extensible footer design.
      • If we stick with this design, a better and simpler idea would be to always the offsets, and write zeroes for dense arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have yet to decide whether this is how we want to store the tiles, or use the extensible footer design described in CORE-345

Each time this has been brought up in the meeting the overall consensus has been (1) yes, we do want to implement a more extensible footer design; (2) no, we do not want to hold up existing work so that we can do that. It is settled.

The fact that these fields are only written in the footer for sparse arrays, is another reason to pursue the extensible footer design

I agree with this, it is rather clunky. However see above.

If we stick with this design, a better and simpler idea would be to always the offsets, and write zeroes for dense arrays

It removes a few ifs. Theoretically it might also for some other developer who wanted to read the storage format. I don't really have a preference.

rroelke added a commit that referenced this pull request Oct 31, 2025
)

This pull request does some refactoring and enhancing of our test data
structures `struct Fragment1D` and `struct Fragment2D` which are used as
containers for inputs to write queries and outputs of read queries.
1) we move the methods which generically create arrays using these and
write fragments using these to a header file instead of
`sparse_global_order_reader.cc`
2) we refactor functionality into a `template <typename DimensionTuple,
typename AttributeTuple> struct Fragment` which is a common base class
for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with
an empty dimension tuple for dense fragments
3) we implement various fixes to enable variable-length dimensions
4) some other incidentals

This pull request has been refactored out of #5646 for two reasons:
1) it is all enhancements to test code which may be distracting from the
functional components of #5646 ;
2) I anticipate using it to add additional tests to #5655 .
Specifically, using the `Fragment` with an empty dimension tuple will be
very nice for writing tests on dense arrays.

There's no direct testing of this, but since the code was factored out
of #5646 all of it was used in testing there. Many components are
exercised due to the refactor of the existing tests.

---
TYPE: NO_HISTORY
DESC: refactors and enhancements to test data structure `struct
Fragment`
rroelke added a commit that referenced this pull request Nov 5, 2025
)

This pull request does some refactoring and enhancing of our test data
structures `struct Fragment1D` and `struct Fragment2D` which are used as
containers for inputs to write queries and outputs of read queries.
1) we move the methods which generically create arrays using these and
write fragments using these to a header file instead of
`sparse_global_order_reader.cc`
2) we refactor functionality into a `template <typename DimensionTuple,
typename AttributeTuple> struct Fragment` which is a common base class
for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with
an empty dimension tuple for dense fragments
3) we implement various fixes to enable variable-length dimensions
4) some other incidentals

This pull request has been refactored out of #5646 for two reasons:
1) it is all enhancements to test code which may be distracting from the
functional components of #5646 ;
2) I anticipate using it to add additional tests to #5655 .
Specifically, using the `Fragment` with an empty dimension tuple will be
very nice for writing tests on dense arrays.

There's no direct testing of this, but since the code was factored out
of #5646 all of it was used in testing there. Many components are
exercised due to the refactor of the existing tests.

---
TYPE: NO_HISTORY
DESC: refactors and enhancements to test data structure `struct
Fragment`
rroelke added a commit that referenced this pull request Nov 5, 2025
)

This pull request does some refactoring and enhancing of our test data
structures `struct Fragment1D` and `struct Fragment2D` which are used as
containers for inputs to write queries and outputs of read queries.
1) we move the methods which generically create arrays using these and
write fragments using these to a header file instead of
`sparse_global_order_reader.cc`
2) we refactor functionality into a `template <typename DimensionTuple,
typename AttributeTuple> struct Fragment` which is a common base class
for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with
an empty dimension tuple for dense fragments
3) we implement various fixes to enable variable-length dimensions
4) some other incidentals

This pull request has been refactored out of #5646 for two reasons:
1) it is all enhancements to test code which may be distracting from the
functional components of #5646 ;
2) I anticipate using it to add additional tests to #5655 .
Specifically, using the `Fragment` with an empty dimension tuple will be
very nice for writing tests on dense arrays.

There's no direct testing of this, but since the code was factored out
of #5646 all of it was used in testing there. Many components are
exercised due to the refactor of the existing tests.

---
TYPE: NO_HISTORY
DESC: refactors and enhancements to test data structure `struct
Fragment`
ihnorton pushed a commit that referenced this pull request Nov 6, 2025
)

This pull request does some refactoring and enhancing of our test data
structures `struct Fragment1D` and `struct Fragment2D` which are used as
containers for inputs to write queries and outputs of read queries.
1) we move the methods which generically create arrays using these and
write fragments using these to a header file instead of
`sparse_global_order_reader.cc`
2) we refactor functionality into a `template <typename DimensionTuple,
typename AttributeTuple> struct Fragment` which is a common base class
for `Fragment1D`, `Fragment2D`, a new `Fragment3D` and can be used with
an empty dimension tuple for dense fragments
3) we implement various fixes to enable variable-length dimensions
4) some other incidentals

This pull request has been refactored out of #5646 for two reasons:
1) it is all enhancements to test code which may be distracting from the
functional components of #5646 ;
2) I anticipate using it to add additional tests to #5655 .
Specifically, using the `Fragment` with an empty dimension tuple will be
very nice for writing tests on dense arrays.

There's no direct testing of this, but since the code was factored out
of #5646 all of it was used in testing there. Many components are
exercised due to the refactor of the existing tests.

---
TYPE: NO_HISTORY
DESC: refactors and enhancements to test data structure `struct
Fragment`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants