Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fragment list consolidation for tiledb cloud arrays #5059

Open
wants to merge 6 commits into
base: dev
Choose a base branch
from

Conversation

ypatia
Copy link
Member

@ypatia ypatia commented Jun 7, 2024

[sc-48774]
This PR implements the needed changes on Core side for supporting fragment list cloud consolidation for tiledb:// arrays.
In order for this feature to work end-to-end changes are also needed on the REST server side to deserialize the capnp request and call the right method (consolidate_fragments) for actually doing the consolidation.


TYPE: FEATURE
DESC: Fragment list consolidation for tiledb cloud arrays

@ypatia ypatia force-pushed the yt/sc-48774/fragment_list_consolidation branch from 677f049 to f776d91 Compare June 7, 2024 13:27
throw_if_not_ok(fragment_consolidator->consolidate_fragments(
array_name, encryption_type, encryption_key, key_length, fragment_uris));
// Consolidate
auto fragment_consolidator =
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed dynamic_cast that should be avoided when possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not do that now. If we're going to do this, let's change all the places that use Consolidator::create for something better and consistent.

Copy link
Member Author

@ypatia ypatia Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shouldn't we do that immediate improvement? This is the only occurrence of a dynamic_cast in consolidation code, and the reason is that consolidate_fragments is a method of FragmentConsolidator and not of its parent class Consolidator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing offline, I am reverting this change, although I think it's improving the current situation, to get this long standing PR finally unblocked and merged.

// Check if array exists
ObjectType obj_type;
throw_if_not_ok(
object_type(storage_manager->resources(), array_uri, &obj_type));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this in the else case so that it's not executed for remote arrays and cause 2 extra REST requests (is_group and is_array)

@ypatia ypatia force-pushed the yt/sc-48774/fragment_list_consolidation branch from f776d91 to 124ce98 Compare June 7, 2024 13:34
@ypatia ypatia closed this Jun 7, 2024
@ypatia ypatia reopened this Jun 7, 2024
@ypatia ypatia force-pushed the yt/sc-48774/fragment_list_consolidation branch from 124ce98 to 4e092ea Compare June 10, 2024 13:28
@ypatia ypatia force-pushed the yt/sc-48774/fragment_list_consolidation branch from 6b273c4 to 155a23c Compare June 21, 2024 08:43
array_consolidation_request_builder->initFragments(
fragment_uris->size());
for (size_t i = 0; i < fragment_uris->size(); i++) {
fragment_list_builder.set(i, (*fragment_uris)[i]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those need to become relative to cross the wire, otherwise it's a security concern. Use serialize_array_uri_to_relative and deserialize_array_uri_to_absolute utility functions

@ypatia ypatia force-pushed the yt/sc-48774/fragment_list_consolidation branch from 155a23c to 3ab70bd Compare July 24, 2024 07:58
@ypatia ypatia marked this pull request as ready for review August 5, 2024 12:33
@ypatia
Copy link
Member Author

ypatia commented Aug 5, 2024

@KiterLuc we should discuss before merging this in case we first need to address this one first: https://app.shortcut.com/tiledb-inc/story/52302/increase-number-of-max-fragments-in-tiledb-fragment-list-consolidation

@ypatia ypatia requested review from shaunrd0 and KiterLuc August 5, 2024 12:35
@@ -113,7 +118,8 @@ array_consolidation_request_from_capnp(
auto fragment_reader = array_consolidation_request_reader.getFragments();
fragment_uris.reserve(fragment_reader.size());
for (const auto& fragment_uri : fragment_reader) {
fragment_uris.emplace_back(fragment_uri);
fragment_uris.emplace_back(deserialize_array_uri_to_absolute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REST CI is failing because the path normalization here is not happening on REST since go-capnp handles deserializing the request from generated source code. So we are serializing absolute URIs as relative URIs from core client-side, but REST does not call deserialize_consolidation_request and the switch back to absolute never happens.

We could add a tiledb_handle_array_consolidation_request much like tiledb_handle_load_array_schema_request that would just call deserialize_consolidation_request, but we'll need a TileDB-Go binding and subsequent REST PR to update the route to use it.

Another option without the handler and Go binding is to handle a second type of relative URIs in FragmentConsolidator::consolidate_fragments that is relative to the array_uri/ directory. The relative URIs handled in consolidate_fragments linked above is currently relative to the array_uri/__fragments directory.

Copy link
Member Author

@ypatia ypatia Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about reconstructing the absolute URIs in HandleConsolidateArrayRequest on REST Server side by prepeding the array URI to arrayConsolidationRequest.Fragments before calling ConsolidateFragments ? Would that be possible? Looks like the least "invasive" solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened this to track, we discussed and this is a feasible REST-Server side solution: https://app.shortcut.com/tiledb-inc/story/52679/fix-fragment-list-consolidation-handler-to-convert-relative-uris-to-absolute

Let's get this reviewed in the meantime when you have some spare time.

Copy link
Contributor

@shaunrd0 shaunrd0 Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done and I merged this branch to fix conflicts. I'll watch for CI failures but LGTM 👍

@ypatia ypatia force-pushed the yt/sc-48774/fragment_list_consolidation branch from a4c8c71 to 65c1812 Compare September 4, 2024 11:47
@ypatia
Copy link
Member Author

ypatia commented Sep 4, 2024

@KiterLuc this is now ready for merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants