Load ArraySchema in parallel to listing fragments
For `array_open_for_reads`, we can list the fragments in parallel with
loading the array schema, because listing the fragments and loading the
fragment metadata do not require the array schema. Running these steps in
parallel can save 100-300 milliseconds of open time for S3-based
arrays.
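
The overlap described above can be sketched in isolation. This is a minimal illustration that uses `std::async` from the C++ standard library instead of TileDB's internal `ThreadPool`. The names `Schema`, `load_schema`, `list_fragments`, `OpenResult`, and `open_for_reads`, and the simulated latencies, are hypothetical stand-ins and not TileDB APIs.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two independent I/O steps.
// Neither function is a TileDB API; the sleeps simulate remote latency.
struct Schema {
  std::string name;
};

Schema load_schema(std::chrono::milliseconds latency) {
  std::this_thread::sleep_for(latency);  // simulate reading the schema
  return Schema{"example_schema"};
}

std::vector<std::string> list_fragments(std::chrono::milliseconds latency) {
  std::this_thread::sleep_for(latency);  // simulate listing fragments
  return {"frag_1", "frag_2"};
}

struct OpenResult {
  Schema schema;
  std::vector<std::string> fragments;
  std::chrono::milliseconds elapsed;
};

OpenResult open_for_reads(
    std::chrono::milliseconds schema_latency,
    std::chrono::milliseconds listing_latency) {
  const auto start = std::chrono::steady_clock::now();

  // Start the schema load on another thread.
  std::future<Schema> schema_task =
      std::async(std::launch::async, load_schema, schema_latency);
  assert(schema_task.valid());

  // Meanwhile, list the fragments on the calling thread. This step does
  // not read the schema, so the two can overlap.
  std::vector<std::string> fragments = list_fragments(listing_latency);

  // Join the schema task before anything uses the schema.
  Schema schema = schema_task.get();

  const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
  return OpenResult{std::move(schema), std::move(fragments), elapsed};
}
```

Calling `open_for_reads` with two equal latencies should report an elapsed time close to one latency instead of their sum, which is the kind of saving the commit message describes. The sketch joins the task before returning on every path, so the task never outlives the local variables it writes to.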
Shelnutt2 committed Jan 29, 2021
1 parent 6130740 commit 1246e41
Showing 2 changed files with 35 additions and 14 deletions.
1 change: 1 addition & 0 deletions HISTORY.md
@@ -14,6 +14,7 @@
 * Support for dimension/attribute names that contain commonly reserved filesystem characters [#2047](https://github.com/TileDB-Inc/TileDB/pull/2047)
 * Remove unnecessary `is_dir` in `FragmentMetadata::store`, this can increase performance for s3 writes [#2050](https://github.com/TileDB-Inc/TileDB/pull/2050)
 * Improve S3 multipart locking [#2055](https://github.com/TileDB-Inc/TileDB/pull/2055)
+* Parallelize loading fragments and array schema [#2061](https://github.com/TileDB-Inc/TileDB/pull/2061)

 ## Deprecations

48 changes: 34 additions & 14 deletions tiledb/sm/storage_manager/storage_manager.cc
@@ -203,14 +203,18 @@ Status StorageManager::array_open_for_reads(
     std::vector<FragmentMetadata*>* fragment_metadata) {
   STATS_START_TIMER(stats::Stats::TimerType::READ_ARRAY_OPEN)

-  // Open array without fragments
+  // Open array without fragments async. This loads the array schema which
+  // is not needed in this function, so it can safely be loaded in parallel
+  // to listing of the fragment metadata and loading the consolidated fragment
+  // metadata file
   auto open_array = (OpenArray*)nullptr;
-  RETURN_NOT_OK_ELSE(
-      array_open_without_fragments(array_uri, enc_key, &open_array),
-      *array_schema = nullptr);
-
-  // Retrieve array schema
-  *array_schema = open_array->array_schema();
+  std::vector<ThreadPool::Task> load_array_schema_task;
+  load_array_schema_task.emplace_back(io_tp_->execute([&, this]() {
+    RETURN_NOT_OK_ELSE(
+        array_open_without_fragments(array_uri, enc_key, &open_array),
+        *array_schema = nullptr);
+    return Status::Ok();
+  }));

   // Determine which fragments to load
   std::vector<TimestampedURI> fragments_to_load;
@@ -226,6 +230,12 @@ Status StorageManager::array_open_for_reads(
   RETURN_NOT_OK(load_consolidated_fragment_meta(
       meta_uri, enc_key, &f_buff, &offsets, &meta_version));

+  // Wait for array schema to be loaded
+  RETURN_NOT_OK(io_tp_->wait_all(load_array_schema_task));
+
+  // Retrieve array schema
+  *array_schema = open_array->array_schema();
+
   // Get fragment metadata in the case of reads, if not fetched already
   Status st = load_fragment_metadata(
       open_array,
@@ -259,14 +269,18 @@ Status StorageManager::array_open_for_reads(
     std::vector<FragmentMetadata*>* fragment_metadata) {
   STATS_START_TIMER(stats::Stats::TimerType::READ_ARRAY_OPEN)

-  // Open array without fragments
+  // Open array without fragments async. This loads the array schema which
+  // is not needed in this function, so it can safely be loaded in parallel
+  // to listing of the fragment metadata and loading the consolidated fragment
+  // metadata file
   auto open_array = (OpenArray*)nullptr;
-  RETURN_NOT_OK_ELSE(
-      array_open_without_fragments(array_uri, enc_key, &open_array),
-      *array_schema = nullptr);
-
-  // Retrieve array schema
-  *array_schema = open_array->array_schema();
+  std::vector<ThreadPool::Task> load_array_schema_task;
+  load_array_schema_task.emplace_back(io_tp_->execute([&, this]() {
+    RETURN_NOT_OK_ELSE(
+        array_open_without_fragments(array_uri, enc_key, &open_array),
+        *array_schema = nullptr);
+    return Status::Ok();
+  }));

   // Determine which fragments to load
   std::vector<TimestampedURI> fragments_to_load;
@@ -287,6 +301,12 @@ Status StorageManager::array_open_for_reads(
   RETURN_NOT_OK(load_consolidated_fragment_meta(
       meta_uri, enc_key, &f_buff, &offsets, &meta_version));

+  // Wait for array schema to be loaded
+  RETURN_NOT_OK(io_tp_->wait_all(load_array_schema_task));
+
+  // Retrieve array schema
+  *array_schema = open_array->array_schema();
+
   // Get fragment metadata in the case of reads, if not fetched already
   Status st = load_fragment_metadata(
       open_array,