Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load ArraySchema in parallel to listing fragments #2061

Conversation

Shelnutt2
Copy link
Member

For array_open_for_reads we can list the fragments in parallel to loading the array schema. Listing the fragments and loading the fragment metadata does not require the array schema. Loading everything in parallel can save 100-300 milliseconds in the open time for S3 based arrays.

@shortcut-integration
Copy link

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5118/parallelize-loading-array-schema-and-listing branch from 26d69c5 to cd55764 Compare January 29, 2021 04:21
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5118/parallelize-loading-array-schema-and-listing branch from cd55764 to 213fe27 Compare January 29, 2021 12:29
For `array_open_for_reads` we can list the fragments in parallel to
loading the array schema. Listing the fragments and loading the fragment
metadata does not require the array schema. Loading everything in
parallel can save 100-300 milliseconds in the open time for S3 based
arrays.
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5118/parallelize-loading-array-schema-and-listing branch from 213fe27 to 1246e41 Compare January 29, 2021 12:29
@Shelnutt2 Shelnutt2 requested a review from joe-maley January 29, 2021 12:31
@Shelnutt2 Shelnutt2 merged commit ad48c9a into dev Jan 29, 2021
@Shelnutt2 Shelnutt2 deleted the sethshelnutt/ch5118/parallelize-loading-array-schema-and-listing branch January 29, 2021 17:42
ihnorton added a commit that referenced this pull request Feb 5, 2021
Invert the tasking order for async loading established in
  #2061

Now load the fragments in a task and array schema on main.
Fixes several mutex ownership issues which surfaced on Windows.

(1) We need to wait on the async task in all return clauses to avoid
    the following situation which causes an exit on Windows:
    if the context owning the storage manager is destroyed
    without joining the async thread, then we get an error
    on Windows due to attempting to destruct a locked mutex.

(2) We cannot call open_array->mtx_unlock() on the main thread after
    running the array_open call async on the thread pool (which locks
    the open_array.mtx_), because lock/unlock on different threads
    is undefined behavior and a runtime error on windows.
ihnorton added a commit that referenced this pull request Feb 5, 2021
Invert the tasking order for async loading established in
  #2061

Now load the fragments in a task and array schema on main.
Fixes several mutex ownership issues which surfaced on Windows.

(1) We need to wait on the async task in all return clauses to avoid
    the following situation which causes an exit on Windows:
    if the context owning the storage manager is destroyed
    without joining the async thread, then we get an error
    on Windows due to attempting to destruct a locked mutex.

(2) We cannot call open_array->mtx_unlock() on the main thread after
    running the array_open call async on the thread pool (which locks
    the open_array.mtx_), because lock/unlock on different threads
    is undefined behavior and a runtime error on windows.
ihnorton added a commit to ihnorton/TileDB that referenced this pull request Feb 5, 2021
Invert the tasking order for async loading established in
  TileDB-Inc#2061

Now load the fragments in a task and array schema on main.
Fixes several mutex ownership issues which surfaced on Windows.

(1) We need to wait on the async task in all return clauses to avoid
    the following situation which causes an exit on Windows:
    if the context owning the storage manager is destroyed
    without joining the async thread, then we get an error
    on Windows due to attempting to destruct a locked mutex.

(2) We cannot call open_array->mtx_unlock() on the main thread after
    running the array_open call async on the thread pool (which locks
    the open_array.mtx_), because lock/unlock on different threads
    is undefined behavior and a runtime error on windows.
joe-maley pushed a commit that referenced this pull request Feb 8, 2021
* Add test for array open and free with invalid URI

* sm::array_open_for_reads: fetch fragments async instead of schema

Invert the tasking order for async loading established in
  #2061

Now load the fragments in a task and array schema on main.
Fixes several mutex ownership issues which surfaced on Windows.

(1) We need to wait on the async task in all return clauses to avoid
    the following situation which causes an exit on Windows:
    if the context owning the storage manager is destroyed
    without joining the async thread, then we get an error
    on Windows due to attempting to destruct a locked mutex.

(2) We cannot call open_array->mtx_unlock() on the main thread after
    running the array_open call async on the thread pool (which locks
    the open_array.mtx_), because lock/unlock on different threads
    is undefined behavior and a runtime error on windows.
github-actions bot pushed a commit that referenced this pull request Feb 8, 2021
* Add test for array open and free with invalid URI

* sm::array_open_for_reads: fetch fragments async instead of schema

Invert the tasking order for async loading established in
  #2061

Now load the fragments in a task and array schema on main.
Fixes several mutex ownership issues which surfaced on Windows.

(1) We need to wait on the async task in all return clauses to avoid
    the following situation which causes an exit on Windows:
    if the context owning the storage manager is destroyed
    without joining the async thread, then we get an error
    on Windows due to attempting to destruct a locked mutex.

(2) We cannot call open_array->mtx_unlock() on the main thread after
    running the array_open call async on the thread pool (which locks
    the open_array.mtx_), because lock/unlock on different threads
    is undefined behavior and a runtime error on windows.
joe-maley pushed a commit that referenced this pull request Feb 8, 2021
* Add test for array open and free with invalid URI

* sm::array_open_for_reads: fetch fragments async instead of schema

Invert the tasking order for async loading established in
  #2061

Now load the fragments in a task and array schema on main.
Fixes several mutex ownership issues which surfaced on Windows.

(1) We need to wait on the async task in all return clauses to avoid
    the following situation which causes an exit on Windows:
    if the context owning the storage manager is destroyed
    without joining the async thread, then we get an error
    on Windows due to attempting to destruct a locked mutex.

(2) We cannot call open_array->mtx_unlock() on the main thread after
    running the array_open call async on the thread pool (which locks
    the open_array.mtx_), because lock/unlock on different threads
    is undefined behavior and a runtime error on windows.

Co-authored-by: Isaiah Norton <ihnorton@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants