Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregate reads & writes in disk_io #1205

Merged
merged 12 commits into from
Jun 29, 2023

Conversation

jakirkham
Copy link
Member

Follow up to this discussion ( #925 (comment) )

  • Preallocates buffers before reading
  • Uses NumPy uint8 arrays for all host memory (benefits from hugepages on transfers)
  • Handles IO asynchronously with KvikIO and waits at the end
  • Uses vectorized IO for host reads & writes

@github-actions github-actions bot added the python python code needed label Jun 28, 2023
@jakirkham jakirkham force-pushed the use_kvikio_futures branch 4 times, most recently from e718585 to aa6007e Compare June 28, 2023 03:42
@jakirkham jakirkham added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 28, 2023
@jakirkham jakirkham changed the title [WIP] Aggregate reads & writes in disk_io Aggregate reads & writes in disk_io Jun 28, 2023
@jakirkham jakirkham marked this pull request as ready for review June 28, 2023 03:50
@jakirkham jakirkham requested a review from a team as a code owner June 28, 2023 03:50
Copy link
Member Author

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted a few opportunities for simplification below using proposed upstream features

dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
@jakirkham jakirkham force-pushed the use_kvikio_futures branch 2 times, most recently from 8071ccf to d72e98e Compare June 28, 2023 07:00
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
@jakirkham jakirkham force-pushed the use_kvikio_futures branch 2 times, most recently from de6806f to e2c24e6 Compare June 28, 2023 07:12
@jakirkham jakirkham requested a review from madsbk June 28, 2023 07:15
Ensure that all buffers are backed by `uint8` NumPy arrays. This way
they use hugepages when available and can be transferred to the GPU
faster.
Before doing more expensive IO operations, allocate all memory needed in
one go. This way it can be fed into the IO operations rapidly or be batched.
Simplify allocating memory for reading by using one `list`-comprehension
for all cases.
Instead of writing each buffer one at a time, write all of them together
in one call with `writelines`. This should result in fewer releases of
the GIL and fewer kernel traps.
Handle reading and writing multiple buffers in one go with lower level
OS primitives. Since Dask-CUDA is Linux only, can just rely on `readv`
and `writev`. This way both reading *and* writing can be done in one
call.
Instead of waiting on each `FutureIO` from KvikIO before creating the
next one, submit all IO requests and then get each one at the end.
dask_cuda/disk_io.py Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member Author

Thanks Mads! 🙏

Please let me know if anything else is needed here 🙂

Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @jakirkham

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @jakirkham and @madsbk !

@pentschev
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit a19ef43 into rapidsai:branch-23.08 Jun 29, 2023
16 checks passed
@jakirkham jakirkham deleted the use_kvikio_futures branch June 29, 2023 19:52
@jakirkham
Copy link
Member Author

Thank you both! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants