Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ready event to Tensor and TensorList. #5673

Merged
merged 5 commits into from
Oct 14, 2024

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Oct 10, 2024

Category:

New feature (non-breaking change which adds functionality)
Refactoring (Redesign of existing code that doesn't affect functionality)

Description:

Preliminary work for (cleaner) DLPack support.
DLPack needs to synchronize an stream so that the tensor is ready for use in that stream.
We can't use stream-to-stream synchronization because of prefetching. This PR adds ready_event to Tensor and TensorList to address that.

  • move SharedEventLease to core
  • add more complete shared_ptr interface to SharedEventLease
  • add tests for SharedEventLease
  • add (set_)ready_event to Tensor and TensorList
  • minor refactoring in TensorList
  • remove OperatorIO::event in favor of TensorList's ready_event in exec2

Additional information:

Affected modules and functionalities:

Executor2
Tensor
TensorList

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

- move SharedEventLease to core
- add more complete shared_ptr interface to SharedEventLease
- add tests for SharedEventLease
- add (set_)ready_event to Tensor and TensorList
- minor refactoring in TensorList
- remove OperatorIO::event in favor of TensorList's ready_event in exec2

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19219483]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19219483]: BUILD FAILED

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19220893]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19220893]: BUILD PASSED

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19248455]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19248455]: BUILD PASSED

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19327083]: BUILD STARTED

if (!ptr->shares_data()) {
if (AccessOrder consumer_order = OutputConsumerStream(o))
ptr->set_order(consumer_order, false);
if (ptr->is_pinned()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes a bug - previously we would set the consumer order for non-pinned host buffers - this PR adds an assert that verifies that such buffers are always in host order.

Comment on lines +121 to +124
// Hack: use shared_ptr<void> to store a CUDA event - shared_ptr doesn't care whether the pointer
// it manages is a real pointer or something else as long as:
// - null value is equivalent to nullptr
// - the provided deleter can free the object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure: is this guaranteed by the standard, or is it an implementation detail of shared_ptr?

Copy link
Contributor Author

@mzient mzient Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put it differently: we've already been using it this way for 6 years - we use shared_ptr to manage device memory, which is also non-dereferenceble on host, so it's effectively an opaque handle.

@mzient mzient merged commit 3db39b1 into NVIDIA:main Oct 14, 2024
5 of 6 checks passed
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19327083]: BUILD PASSED

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants