fix: add cuda backend support for to_raggedtensor and from_raggedtensor functions #3263

@maxymnaumchyk (Collaborator) commented Oct 1, 2024

No description provided.

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 12.24490% with 43 lines in your changes missing coverage. Please review.

Project coverage is 82.17%. Comparing base (b749e49) to head (5ee7f0c).
Report is 179 commits behind head on main.

Files with missing lines Patch % Lines
src/awkward/operations/ak_to_raggedtensor.py 13.33% 26 Missing ⚠️
src/awkward/operations/ak_from_raggedtensor.py 10.52% 17 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/awkward/operations/ak_from_raggedtensor.py 22.72% <10.52%> (ø)
src/awkward/operations/ak_to_raggedtensor.py 21.81% <13.33%> (ø)

... and 157 files with indirect coverage changes
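The figures in the report are mutually consistent: 26 + 17 = 43 missing lines. The quick Python check below reproduces both the per-file and the overall patch coverage. It assumes 4 covered lines in ak_to_raggedtensor.py and 2 in ak_from_raggedtensor.py. Those covered counts are inferred from the percentages and are not stated in the report.

```python
from fractions import Fraction

# Missing-line counts are taken from the Codecov report above.
# Covered-line counts are an ASSUMPTION: the values that reproduce
# the reported per-file percentages.
patch_lines = {
    "src/awkward/operations/ak_to_raggedtensor.py": (4, 26),
    "src/awkward/operations/ak_from_raggedtensor.py": (2, 17),
}


def coverage_percent(covered, missing):
    """Exact percentage of patch lines that are covered."""
    return Fraction(covered, covered + missing) * 100


def truncate(value, digits=2):
    """Truncate (not round) to a fixed number of decimals."""
    scale = 10**digits
    return int(value * scale) / scale


per_file = {name: coverage_percent(c, m) for name, (c, m) in patch_lines.items()}
total_covered = sum(c for c, _ in patch_lines.values())
total_missing = sum(m for _, m in patch_lines.values())
overall = coverage_percent(total_covered, total_missing)

for name, pct in per_file.items():
    print(name, truncate(pct))
print("overall", round(float(overall), 5), "missing", total_missing)
```

The per-file values come out as 13.33% (4/30) and 10.52% (2/19), and the overall value is 6/49, or about 12.24490%. For 2/19 = 10.526...%, the report appears to truncate rather than round.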

@maxymnaumchyk (Collaborator, Author) commented Oct 1, 2024

@jpivarski, while trying to make the to_raggedtensor function keep the device of the original Awkward Array, I stumbled upon an issue. TensorFlow automatically selects the GPU for computation if one is available. Even so, if I run the following code on a GPU machine, it does return a tensor on the CPU:

import tensorflow as tf

def function():
    with tf.device('CPU:0'):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        return a

a = function()
print(a.device)  # /job:localhost/replica:0/task:0/device:CPU:0

However, if I try to do the same with the to_raggedtensor function, the intermediate ragged tensor is allocated on the CPU (the print on line 78 says so), but the resulting tensor is allocated on the GPU:

ak.to_raggedtensor(ak.Array([[[1.1, 2.2], [3.3]], [], [[4.4, 5.5]]]))[0][0].device
# /job:localhost/replica:0/task:0/device:GPU:0
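For context on what the conversion builds: an Awkward Array of lists is stored as flat values plus offsets at each nesting level. A RaggedTensor can be assembled from flat values plus nested row_splits, which play the same role. The pure-Python sketch below computes both for the example array above. The helper nested_row_splits is hypothetical and is not Awkward's implementation. Its output has the shape that tf.RaggedTensor.from_nested_row_splits(flat_values, nested_row_splits) expects, with the outermost splits first.

```python
def nested_row_splits(nested, depth):
    """Flatten a list nested `depth` levels deep into (flat_values, splits).

    splits[0] describes the outermost level. Each entry plays the role of
    an Awkward ListOffsetArray's offsets, or a RaggedTensor's row_splits.
    """
    splits = []
    level = nested
    for _ in range(depth):
        offsets = [0]
        for sublist in level:
            offsets.append(offsets[-1] + len(sublist))
        splits.append(offsets)
        # Remove one level of nesting before processing the next level.
        level = [item for sublist in level for item in sublist]
    return level, splits


flat_values, splits = nested_row_splits([[[1.1, 2.2], [3.3]], [], [[4.4, 5.5]]], depth=2)
print(flat_values)  # [1.1, 2.2, 3.3, 4.4, 5.5]
print(splits)       # [[0, 2, 2, 3], [0, 2, 3, 5]]
```

Element [0][0] of the example, [1.1, 2.2], is flat_values[splits[1][0]:splits[1][1]]. Outer row 0 starts at inner row splits[0][0] = 0, and inner row 0 spans flat positions 0 to 2.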

Should I make the function follow TensorFlow's policy of selecting the device automatically, or create some kind of workaround?

@jpivarski (Member)

ak.to_raggedtensor should return a RaggedTensor on the same device as the Awkward Array, as a view (no copy) if possible. That may mean that the implementation needs to specify non-default arguments of the RaggedTensor constructor (or use the with block) in order to control it.

If this is not possible and TensorFlow returns an object whose backend depends on what hardware is available (a terrible practice! shame on TensorFlow!), then we'll have to explain that (apologetically) in our documentation.
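One way to follow this suggestion is to choose the TensorFlow device explicitly from the Awkward Array's backend. The final RaggedTensor construction would then be wrapped in a tf.device block, not just the intermediate one. The sketch below shows only the mapping step. The helper tensorflow_device_spec is hypothetical, and the sketch assumes the CUDA device index is already known. For a CuPy array, that index is available as array.device.id.

```python
def tensorflow_device_spec(backend, device_id=0):
    """Map an Awkward backend name to a TensorFlow device string.

    Hypothetical helper: "cpu" -> "CPU:0", "cuda" -> "GPU:<device_id>".
    Other backends (e.g. "jax", "typetracer") are rejected.
    """
    if backend == "cpu":
        return "CPU:0"
    if backend == "cuda":
        return f"GPU:{device_id}"
    raise ValueError(
        f"cannot convert an Awkward Array with backend {backend!r} to TensorFlow"
    )


print(tensorflow_device_spec("cpu"))      # CPU:0
print(tensorflow_device_spec("cuda", 1))  # GPU:1
```

Inside to_raggedtensor, the string could be used as `with tf.device(tensorflow_device_spec(ak.backend(array), device_id)):` around the call that creates the returned tensor. This sketch does not establish whether TensorFlow can then reuse the CuPy buffers without a copy. That would likely require a zero-copy path such as DLPack (tf.experimental.dlpack.from_dlpack).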

@maxymnaumchyk maxymnaumchyk marked this pull request as ready for review October 16, 2024 15:09

@jpivarski (Member) left a comment


This is looking good! I added some possible changes—actually, "things to think about" because you know the TensorFlow situation better than I do.

This could also use tests. Would it be sufficient to copy the to/from raggedtensor tests from the tests/ directory to tests-cuda/ and replace NumPy arrays with CuPy arrays?

Just as you can run the normal tests with

python -m pytest tests

you can run the CUDA tests with

python -m pytest tests-cuda

on a computer with an Nvidia GPU.

Two review threads on src/awkward/operations/ak_from_raggedtensor.py (outdated, resolved)
@jpivarski (Member) left a comment


This is good, except maybe for the case of more than 10 GPUs (see below). Once that's fixed, this will be ready to merge.

Review thread on src/awkward/operations/ak_from_raggedtensor.py (outdated, resolved)
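The review thread itself is not shown here. The following assumes the concern is how the GPU index is parsed from a TensorFlow device string such as /job:localhost/replica:0/task:0/device:GPU:0. Reading only the last character would turn GPU:12 into index 2. The sketch below, using the hypothetical name device_index, keeps multi-digit indices intact.

```python
def device_index(device_string):
    """Return (device_type, index) from a TensorFlow device string.

    Splits on ":" instead of reading only the last character, so
    ".../device:GPU:12" yields 12 rather than 2.
    """
    # The last path component looks like "device:GPU:12".
    device_part = device_string.rsplit("/", 1)[-1]
    _, device_type, index = device_part.split(":")
    return device_type, int(index)


gpu = "/job:localhost/replica:0/task:0/device:GPU:12"
print(int(gpu[-1]))       # 2 (last-character parsing drops the leading digit)
print(device_index(gpu))  # ('GPU', 12)
```

TensorFlow also provides tf.DeviceSpec.from_string, whose device_type and device_index attributes perform this parsing and may be the more robust choice. The resulting index could then select the matching CuPy device, for example with `with cupy.cuda.Device(index):`.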
@maxymnaumchyk merged commit c7ebd58 into scikit-hep:main on Oct 28, 2024 (45 checks passed).