
add hack to handle tf not recognizing bool dtype in dlpack #276

Merged · 1 commit merged into NVIDIA-Merlin:main on Apr 7, 2023

Conversation

@jperez999 (Collaborator) commented Apr 7, 2023

This PR adds a fix for TensorFlow's dlpack translation, which does not allow bool dtypes. Because of this, we have opted to cast all boolean columns to the int8 dtype, which is allowed by TensorFlow. It also adds the newest cupy version to the requirements for the GPU version of core, needed to ensure the proper implementation of dlpack is used in TensorTable. For reference: https://github.com/tensorflow/tensorflow/blob/a9e6bd9ca4c0dbc96a71a1331bccf39a10757148/tensorflow/c/eager/dlpack.cc#L161-L242

@jperez999 jperez999 self-assigned this Apr 7, 2023
@jperez999 jperez999 requested review from karlhigley and oliverholworthy and removed request for karlhigley April 7, 2023 01:13
@jperez999 jperez999 added labels: bug (Something isn't working), clean up, ci, chore (Maintenance for the repository) on Apr 7, 2023
@jperez999 jperez999 added this to the Merlin 23.04 milestone Apr 7, 2023
@oliverholworthy (Member) left a comment

May be worth adding a test for this conversion to demonstrate support for converting bool types

```diff
-def _to_dlpack_from_tf_tensor(tensor):
+def _to_dlpack_from_cp_tensor(tensor):
+    if tensor.dtype == cp.dtype("bool"):
+        tensor = tensor.astype(cp.dtype("int8"))
```
(Member)

Do we need a corresponding coercion back to bool dtypes on the other side (in the `from_dlpack` functions)?

@jperez999 (Collaborator, Author) commented Apr 7, 2023

No, we don't. We have seen in testing that the models seem to be OK with the conversion to 1 and 0, and they are able to interpret those values correctly. We were originally going to make that change as well (it belongs in the tensor column constructor), but during testing we saw that it was not necessary. If it ever does become necessary, we can absolutely add it in.

(Contributor)

I'm not 100% sure these values are being interpreted as bools on the other side (vs being treated as int8 and having that still function for model training.) I'm not sure if it matters though, and I'd rather avoid an extra conversion if we can for performance reasons. So...I'm on the fence about adding the return conversion. "Let's see if it turns out to be a problem not to have it" seems reasonable, I suppose.

@jperez999 jperez999 requested a review from karlhigley April 7, 2023 14:38
@karlhigley (Contributor) left a comment

I think we need to specify a minimum dependency version for CuPy, and I'm not sure if we want to create a hard dependency on CUDA 11 since other toolkit versions also have CuPy 12.0.0 available. Seems like it might be better to specify the CuPy version we want as something like `cupy>=12.0.0`.

```diff
@@ -3,4 +3,4 @@
 # cudf>=21.12
 # dask-cudf>=21.12
 # dask-cuda>=21.12
-# cupy>=7
+cupy-cuda11x
```
(Contributor)

I think the version does matter here, since CuPy didn't add support for bools with DLPack until 12.0.0.

@jperez999 (Collaborator, Author)

We can't; when you try to specify that here, it fails. When I try something like `pip install cupy-cuda11x>=12.0.0`, it silently fails and nothing gets installed. This is after I had previously uninstalled the version in the environment. This is why I just left it to grab the latest version.

@jperez999 (Collaborator, Author)

So you end up being forced to run the plain `pip install cupy-cuda11x`, and you get the latest version.
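One possible explanation for the silent failure (an assumption, not confirmed in the thread): without quotes, the shell parses `>=12.0.0` as an output redirection, so pip never sees the version specifier. The parsing can be demonstrated with `echo`, since the same shell rules apply to `pip`:

```shell
# Work in a scratch directory so the redirect-created file is isolated.
cd "$(mktemp -d)"
echo pip install cupy-cuda11x>=12.0.0    # ">" is parsed as a redirect to a file named "=12.0.0"
cat "=12.0.0"                            # the file holds: pip install cupy-cuda11x
echo pip install "cupy-cuda11x>=12.0.0"  # quoted: the full specifier survives
```

Quoting the requirement (`pip install "cupy-cuda11x>=12.0.0"`) passes the full specifier through to pip.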

(Contributor)

I don't think we want to install `cupy-cuda11x` specifically though. In the containers we do, but here the requirement is actually to have `cupy>=12.0.0`, which we will by installing `cupy-cuda11x`, but which would also be satisfied by `cupy-cuda12x`, `cupy-cuda102`, etc.

(Contributor)

Okay, I'm wrong about that. `pip install "cupy>=12.0.0"` still tries to install from source even if you have `cupy-cuda*>=12.0.0` installed. Maybe we could make the various packages for satisfying the cupy dependency optional instead?
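One way to do that (a sketch assuming a `pyproject.toml` layout; the extra names and the `merlin-core` distribution name are illustrative) is to expose each CUDA toolkit's CuPy wheel as an optional extra:

```toml
# Hypothetical optional extras; users pick the wheel matching their CUDA toolkit,
# e.g. `pip install "merlin-core[cuda11x]"`.
[project.optional-dependencies]
cuda11x = ["cupy-cuda11x>=12.0.0"]
cuda12x = ["cupy-cuda12x>=12.0.0"]
```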


@karlhigley karlhigley merged commit fce0366 into NVIDIA-Merlin:main Apr 7, 2023
Labels: bug (Something isn't working)
3 participants