Clean up of make_df function #205

Merged
merged 10 commits into from
Feb 13, 2023

jperez999
Collaborator

This PR improves the readability of the make_df function and explicitly handles an additional input case: a cupy array being turned into a pandas DataFrame. We need this so that we no longer hit the following error when turning a GPU ndarray into a pandas DataFrame: TypeError: Implicit conversion to a NumPy array is not allowed. Please use .get() to construct a NumPy array explicitly.

@jperez999 jperez999 added enhancement New feature or request clean up chore Maintenance for the repository labels Feb 2, 2023
@jperez999 jperez999 added this to the Merlin 23.02 milestone Feb 2, 2023
@jperez999 jperez999 requested a review from karlhigley February 2, 2023 18:32
@jperez999 jperez999 self-assigned this Feb 2, 2023
except ImportError:
    HAS_GPU = False
...
Member

This seems like it might be unrelated to the make_df changes? Is there a reason we need to extract the cupy and rmm imports from the above section?

Collaborator Author

Yes, there can be. Suppose you're on a system that has cupy but does not have cudf. With the current approach you are unable to import cupy because the cudf import failed. Since we don't really require cudf to use Systems, and this package is a base component for it, I don't think we should group those packages together. If a package is available, it should be importable.

Collaborator Author

I guess I misunderstood your comment initially. I think you are highlighting that it doesn't need to be part of this PR. I can see that. I will make a separate PR for those changes. But the motivation behind them, I think, is still valid.
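The motivation in this thread (one missing optional package should not block another) can be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper name optional_import is hypothetical, and the sketch assumes that a missing package surfaces as an ImportError.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration; not part of this repository.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Each optional dependency is probed on its own, so a missing cudf
# does not prevent cupy (or rmm) from being picked up.
cp = optional_import("cupy")
cudf = optional_import("cudf")
rmm = optional_import("rmm")
```

With this layout, code that only needs cupy can check `cp is not None` without caring whether cudf imported successfully.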

    # move to cpu
    return _like_df.to_pandas()
if cp and isinstance(_like_df, cp.ndarray):
    return pd.DataFrame(cp.asarray(_like_df))
Member

@oliverholworthy oliverholworthy Feb 6, 2023

cupy.asarray returns a cupy array which might not work well with the pandas DataFrame constructor. An additional test for what is expected would help clarify what we'd like to convert from/to.
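To make the concern concrete, here is a minimal sketch of an explicit device-to-host copy before building the pandas DataFrame. It is not necessarily what the merged make_df does: the function name array_to_pandas is hypothetical, and the sketch falls back to plain NumPy input when cupy is not installed.

```python
import numpy as np
import pandas as pd

try:
    import cupy as cp
except ImportError:
    cp = None  # cupy is optional in this sketch


def array_to_pandas(arr):
    """Build a pandas DataFrame from a NumPy or cupy array.

    Hypothetical helper for illustration only.
    """
    if cp is not None and isinstance(arr, cp.ndarray):
        # cupy.asnumpy copies the data to host memory and returns a
        # numpy.ndarray, whereas cupy.asarray would keep it on the GPU.
        arr = cp.asnumpy(arr)
    return pd.DataFrame(arr)
```

Calling `arr.get()` on the cupy array is an equivalent way to request the host copy, which is what the TypeError quoted in the PR description asks for.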

@jperez999
Collaborator Author

The failures in nvtabular come from the shape changes that were introduced.



@pytest.mark.skipif(not cp, reason="Cupy not available")
def test_pandas_cupy_combo():
Member

Is there intended to be a call to the make_df function in this test?

Collaborator Author

Originally the idea was to show that the code in make_df was correct, but I will also add a make_df call here and compare both DataFrames.
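A comparison of that kind might look roughly like the sketch below. The helper name assert_matches_host_copy is hypothetical, and the function under test is passed in as an argument, so nothing here assumes the actual make_df signature beyond "takes an array, returns a pandas DataFrame".

```python
import numpy as np
import pandas as pd


def assert_matches_host_copy(make_df_fn, arr):
    """Check that make_df_fn(arr) equals a DataFrame built from a host copy.

    Hypothetical test helper for illustration only.
    """
    # cupy arrays expose .get() for an explicit copy to host memory;
    # NumPy arrays have no such method and are used as they are.
    host = arr.get() if hasattr(arr, "get") else np.asarray(arr)
    expected = pd.DataFrame(host)
    actual = make_df_fn(arr)
    # Raises AssertionError if values, dtypes, or shape differ.
    pd.testing.assert_frame_equal(actual, expected)
```

In the real test, make_df_fn would be make_df and arr a cupy array, with the existing skipif marker guarding the case where cupy is unavailable.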

@karlhigley karlhigley merged commit dec2329 into NVIDIA-Merlin:main Feb 13, 2023