-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up of make_df function #205
Clean up of make_df function #205
Conversation
merlin/core/dispatch.py
Outdated
except ImportError: | ||
HAS_GPU = False | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it might be unrelated to the make_df
changes? Is there a reason we need to extract the cupy and rmm imports from the above section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there can be... Suppose your on a system that has cupy but does not have cudf. In the current method you find yourself not able to import cupy because cudf failed. And since we dont really require cudf for the use of Systems and this package is a base component for that. I dont think we should be grouping those packages all together. If a package is available it should be able to import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I misunderstood your comment initially. I think you are highlighting that it doesn't need to be part of this PR. I can see that. I will make a separate PR for those changes. But the motivation behind them, I think, is still valid.
merlin/core/dispatch.py
Outdated
# move to cpu | ||
return _like_df.to_pandas() | ||
if cp and isinstance(_like_df, cp.ndarray): | ||
return pd.DataFrame(cp.asarray(_like_df)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cupy.asarray
returns a cupy array which might not work well with the pandas DataFrame constructor. An additional test for what is expected would help clarify what we'd like to convert from/to.
The failures in nvtabular come from the shape changes that were introduced. |
|
||
|
||
@pytest.mark.skipif(not cp, reason="Cupy not available") | ||
def test_pandas_cupy_combo(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there intended to be a call to the make_df
function in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally the idea was to show that the code in make_df was correct. But I will add the make_df call here also. And compare both DFs.
This PR combines increases readability of the make_df function while accounting for additional possibilities of input explicitly (cupy array being turned into a pandas df). We need this so that we do not hit the following error anymore when turning a GPU ndarray into a pandas DF:
TypeError: Implicit conversion to a NumPy array is not allowed. Please use
.get()to construct a NumPy array explicitly.