-
Notifications
You must be signed in to change notification settings - Fork 918
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make ai.rapids.cudf.HostMemoryBuffer#copyFromStream public. #17179
Make ai.rapids.cudf.HostMemoryBuffer#copyFromStream public. #17179
Conversation
Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the utilities look interesting. But I don't see a lot of value in adding them to CUDF unless there is something in CUDF that is going to use them. I can see a lot of places that Arms might be used. But arguably it should not be a part of CUDF until we have code that uses it. We don't want CUDF to become guava or some other utilities library. We want CUDF to provide APIs for processing dataframe data on the GPU, and it is not clear how these APIs facilitate that.
*/ | ||
public static <R extends AutoCloseable> void close(Iterator<R> resources) { | ||
Throwable t = null; | ||
while (resources.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off this is not closing the iterator. This is closing all of the values that the iterator holds. So at a minimum I would like it if we changed the name to closeAll
, or something like that.
Second an Iterator has no guarantee that you are looping over actual values. It can lazily generate values or things like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, I have changed method name to closeAll
.
/** | ||
* This method safes closes the resources. | ||
*/ | ||
public static <R extends AutoCloseable> void close(Iterable<R> resources) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the same problems as the Iterator API, but worse. An Iterable
is typically a Collection
or something like that. If that collection is AutoClosable then close
is now ambiguous. Do we want to close the collection itself or do we want to close the things in that collection. This needs to be called closeAll
at least. I would also prefer it if we switched this to Collection
instead of Iterable
, unless there is a specific use case where we want an Iterable that is not a Collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, renamed to closeAll
and use Collection
.
@@ -246,7 +246,7 @@ public final void copyFromHostBuffer(long destOffset, HostMemoryBuffer srcData, | |||
* @param in input stream to copy bytes from | |||
* @param byteLength number of bytes to copy | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the docs so that it is clear that an EOFException is thrown if the InputStream does not include enough bytes to do the copy. Also it would really be nice if we could include a message with the EOFException to indicate what happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Hi, @revans2 As I've mentioned in description, it's part of NVIDIA/spark-rapids-jni#2532 , which follows discussion with @jlowe to split that larger one into smaller pieces to be easier to review, please refer to NVIDIA/spark-rapids-jni#2532 (review) for more details. |
@liurenjie1024 I get that these utilities are here so that spark-rapids-jni can use them. My problem is that CUDF is a library for doing data frame processing on the GPU. It is not a library for providing general purpose java utilities. If the CUDF java code actually used these APIs, then I could justify it in my mind. But that is not the case. These are here only so that spark-rapids-jni can use them. Even if CUDF used these APIs I would argue that they should be package private unless they are exposed in the CUDF public APIs. This is because the point of CUDF is to provide data frame processing on the GPU. It is not here to provide general purpose java utilities. If these are for spark-rapids-jni, then lets put them in spark-rapids-jni. |
I get your point. I'll put these utilities nito spark-rapids-jni. |
Moved to NVIDIA/spark-rapids-jni#2542 |
ai.rapids.cudf.HostMemoryBuffer#copyFromStream
public.
ai.rapids.cudf.HostMemoryBuffer#copyFromStream
public.
build |
/ok to test |
2 similar comments
/ok to test |
/ok to test |
/ok to test |
build |
/ok to test |
1 similar comment
/ok to test |
/merge |
Description
This is the first pr of a larger one to introduce a new serialization format. It make
ai.rapids.cudf.HostMemoryBuffer#copyFromStream
public. For more background, see NVIDIA/spark-rapids-jni#2496Checklist