ARROW-16913: [Java] Implement ArrowArrayStream #13465
java/c/src/main/cpp/jni_wrapper.cc
Outdated
| ThrowPendingException(message); | ||
| } | ||
| jclass global_class = (jclass)env->NewGlobalRef(local_class); | ||
| if (!local_class) { |
Is this a mistake?
| if (!local_class) { | |
| if (!global_class) { |
java/c/src/main/cpp/jni_wrapper.cc
Outdated
| const int err_code = env->CallIntMethod(private_data->j_private_data_, | ||
| kPrivateDataGetSchemaMethod, out_addr); | ||
| if (env->ExceptionCheck()) { | ||
| env->ExceptionDescribe(); |
If there's an exception, should it perhaps participate in last_error_?
Normally the JNI side sets the last error; the check here is just a last-resort safeguard. I suppose this can be refactored, though: copy the Java-side error to the C++ side after get_next/get_stream, so that get_last_error only has to return the C++-side error; then get_next/get_stream can also update last_error_ if it ends up catching a stray error.
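The refactoring described in this reply could look roughly like the following. This is only a sketch under stated assumptions, not the code in the PR: `JavaCallResult`, `StreamPrivateData`, and the `call_java` callable are hypothetical stand-ins for the JNI call plus its `ExceptionCheck()`, so that the example compiles without `<jni.h>`. The choice of `EIO` as the fallback error code is also an assumption.

```cpp
#include <cerrno>
#include <functional>
#include <string>
#include <utility>

// Hypothetical result of calling into Java: an errno-style code, the message
// the Java side recorded (empty if none), and whether an exception was left
// pending after the call (what ExceptionCheck() would report).
struct JavaCallResult {
  int err_code;
  std::string message;
  bool stray_exception;
};

class StreamPrivateData {
 public:
  explicit StreamPrivateData(std::function<JavaCallResult()> call_java)
      : call_java_(std::move(call_java)) {}

  // Plays the role of get_next: call Java, then copy any error to the C++ side.
  int GetNext() {
    JavaCallResult result = call_java_();
    if (result.stray_exception) {
      // Last-resort safeguard: record the stray exception in last_error_
      // instead of only describing it on stderr.
      last_error_ = "Unhandled Java exception in get_next";
      return result.err_code != 0 ? result.err_code : EIO;
    }
    if (result.err_code != 0) {
      last_error_ = std::move(result.message);
    } else {
      last_error_.clear();
    }
    return result.err_code;
  }

  // Plays the role of get_last_error: reads only the C++-side copy, so it
  // never needs to call back into the JVM.
  const char* GetLastError() const {
    return last_error_.empty() ? nullptr : last_error_.c_str();
  }

 private:
  std::function<JavaCallResult()> call_java_;
  std::string last_error_;
};
```

With this shape, a stray exception and an error reported by the Java side both end up in the same place, which is what the reviewer's question was getting at.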
java/c/src/main/cpp/jni_wrapper.cc
Outdated
| if (env->ExceptionCheck()) { | ||
| env->ExceptionDescribe(); | ||
| env->ExceptionClear(); | ||
| ThrowPendingException("Error calling close of private data"); |
Is this right? The release callback could be called from any context, such as a Python thread or R interpreter. In those contexts, a C++ exception would probably crash the process (or silently exit the thread)?
Ah, you're right. The existing handler has this issue too. I'll remove the throw. (Actually, here I suppose we should do our best to free resources in C++/Java regardless.)
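To make the concern concrete: the release callback is a plain C function pointer, so no C++ exception should escape it, and native state should be freed even when closing the Java side fails. The sketch below assumes simplified, hypothetical types (`FakeStream`, `FakePrivateData`) rather than the real `ArrowArrayStream` struct, and `close_java_side` stands in for the JNI call to `close()`.

```cpp
#include <functional>

// Hypothetical, simplified stand-in for the C stream struct.
struct FakeStream {
  void (*release)(FakeStream*);
  void* private_data;
};

// Hypothetical private data. close_java_side stands in for the JNI call;
// close_failed (may be null) records a failure instead of throwing.
struct FakePrivateData {
  std::function<void()> close_java_side;
  bool* close_failed;
};

// Release callback: it may run on any thread or inside another runtime
// (Python, R), so nothing is allowed to propagate out of it.
void ReleaseStream(FakeStream* stream) noexcept {
  if (stream == nullptr || stream->release == nullptr) return;  // already released
  auto* data = static_cast<FakePrivateData*>(stream->private_data);
  try {
    if (data->close_java_side) data->close_java_side();
  } catch (...) {
    // Swallow the error; only note that it happened.
    if (data->close_failed != nullptr) *data->close_failed = true;
  }
  // Free native resources whether or not the close succeeded.
  delete data;
  stream->private_data = nullptr;
  stream->release = nullptr;  // marks the struct as released
}
```

Setting `release` to null at the end follows the C stream interface convention for marking a released struct, and makes a second call a harmless no-op.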
java/c/src/main/cpp/jni_wrapper.cc
Outdated
| JNIEnvGuard guard(private_data->vm_); | ||
| JNIEnv* env = guard.env(); | ||
|
|
||
| const long out_addr = static_cast<long>(reinterpret_cast<uintptr_t>(out)); |
I suppose this doesn't work on 64-bit Windows? long is 32 bits there...
Also, according to the JNI spec, a jlong is always 64 bits, so perhaps we should use jlong or simply int64_t here?
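The two comments above can be illustrated without JNI. In this minimal sketch, `JavaLong` is a hypothetical alias for `std::int64_t` standing in for `jlong` (which the JNI specification defines as a 64-bit signed integer), and the helper names `PointerToAddress` and `AddressToPointer` are illustrative only, not functions from the PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for jlong so the sketch compiles without <jni.h>.
using JavaLong = std::int64_t;

// Unlike `long`, which is 32 bits on 64-bit Windows, a 64-bit integer can
// hold any pointer on the platforms Arrow targets.
static_assert(sizeof(JavaLong) >= sizeof(void*),
              "a native pointer must fit in a Java long");

// Convert a native pointer into a value that can be passed to Java as a long.
inline JavaLong PointerToAddress(const void* ptr) {
  return static_cast<JavaLong>(reinterpret_cast<std::uintptr_t>(ptr));
}

// Recover the native pointer from an address handed back by Java.
template <typename T>
T* AddressToPointer(JavaLong address) {
  return reinterpret_cast<T*>(static_cast<std::uintptr_t>(address));
}
```

Going through `uintptr_t` in both directions keeps the pointer-to-integer conversion explicit, and the fixed-width type removes the dependence on how wide `long` happens to be.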
| * @param stream C stream interface struct to import. | ||
| * @return Imported reader | ||
| */ | ||
| public static ArrowReader importStream(BufferAllocator allocator, ArrowArrayStream stream) { |
Is there a reason for the naming discrepancy (importStream vs. exportArrayStream)?
| static class ExportedArrayStreamPrivateData implements PrivateData { | ||
| final BufferAllocator allocator; | ||
| final ArrowReader reader; | ||
| int nextDictionary; |
This member doesn't seem used anymore, or am I missing something?
Ah it's not used. I missed this when backing out a change.
pitrou
left a comment
+1. For the record, did you try to use this to communicate with e.g. PyArrow?
@lwhite1 Would you like to take a look?
I have not yet, I need to give this a try: https://arrow.apache.org/docs/dev/python/integration/python_java.html#java-to-python-communication-using-the-c-data-interface and actually, I'll extend the doc page there as well.
…wow, whatever GitHub did to their UI is rather frustrating.
Hmm, there's a possible minor bug between PyArrow/C++/Java: Python can keep a reference to the reader until interpreter shutdown (at which point the JVM has been shut down), and then collects the reader. This frees the Changes needed:
Can Python perhaps release that reference once close() is called?
Well, the Python-side reference is the Python reader object itself. But close() should be wired up to call the new RecordBatchReader::Close() so we can at least explicitly call the release callback at a suitable time.
Though the Java improvements are welcome as well. We should probably try to do both.
… |
pitrou
left a comment
Just a question, this is great otherwise.
@amol- Do you want to take a look at the doc additions?
Implements ArrowArrayStream for Java. The equivalent Java-side interface chosen is ArrowReader.
Also:
- Fixes a couple of JDK9 compatibility issues I ran into. I _think_ these will not normally affect people except during development (I think because I was mixing IntelliJ and Maven).
- Manually clang-format the C++ code. Clean up some things to match Arrow convention and remove some unused declarations.
- Extends the DictionaryProvider interface. This is a potentially breaking change; we could make the method default (and raise an exception) instead.
Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Alessandro Molina <amol@turbogears.org>
Implements ArrowArrayStream for Java. The equivalent Java-side interface chosen is ArrowReader.
Also: