@luoyuxia luoyuxia commented Nov 5, 2025

What's Changed

New Features

Added Parquet Writer Support: Introduced ParquetWriter class to write Arrow VectorSchemaRoot to Parquet files via JNI

Implementation

C++ Side:

  • Implemented JavaOutputStreamAdapter to wrap Java OutputStream as Arrow OutputStream
  • Added JNI methods: nativeCreateParquetWriter, nativeWriteParquetBatch, nativeCloseParquetWriter
  • Implemented property builders to convert Java properties to C++ Parquet writer properties
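One detail an adapter like this has to handle: Arrow C++'s OutputStream interface exposes a position query (Tell) next to write and close, while java.io.OutputStream has no equivalent. The sketch below shows, on the Java side, one way to track that position. The class name PositionTrackingOutputStream is hypothetical and is not taken from this PR; where the PR actually keeps the position (C++ or Java) is not shown in this description.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper (not part of the PR): counts bytes written so a caller
// can answer a "tell"-style position query that java.io.OutputStream lacks.
class PositionTrackingOutputStream extends FilterOutputStream {
    private long position = 0;
    private boolean closed = false;

    PositionTrackingOutputStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void write(int b) throws IOException {
        ensureOpen();
        out.write(b);
        position++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        ensureOpen();
        // Forward the slice in one call; the inherited version writes byte by byte.
        out.write(buf, off, len);
        position += len;
    }

    // Number of bytes written so far.
    long position() {
        return position;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // closing twice is a no-op
        }
        closed = true;
        super.close(); // flushes, then closes the delegate
    }

    private void ensureOpen() throws IOException {
        if (closed) {
            throw new IOException("stream is closed");
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PositionTrackingOutputStream out = new PositionTrackingOutputStream(sink)) {
            out.write(new byte[] {'P', 'A', 'R', '1'}, 0, 4);
            System.out.println("position after write: " + out.position());
        }
    }
}
```

Overriding the three-argument write matters for a JNI bridge, since each call from native code into Java has a cost and batches are written as large buffers.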

This contains breaking changes.

Closes #735

@luoyuxia luoyuxia force-pushed the support-write-arrow-batch branch from b815f6d to 99fea13 on November 10, 2025 06:31
@luoyuxia luoyuxia force-pushed the support-write-arrow-batch branch from 99fea13 to 3cc6e16 on November 10, 2025 06:32
@luoyuxia luoyuxia changed the title from "support write arrow record batch" to "GH-735: Support write arrow record batch" on Nov 10, 2025
@github-actions

Thank you for opening a pull request!

Please label the PR with one or more of:

  • bug-fix
  • chore
  • dependencies
  • documentation
  • enhancement

Also, add the 'breaking-change' label if appropriate.

See CONTRIBUTING.md for details.

@luoyuxia luoyuxia marked this pull request as ready for review November 10, 2025 06:48
@luoyuxia
Author

@lidavidm Hi, could you please help review this PR when you are free?

V-Fenil commented Nov 12, 2025

Hi @luoyuxia, I really appreciate that you implemented this. I was trying to test the ParquetWriter implementation and was able to resolve all errors by skipping a few test cases. However, your open PR does not seem to include the .dll and .so files, which gives me this error in Java:

exception: error loading native lib arrow_dataset_jni/x86_64/arrow_dataset_jni.dll FileNotFoundException

Is there any other way I can test this, or do I need to wait until it is merged into the main branch?
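A quick way to confirm that the native library is simply absent from the classpath (rather than failing to load for another reason) is to look up the resource directly. The sketch below is a diagnostic under assumptions: the resource layout `<name>/<arch>/<file>` is inferred only from the error message above, the amd64 to x86_64 mapping is a guess based on that same path, and NativeLibCheck is a hypothetical class name, not part of Arrow.

```java
import java.net.URL;
import java.util.Locale;

// Hypothetical diagnostic (not part of Arrow): checks whether the JNI library
// is packaged on the classpath at the path the error message mentions.
class NativeLibCheck {

    // Assumed layout, inferred from the error message: <name>/<arch>/<file>.
    // System.mapLibraryName adds the platform prefix/suffix (.dll, lib*.so, ...).
    static String resourcePath(String libName, String arch) {
        return libName + "/" + arch + "/" + System.mapLibraryName(libName);
    }

    // Assumption: "amd64" is reported as "x86_64" in the resource path;
    // other architecture names are passed through unchanged.
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("amd64") ? "x86_64" : arch;
    }

    static boolean isOnClasspath(String path) {
        URL url = NativeLibCheck.class.getClassLoader().getResource(path);
        return url != null;
    }

    public static void main(String[] args) {
        String arch = normalizeArch(System.getProperty("os.arch"));
        String path = resourcePath("arrow_dataset_jni", arch);
        System.out.println(path + " -> " + (isOnClasspath(path) ? "found" : "missing"));
    }
}
```

Run it with the same classpath as the failing test. If it reports "missing", the JAR was built without the native library and a source build that includes the C++ step is needed.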

@luoyuxia
Author

@V-Fenil Hi, thanks for your interest. I have already built the .so (for Linux) and the .dylib (for macOS), but I don't have a Windows environment, so I can't provide a .dll for you. To verify this PR, you'll need to build from source; see https://github.com/apache/arrow-java?tab=readme-ov-file#building-from-source. That's also what I did to verify my PR.

V-Fenil commented Nov 13, 2025

Hi @luoyuxia, I'm testing your PR on Linux. Could you share the built libarrow_dataset_jni.so file? I can build the Java side but need the native library. (More specifically, my build succeeded, but I can't find the .so file.)

The total build time was 49 minutes, and Arrow Java C Data Interface and Arrow Java Dataset took only 45 seconds each, so I guess no C++ compilation happened. It would be better if you could share the file directly.
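To check whether a build produced the native library at all, one option is to search the build tree for the file by name. The sketch below does that with java.nio; FindNativeLib is a hypothetical helper, and it makes no assumption about where the Arrow build places its output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Arrow): lists every regular file with the
// given name under a root directory.
class FindNativeLib {

    static List<Path> find(Path root, String fileName) throws IOException {
        // Files.walk returns a lazy stream that holds directory handles,
        // so it is closed with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().equals(fileName))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // On Linux this maps to libarrow_dataset_jni.so.
        String name = System.mapLibraryName("arrow_dataset_jni");
        List<Path> hits = find(root, name);
        if (hits.isEmpty()) {
            System.out.println("No " + name + " found under " + root.toAbsolutePath());
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

An empty result after a successful Java build is consistent with the C++ step not having run, which matches the short module build times reported above.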
