Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFS storage support via datafusion-objectstore-hdfs #1273

Closed
wants to merge 6 commits into from

Conversation

yjshen
Copy link
Contributor

@yjshen yjshen commented Apr 10, 2023

Description

Related Issue(s)

Documentation

@github-actions github-actions bot added binding/rust Issues for the Rust crate documentation Improvements or additions to documentation rust labels Apr 10, 2023
@yjshen yjshen closed this Apr 10, 2023
@yjshen yjshen reopened this Apr 10, 2023
@yjshen yjshen changed the title Support reading from HDFS via datafusion-objectstore-hdfs HDFS storage support via datafusion-objectstore-hdfs Apr 10, 2023
@rtyler
Copy link
Member

rtyler commented Apr 10, 2023

Interestingly this does not build on my machine which does not have Java installed:

error: failed to run custom build command for `fs-hdfs3 v0.1.11`

Caused by:
  process didn't exit successfully: `/usr/home/tyler/source/github/delta-io/delta-rs/target/debug/build/fs-hdfs3-60a8f4d683c18085/build-script-build` (exit status: 101)
  --- stderr
  thread 'main' panicked at 'JAVA_HOME shell environment must be set: environment variable not found', /home/tyler/.cargo/registry/src/github.com-1ecc6299db9ec823/fs-hdfs3-0.1.11/build.rs:146:13
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...

@yjshen
Copy link
Contributor Author

yjshen commented Apr 10, 2023

Hi @rtyler, do you mean you cannot compile delta-rs with hdfs enabled because of a lack of JVM env?

I think this is expected since fs-hdfs3 requires a JVM env setting to build its hdfs lib and ffi https://github.com/datafusion-contrib/fs-hdfs/blob/hadoop-3.1.4/build.rs#L24-L28

Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this PR. The changes look good, but before we merge could you enable the tests in CI? We'll need to enable the feature in:

cargo test -p deltalake --features integration_test,azure,s3,gcs,datafusion

And then also install the Java dependencies in that same job. Could you add that change to this PR?

pub mod hdfs_cli {
use std::env;
use std::path::PathBuf;
// use super::set_env_if_not_set;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop this?

Suggested change
// use super::set_env_if_not_set;

@iajoiner
Copy link
Contributor

@yjshen consented to me taking over this PR..so it has been replaced with #1279.

@yjshen yjshen closed this Apr 13, 2023
wjones127 pushed a commit that referenced this pull request Apr 14, 2023
# Description
The description of the main changes of your pull request
This is basically the continuation of #1273 (really thanks @yjshen for
your contributions! The work is all his..I'm just the guy who tries to
get it merged!). I addressed the comments from @wjones127.
# Related Issue(s)
<!---
For example:

- closes #300
--->

# Documentation

<!---
Share links to useful documentation
--->

---------

Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate documentation Improvements or additions to documentation rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hdfs support
4 participants