BUG: libhdfs3 not getting picked up correctly when using conda #15019

Closed
h-vetinari opened this issue Dec 18, 2022 · 4 comments

@h-vetinari
Contributor

h-vetinari commented Dec 18, 2022

Describe the bug, including details regarding any error messages, version, and platform.

#14832 switched on the test suite within the conda-builds, which itself synced with conda-forge/arrow-cpp-feedstock#875

In the process of doing so, I removed

# doesn't get picked up correctly
# - libhdfs3

because the tests always showed: SKIPPED [24] test_hdfs.py:48: No libhdfs available on system

Even with a patch as follows:

diff --git a/cpp/src/arrow/io/hdfs_internal.cc b/cpp/src/arrow/io/hdfs_internal.cc
index 4592392b8..9f8f70389 100644
--- a/cpp/src/arrow/io/hdfs_internal.cc
+++ b/cpp/src/arrow/io/hdfs_internal.cc
@@ -144,9 +144,9 @@ Result<std::vector<PlatformFilename>> get_potential_libhdfs_paths() {
 #ifdef _WIN32
   file_name = "hdfs.dll";
 #elif __APPLE__
-  file_name = "libhdfs.dylib";
+  file_name = "libhdfs3.dylib";
 #else
-  file_name = "libhdfs.so";
+  file_name = "libhdfs3.so";
 #endif

   // Common paths
@@ -155,6 +155,8 @@ Result<std::vector<PlatformFilename>> get_potential_libhdfs_paths() {
   // Path from environment variable
   AppendEnvVarFilename("HADOOP_HOME", "lib/native", &search_paths);
   AppendEnvVarFilename("ARROW_LIBHDFS_DIR", &search_paths);
+  AppendEnvVarFilename("CONDA_PREFIX", "lib", &search_paths);
+  AppendEnvVarFilename("LIBRARY_LIB", &search_paths);

   // All paths with file name
   for (const auto& path : search_paths) {

this remained the case. I thought this might be necessary because in the conda-forge world, the name of the libhdfs binary contains the "3", but there seem to be other issues at play here as well.

To fix this issue, the line quoted above should be uncommented, and the conda tests should not show SKIPPED [24] test_hdfs.py:48: No libhdfs available on system anymore (at least on unix).
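For anyone debugging the same skip locally, here is a minimal sketch that lists where a libhdfs binary would be looked up, using only the file names and the two environment variables visible in the unpatched context of the diff above. The function names are hypothetical and this is not Arrow's actual code. The real C++ function also checks some common paths that the diff does not show, and those are left out here.

```python
import os
import sys
from pathlib import Path


def libhdfs_file_name(platform=sys.platform):
    """Library file name per platform, as in the unpatched diff context."""
    if platform.startswith("win"):
        return "hdfs.dll"
    if platform == "darwin":
        return "libhdfs.dylib"
    return "libhdfs.so"


def candidate_libhdfs_paths(env=None, platform=sys.platform):
    """Hypothetical helper: candidate paths derived from environment variables.

    Only HADOOP_HOME (with lib/native) and ARROW_LIBHDFS_DIR are considered,
    because those are the only ones shown in the diff context.
    """
    env = os.environ if env is None else env
    name = libhdfs_file_name(platform)
    candidates = []
    for var, subdir in (("HADOOP_HOME", "lib/native"), ("ARROW_LIBHDFS_DIR", "")):
        base = env.get(var)
        if base:
            # Joining with an empty subdir leaves the base directory unchanged.
            candidates.append(Path(base) / subdir / name)
    return candidates


if __name__ == "__main__":
    for path in candidate_libhdfs_paths():
        status = "found" if path.exists() else "missing"
        print(f"{status}: {path}")
```

If every candidate is reported as missing, or neither variable is set, the skip message is consistent with no JNI libhdfs being installed in the environment.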

Component(s)

C++, Continuous Integration

@kou
Member

kou commented Dec 19, 2022

It seems that libhdfs.so and libhdfs3.so are different projects.
libhdfs.so is a Java (JNI) library, but libhdfs3.so is not.
https://github.com/martindurant/libhdfs3-downstream/tree/master/libhdfs3

I think that we would need to implement a libhdfs3.so-based HDFS backend instead of reusing the current libhdfs.so-based implementation. (I think that this is a feature request, not a bug.)

@jorisvandenbossche
Member

The Arrow HDFS implementation is based on the JNI libhdfs, so it is expected that it doesn't work with libhdfs3. And since libhdfs doesn't seem to be packaged by conda-forge, I don't think there is a way to run the HDFS tests in the conda-forge build purely based on conda packages (our own tests install the JNI library manually on top of the conda env; see e.g. ci/docker/conda-python-hdfs.dockerfile).

We have had integration with libhdfs3 as well in the past (and you could switch between both drivers), but this was removed almost 3 years ago (#6432), because the libhdfs3 project is unmaintained. Also the dask filesystem wrapper using libhdfs3 is archived (https://github.com/dask/hdfs3).

Unless the libhdfs3 project would be revived, I don't think we should currently consider adding support for it again.

@h-vetinari
Contributor Author

Cool, thanks for the input. I wasn't aware of the split between libhdfs / libhdfs3; in this case I think it might be worthwhile medium term to package libhdfs in conda-forge.

@kou
Member

kou commented Dec 21, 2022

> but this was removed almost 3 years ago (#6432)

Oh, the author of the pull request is me... I didn't remember it...

I'm closing this because we don't support libhdfs3.

@kou closed this as not planned on Dec 21, 2022