[RFC] Move HDFS support to separate repo #9170

Closed
wants to merge 5 commits into from

Contributor

@riversand963 riversand963 commented Nov 15, 2021

This PR moves HDFS support out of the RocksDB repo and into a separate repo. The new (temporary?) repo
in this PR serves as an example until we finalize where HDFS support will be hosted and who will host it. For now,
people can start from the example repo and fork it.

Java/JNI is not included yet, and needs to be done later if necessary.

The goal is to include this commit in RocksDB 7.0 release.

Reference:
https://github.com/ajkr/dedupfs by @ajkr

Test plan:
Follow the instructions in https://github.com/riversand963/rocksdb-hdfs-env/blob/master/README.md. Build and run db_bench and db_stress.

make check

@riversand963 riversand963 linked an issue Nov 15, 2021 that may be closed by this pull request
@riversand963 riversand963 added the rocksdb-7.0 PRs with breaking API changes that need to land in the next major release, 7.0. label Nov 15, 2021
@riversand963
Contributor Author

@adamretter, is it possible to move HDFS-related JNI code from main repo to https://github.com/riversand963/rocksdb-hdfs-env as well?

Contributor

@mrambacher mrambacher left a comment

While I agree with most of this change, is there a reason we should move HDFS under plugins/hdfs in the main repo? HDFS is one of several plugins currently in the source tree (like Lua, RADOS, and Cassandra) that should be moved into their own "untested" and "unsupported" place.

For the tools that currently use them, it would be simpler if we kept the code in one place. It would also simplify testing that we build the plugins correctly and that the infrastructure works (regardless of whether we ever actually RUN them).

Makefile Outdated
@@ -250,6 +250,7 @@ include $(ROCKSDB_PLUGIN_MKS)
ROCKSDB_PLUGIN_SOURCES = $(foreach plugin, $(ROCKSDB_PLUGINS), $(foreach source, $($(plugin)_SOURCES), plugin/$(plugin)/$(source)))
ROCKSDB_PLUGIN_HEADERS = $(foreach plugin, $(ROCKSDB_PLUGINS), $(foreach header, $($(plugin)_HEADERS), plugin/$(plugin)/$(header)))
PLATFORM_LDFLAGS += $(foreach plugin, $(ROCKSDB_PLUGINS), $($(plugin)_LDFLAGS))
CXXFLAGS += $(foreach plugin, $(ROCKSDB_PLUGINS), $($(plugin)_CXXFLAGS))
Contributor

Shouldn't this be PLATFORM_CXXFLAGS?

Contributor

Do you mind updating plugin/README.md to mention how plugins can bring their own compiler flags?

Contributor

@ajkr ajkr left a comment

The example repo should register a name matching the whole URI like "hdfs://.*" instead of "hdfs". Other than that it worked great!

We can clarify that the temporary repo is meant as a starting (forking) point for whoever wants to take ownership, assuming that is the intent.
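
The difference between registering "hdfs" and "hdfs://.*" can be illustrated with a minimal Python sketch. It assumes the registered name is treated as a pattern that must match the entire URI; Python's `re.fullmatch` stands in for whatever matching the registry performs, and `matches_whole_uri` and the example URI are hypothetical names introduced only for this illustration.

```python
import re


def matches_whole_uri(pattern: str, uri: str) -> bool:
    """Return True only if `pattern` matches the entire `uri` string."""
    return re.fullmatch(pattern, uri) is not None


if __name__ == "__main__":
    uri = "hdfs://localhost:9000/db"  # example URI, an assumption
    # The bare scheme name does not cover the whole URI.
    print(matches_whole_uri("hdfs", uri))
    # A pattern with a trailing wildcard covers the whole URI.
    print(matches_whole_uri("hdfs://.*", uri))
```

Under that assumption, a plugin registered as plain "hdfs" would be found only when the caller passes exactly "hdfs", while "hdfs://.*" is found for any URI using that scheme.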

@riversand963 riversand963 removed the rocksdb-7.0 PRs with breaking API changes that need to land in the next major release, 7.0. label Nov 16, 2021
@riversand963
Contributor Author

@mrambacher
Just to clarify, we are not simply moving HDFS to plugins/hdfs in the main repo. We are removing HDFS support from the main repo and moving it to a separate repo. It is currently hosted temporarily at https://github.com/riversand963/rocksdb-hdfs-env.git; we can finalize the location later, or anybody can fork it. Future contributions to HDFS-specific code should not go to the main repo. Only those who want to build RocksDB with HDFS need to clone the repo into plugins/hdfs. We should probably do the same for all special Env/FS implementations, e.g. RADOS.

After removing the HDFS-specific code from the main repo and ensuring there is a workaround as described in this PR, I do not currently have a strong opinion on the next step w.r.t. #7949, and I don't think this PR should be blocked on a WIP feature. Even after HdfsEnv or HdfsFS and/or Env becomes Customizable, I think the approach in this PR should still work, and it does not prevent the development of #7949.

@ajkr ajkr added the rocksdb-7.0 PRs with breaking API changes that need to land in the next major release, 7.0. label Nov 19, 2021
@riversand963 riversand963 changed the title from "[RFC][WIP] Remove HDFS support from main repo and migrate to separate repo" to "[RFC][WIP] Move HDFS support to separate repo" Nov 24, 2021
Contributor

@ajkr ajkr left a comment

Looks like Java bindings need to be deleted too.

@adamretter
Collaborator

@ajkr Yes indeed, it is on my TODO list (nearish the top)

@riversand963
Contributor Author

Hi @adamretter,
Discussed with @ajkr offline. Is it possible to expose a createEnvFromUri() public Java API? If so, is it possible to get rid of the Java binding code for HDFS?

@darionyaphet
Contributor

Maybe HDFS is not the only option. Would support be more optional in the future?

@adamretter
Collaborator

> Is it possible to expose a createEnvFromUri() public Java API?

@riversand963 As part of HDFS support, or as something else in RocksDB?

@mrambacher
Contributor

The Customizable infrastructure (e.g. XXX::CreateFromString) will work for all of the extension classes (Env is still pending). I believe that is the interface that needs to be added to Java, for all of the Customizable classes.

Having said that, I am not sure I understand how HDFS (or other plugins) can be accessed in Java unless they are built into the initial shared library. That means we will either be distributing a lot of Maven packages for the possible combinations (2 plugins means 4 packages, 3 means 8, etc.) or need a means of dynamically adding them to the Java project.
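
The package counts quoted above follow from each plugin being either included in or left out of a build, which gives 2^n combinations for n plugins. A small Python sketch of that arithmetic, with `plugin_combinations` and the plugin names used purely as illustrative assumptions:

```python
from itertools import combinations


def plugin_combinations(plugins):
    """Return every subset of `plugins`, including the empty one (no plugins)."""
    subsets = []
    for size in range(len(plugins) + 1):
        subsets.extend(combinations(plugins, size))
    return subsets


if __name__ == "__main__":
    # Plugin names are examples only; any distinct names give the same counts.
    for names in (["hdfs", "rados"], ["hdfs", "rados", "dedupfs"]):
        print(len(names), "plugins ->", len(plugin_combinations(names)), "packages")
```

The count includes the plain build with no plugins, which is why two plugins already yield four distinct binaries.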

@adamretter
Collaborator

@mrambacher The approach taken in https://github.com/riversand963/rocksdb-hdfs-env/ is to build everything from source, so I doubt we will distribute those binaries. Otherwise with that approach we would have one complete binary of RocksDB for each plugin - and then what happens if you want two plugins?!?

@riversand963
Contributor Author

@adamretter correct, I don't think we will distribute binaries for different combinations of RocksDB + (HDFS + RADOS).

> As part of HDFS support, or as something else in RocksDB

Something else in RocksDB as a public Java API.

Contributor

@ajkr ajkr left a comment

LGTM, just need to remove Java binding.

riversand963 added a commit to riversand963/rocksdb that referenced this pull request Dec 1, 2021
Summary:
Not a CMake expert, and the current CMake build support added by this PR is
unlikely the best way of doing it. Sending out the PR to demonstrate it
can work.

Test Plan:
Will need to update https://github.com/ajkr/dedupfs with CMake build.
Also, PR facebook#9170 and PR facebook#9206 both include CMake support for their
plugins, and can be used as a proof of concept.
@riversand963
Contributor Author

Thanks @ajkr for the review! Yeah, we will merge this after Java bindings for HDFS are deleted from main repo and we have an alternative.

@riversand963
Contributor Author

> Maybe HDFS is not the only option. Would support be more optional in the future?

@darionyaphet Not sure if I understand. Do you mean "support for more file systems"?

@darionyaphet
Contributor

> > Maybe HDFS is not the only option. Would support be more optional in the future?
>
> @darionyaphet Not sure if I understand. Do you mean "support for more file systems"?

Yep, such as Alluxio.

@adamretter
Collaborator

I am happy with what you are proposing. My concern is, if somebody wants two plugins, then your design is impossible... Or did I misunderstand?

facebook-github-bot pushed a commit that referenced this pull request Dec 1, 2021
Summary:
Not a CMake expert, and the current CMake build support added by this PR is
unlikely the best way of doing it. Sending out the PR to demonstrate it
can work.

Pull Request resolved: #9214

Test Plan:
Will need to update https://github.com/ajkr/dedupfs with CMake build.
Also, PR #9170 and PR #9206 both include CMake support for their
plugins, and can be used as a proof of concept.

Reviewed By: ajkr

Differential Revision: D32738273

Pulled By: riversand963

fbshipit-source-id: da87fb4377c716bbbd577a69763b48d22483f845
@riversand963
Contributor Author

riversand963 commented Dec 1, 2021

@darionyaphet no, we do not have such plans at the moment.
You are welcome to try RocksDB on other file systems. If you are committed to maintaining one such FS/Env, you can consider adding it to PLUGINS.md.

@riversand963
Contributor Author

riversand963 commented Dec 1, 2021

@adamretter

> I am happy with what you are proposing. My concern is, if somebody wants two plugins, then your design is impossible... Or did I misunderstand?

I assume you are replying to my prior comment.
I am not sure I follow. Can you elaborate on why it's not possible to have two plugins? Is this something specific in Java? In this PR, I can build RocksDB with both dedupfs and hdfs and run db_stress with either of them.

@ajkr
Contributor

ajkr commented Dec 1, 2021

@adamretter

> > I am happy with what you are proposing. My concern is, if somebody wants two plugins, then your design is impossible... Or did I misunderstand?
>
> I assume you are replying to my prior comment. I am not sure I follow. Can you elaborate on why it's not possible to have two plugins? Is this something specific in Java? In this PR, I can build RocksDB with both dedupfs and hdfs and run db_stress with either of them.

Right, it is possible for users who build from source to include as many plugins as they want. IMO the Java release should include zero plugins. I suspect the Java release already does not include a functioning HdfsEnv, since (before this PR) that required compiling with USE_HDFS=1 and linking with -lhdfs.
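
As a rough illustration of how several plugins can coexist in one build and be selected at run time by URI, here is a toy Python registry. It is a sketch under stated assumptions, not RocksDB's registry API: `FileSystemRegistry`, its methods, the registered patterns, and the tuples returned by the factories are all hypothetical.

```python
import re


class FileSystemRegistry:
    """Toy stand-in for a plugin registry; not RocksDB's actual API."""

    def __init__(self):
        self._factories = []  # list of (compiled pattern, factory) pairs

    def register(self, pattern, factory):
        """Associate a URI pattern with a factory callable."""
        self._factories.append((re.compile(pattern), factory))

    def create_from_uri(self, uri):
        """Return the object built by the first factory whose pattern matches all of `uri`."""
        for pattern, factory in self._factories:
            if pattern.fullmatch(uri):
                return factory(uri)
        raise LookupError(f"no plugin registered for {uri!r}")


# Two plugins registered side by side (patterns are assumptions).
registry = FileSystemRegistry()
registry.register(r"hdfs://.*", lambda uri: ("hdfs", uri))
registry.register(r"dedupfs://.*", lambda uri: ("dedupfs", uri))

if __name__ == "__main__":
    print(registry.create_from_uri("hdfs://localhost:9000/db"))
    print(registry.create_from_uri("dedupfs://"))
```

Each plugin only adds an entry to the registry, so a source build that includes both can serve either URI scheme from the same binary.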

@riversand963
Contributor Author

@ajkr Correct, the current Java release does not include HDFS.

@riversand963 riversand963 changed the title from "[RFC][WIP] Move HDFS support to separate repo" to "[RFC] Move HDFS support to separate repo" Dec 2, 2021
@riversand963 riversand963 force-pushed the hdfs-as-plugin branch 2 times, most recently from 404f988 to b1a4618 Compare January 5, 2022 00:55
@facebook-github-bot
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@riversand963
Contributor Author

Hi @alanpaxton, I have merged this PR to main, would it be possible for you to rebase riversand963#4 onto facebook/main? Thanks!
cc @adamretter

@alanpaxton
Contributor

Done.

@riversand963 riversand963 deleted the hdfs-as-plugin branch January 25, 2022 15:45
Development

Successfully merging this pull request may close these issues.

Move HDFS support from RocksDB main repo to plugin
7 participants