[jvm-packages] Publishing xgboost4j and others to Maven Central #1807
For platforms: since XGBoost does not work on 32-bit systems, we only need to care about 64-bit Linux/Windows/OS X.
My personally preferred way to publish to Maven is to put everything in a single jar, as snappy-java does: http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.2.6/ You can download snappy-java-1.1.2.6.jar and look at how its native libraries are laid out.
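With a single-jar layout, each platform's library sits in its own resource directory inside one jar, and the loader derives the resource path from the os.name and os.arch system properties. The following is a minimal sketch of that idea. The /lib/<os>-<arch>/ layout and the NativeLibLayout class are hypothetical; they are not snappy-java's or XGBoost4J's actual structure. Only 64-bit x86 is accepted, matching the 64-bit Linux/Windows/OS X scope above.

```java
import java.util.Locale;

/** Hypothetical single-jar layout: one resource directory per OS inside the same jar. */
public final class NativeLibLayout {
    private NativeLibLayout() {}

    /** Maps an os.name value to a short directory name (assumed naming). */
    static String osDir(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) return "windows";
        if (os.startsWith("mac")) return "osx";
        if (os.startsWith("linux")) return "linux";
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    /** File name of a JNI library called baseName on each supported OS. */
    static String libFile(String osDir, String baseName) {
        switch (osDir) {
            case "windows": return baseName + ".dll";
            case "osx":     return "lib" + baseName + ".dylib";
            default:        return "lib" + baseName + ".so";
        }
    }

    /** Resource path inside the jar, e.g. /lib/linux-x86_64/libxgboost4j.so */
    static String resourcePath(String osName, String osArch, String baseName) {
        String dir = osDir(osName);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // Only 64-bit x86 builds are assumed to be bundled.
        if (!arch.equals("amd64") && !arch.equals("x86_64")) {
            throw new UnsupportedOperationException("Only 64-bit x86 is bundled, got os.arch=" + osArch);
        }
        return "/lib/" + dir + "-x86_64/" + libFile(dir, baseName);
    }

    /** Resource path for the JVM that is currently running. */
    static String currentResourcePath(String baseName) {
        return resourcePath(System.getProperty("os.name"), System.getProperty("os.arch"), baseName);
    }
}
```

For example, resourcePath("Windows 10", "amd64", "xgboost4j") yields /lib/windows-x86_64/xgboost4j.dll, and currentResourcePath("xgboost4j") gives the entry to extract and load on the running machine.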
I'll have a look, thanks. What I don't understand is how the build process is organized in this case. It may mean that they have some internal repository with binaries, pull them from there during the build, and only after that publish the jar. Having several jars may be an advantage because there would be no need for that: we could use Maven Central itself as such a repository. But I'll need to have a closer look.
I am trying the multiple-modules approach: it seems more natural to me and, unlike the one-module-has-them-all approach, I have ideas on how to implement it. Here is how I think it could work. Suppose there are three people: A with a Linux machine, B with Windows, and C with a Mac. When the next version is ready to be released to Maven, A takes the current version of xgboost4j (e.g. 0.7-SNAPSHOT) and, using the maven-release-plugin, does this:
After this is done, B and C can check out the 0.7 version from git and then build and publish only the native modules. Of course, it is also possible for B or C to do the main release while the others just publish the binaries. I'm experimenting in my fork here: https://github.com/alexeygrigorev/xgboost What do you think?
They have pre-built native libraries in the repository (https://github.com/xerial/snappy-java/tree/7650aa29fb52c3ba467e9c906cf22a3dab536861/src/main/resources/org/xerial/snappy/native), and there will be only one library in Maven Central.
OK, so it means they store the binaries in git? I am not sure that's a good idea. Anyway, my experiments with the multi-module build seem to have worked: I managed to deploy the binaries and the jars to Sonatype's snapshot Nexus. Here it is: https://oss.sonatype.org/content/repositories/snapshots/ml/dmlc/xgboost/ I only have Linux and Windows machines, so I tried only these two. Right now, using the snapshot versions should be possible this way:
This should automatically download the appropriate native version for the platform. On Linux it seems to work well, but on Windows it needs extra libraries, so I may need to try it on a clean virtual machine with only Java and Maven installed and see whether it works. Also, I had to turn off building the jar-with-dependencies, because Sonatype's Nexus doesn't allow uploading large files; these jars can be built with a special profile. Once we agree on everything, I can create a pull request, and we can publish XGBoost to the Sonatype release repository, which synchronizes with Maven Central.
It does not mean that we need to store binaries in git. The reason they have prebuilt native libraries saved there is that they plan to support many platforms, including those with hard-to-use toolchains. Our goal is only to support 64-bit Linux/Mac/Windows. We only need to do what we are already doing: compile native libs -> copy them to the resource dir -> build the jar. I still don't see why uploading many jars to the Maven Central repo is necessary.
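The runtime half of "compile native libs -> copy to resource dir -> build jar" can be sketched as follows, assuming the jar bundles the library under some resource path (the path and the BundledLibLoader class are hypothetical). System.load needs a real file on disk, so the library is first copied out of the jar into a temporary directory, keeping its platform-specific file name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical loader for a native library bundled as a jar resource. */
public final class BundledLibLoader {
    private BundledLibLoader() {}

    /** Copies the stream into a new temp directory under the given file name. */
    static Path extractToTemp(InputStream in, String fileName) throws IOException {
        Path dir = Files.createTempDirectory("xgboost4j-native");
        // deleteOnExit runs in reverse registration order, so register the directory first
        dir.toFile().deleteOnExit();
        Path target = dir.resolve(fileName);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        target.toFile().deleteOnExit();
        return target;
    }

    /** Finds the resource inside the jar, extracts it and loads it into the JVM. */
    static void loadFromJar(String resourcePath) throws IOException {
        try (InputStream in = BundledLibLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new UnsatisfiedLinkError("Native library not found in jar: " + resourcePath);
            }
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            System.load(extractToTemp(in, fileName).toAbsolutePath().toString());
        }
    }
}
```

The build side then only has to make sure every platform's library ends up under its own resource path before the jar is packaged.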
It may not be necessary, but I don't know how to organize the build process without it. As I wrote earlier, in my opinion the limitation of the one-jar-rules-them-all approach is that we first need to build the code for each target platform, store the binaries somewhere, and then, during publication to Maven, pull the binaries from there and include them in the final jar. I don't know how to do that. The multi-module approach is still not ideal, but it solves this problem, and the build process is organized as I described earlier. So I suggest following the approach I propose and getting the binaries into Central sooner rather than later; then maybe someone with better knowledge of Maven can improve it.
Why store binaries somewhere? How about putting all native libraries on local disk (in the resource directory), including them in the jar when building, and finally publishing the jar to Maven?
OK, so how would you do this? Somebody builds the binaries for Windows and then sends them by email to the person with Linux?
That's another point I don't understand... Why do we have to involve more than one person for cross-building? It's hard to imagine a release process that requires two people. In RocksDB, they use Vagrant to cross-build for Ubuntu and Mac. XGBoost does not use such system calls or anything similar, so these two platforms can share the same native lib file in most cases. For Windows, I am not an expert in Windows programming; even if Vagrant does not work, a manual build within a VM will achieve the same goal.
The next question to discuss is: can we skip Windows when releasing to Maven? The main reason is that we have little (zero?) testing of xgboost4j on Windows.
Well, we probably don't need to involve more than one user, but I'm not an expert in Vagrant either, sorry. But what I suggest does require three users:
This is for publishing the snapshot version; a release build would be a bit more complicated, but I outlined it above. As I am not familiar with Vagrant and other virtualization tools, I don't know how to organize it better. Let me know if my proposal interests you; otherwise I'm putting my current efforts on hold.
I will talk with the MXNet folks to understand whether there is any other reason for them to have many jars in Maven Central.
I successfully made a jar with a Windows DLL, a Mac OS X MacPorts dylib and a Linux .so, and stored it in our Artifactory, which works pretty well. The exception is when someone who uses Homebrew tries to use the MacPorts dylib: it gives an odd "library not found" error.
@alexeygrigorev I am looking forward to getting a Windows xgboost-spark JAR from the Maven Central repository or elsewhere. I often code in IntelliJ IDEA on Windows and then run the project on a Linux production system, because debugging is more convenient that way. In my experience, it is easy to compile XGBoost on Linux, but on Windows I have never succeeded. So if you have done it, please tell me. Thank you very much.
I have the same problem as @frank111.
Please publish XGBoost to Maven with bundled native libraries for all architectures.
Even for non-x86 architectures?
@CodingCat Yes, for all architectures that XGBoost4J supports. Please bundle the native libraries into the Maven package and load them at runtime depending on the architecture the app is running on. Or at least allow selecting the native architecture by linking against different Maven packages at the app's build time (not your library's build time!), as DeepLearning4J does. In any case, building from source just to select a multithreading backend should not be required, and Maven packages should be enough to use the library.
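Loading by architecture at runtime mostly comes down to normalizing os.arch, because the JVM reports the same architecture under different names (for example amd64 on Linux and Windows, x86_64 on some macOS JVMs). A minimal sketch follows. The alias table and the ArchDetector class are assumptions, and which architectures actually get a bundled library is a packaging decision.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper mapping os.arch values to one canonical name per architecture. */
public final class ArchDetector {
    private ArchDetector() {}

    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        // Assumed alias table: several os.arch spellings map to one bundled build.
        ALIASES.put("amd64", "x86_64");
        ALIASES.put("x86_64", "x86_64");
        ALIASES.put("aarch64", "aarch64");
        ALIASES.put("arm64", "aarch64");
        ALIASES.put("ppc64le", "ppc64le");
    }

    /** Canonical architecture name, or empty if no library is bundled for it. */
    static Optional<String> canonicalArch(String osArch) {
        return Optional.ofNullable(ALIASES.get(osArch.toLowerCase(Locale.ROOT)));
    }

    /** Canonical architecture of the running JVM, failing loudly if unsupported. */
    static String currentArch() {
        String arch = System.getProperty("os.arch");
        return canonicalArch(arch).orElseThrow(
            () -> new UnsupportedOperationException("No native library bundled for os.arch=" + arch));
    }
}
```

Failing with a clear message for an unknown architecture is usually friendlier than a bare UnsatisfiedLinkError later on.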
I would also very much appreciate it if at least the major releases of XGBoost4J were available via Maven. I'm also using DeepLearning4J, which is very comfortable to use compared to XGBoost4J. Meanwhile, DL4J even offers nightly builds on Maven. In my opinion, the lack of reliable XGBoost4J builds is a major bummer for more serious use cases of this great library. Especially on Windows, building XGBoost4J is a heavy adventure ;)
@mjakobus Yes, I feel the same: XGBoost4J is missing regular major releases, and especially Maven-released packages with the native backend selected at runtime.
How do people use this in production if it is not in Maven Central? Do they manually create the JAR files?
At Criteo we build XGBoost JARs on Travis/AppVeyor. In theory, the same scripts can be reused to publish the official JARs for XGBoost, but I didn't have the time to do that.
We just manually put them into our Nexus.
So the POM works to generate the artifact via the standard Maven jar-building commands? And has this been tested in a Linux environment?
In our case, yes, and we do it only for Linux machines.
I've built a multi-jar with Linux, Windows and Mac libraries, and put it in an Artifactory. Works fine from there.
At BlaBlaCar we build it and then publish it to an internal Nexus, from which apps fetch it. It is not a multi-jar, so we have the Linux and Mac libraries separately. Apps then get the right dependency, e.g., using a
Unfortunately my version isn't available, but the logic in XGBoost4J loads the correct binary based on the platform, so all you need to do is unzip each jar, copy the .dll, .so and .dylib into the same resources directory, and re-jar it. If you require multiple Linux versions, this approach won't work, as the loading logic isn't sophisticated enough (similarly, it fails if you have multiple .so files for different platforms, e.g. Linux and Solaris).
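The unzip-copy-rejar step can also be scripted. The sketch below uses only java.util.zip; the JarMerger class is a hypothetical helper, not part of XGBoost4J. It copies every entry of several platform jars into one output jar, keeping the first copy of any entry name. This also illustrates the limitation above: two Linux builds that ship the same .so path collide, and only the first one survives.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper that merges several jars into one. */
public final class JarMerger {
    private JarMerger() {}

    /**
     * Copies all entries of the input jars into output, in order.
     * The first jar that provides a given entry name wins; later duplicates
     * are skipped, and their names are returned.
     */
    static List<String> merge(List<Path> inputs, Path output) throws IOException {
        Set<String> seen = new HashSet<>();
        List<String> skipped = new ArrayList<>();
        byte[] buf = new byte[8192];
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(output))) {
            for (Path input : inputs) {
                try (ZipInputStream in = new ZipInputStream(Files.newInputStream(input))) {
                    for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                        if (!seen.add(e.getName())) {
                            skipped.add(e.getName()); // e.g. a second libxgboost4j.so
                            continue;
                        }
                        // A fresh entry avoids copying stale size/CRC metadata.
                        out.putNextEntry(new ZipEntry(e.getName()));
                        int n;
                        while ((n = in.read(buf)) != -1) {
                            out.write(buf, 0, n);
                        }
                        out.closeEntry();
                    }
                }
            }
        }
        return skipped;
    }
}
```

A non-empty return value is a useful signal that two platform builds are fighting over the same resource path.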
@edumucelli you can assemble a multi JAR by running It is built for an admittedly ancient CentOS 6, so it should work on CentOS 7 as well as on more recent Linux distributions.
@superbobry, that is great! Thank you for sharing it!
@alexeygrigorev @CodingCat @edumucelli what was the outcome of this?
There is, yes. Right now it is possible to run mvn deploy, and it will deploy the artifacts to your local Nexus repository.
@Obarros I am using @superbobry's multi JAR on Debian-based containers in production.
For anyone who wants to use a pre-built version of XGBoost, please check the README file in https://github.com/dmlc/xgboost/tree/master/jvm-packages; we have published artifacts to Maven Central.
@CodingCat, @edumucelli Thanks!
@CodingCat could you please also push the Windows artifacts? The published artifact only contains the Linux version. Thanks.
@bluelu, it contains both Linux and macOS.
@edumucelli this has been discussed in #3276. tl;dr @CodingCat decided not to support Windows for the Maven Central JARs. We have some prebuilt JARs over at criteo-forks/xgboost-jars which come with a Windows DLL, though.
Hi, that's fine for me. I have built my own version; however, it would certainly help others if it were readily available without having to compile it yourself.
@superbobry thanks for the link to that thread. That's not an issue; I was just complementing @bluelu's comment about the Linux-only jar, which in fact has a macOS dylib too.
Many users would like to see xgboost4j published to Maven Central (see #935).
I think we can follow an approach similar to MTJ (https://github.com/fommil/matrix-toolkits-java), which depends on netlib binaries; this is probably what @Javelinjs suggested in his comment about MXNet.
In essence, the idea is to have a separate JAR file for each platform and publish them all to Maven Central. Then we add all of them as dependencies of xgboost4j and decide at runtime which one to load.
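The "decide at runtime which one to load" step can be sketched as follows, assuming each platform JAR puts its library at a distinct resource path. The loader probes the candidate paths in order and takes the first one present on the classpath. PlatformJarSelector and the resource paths are hypothetical; this is not MTJ's or netlib-java's actual code.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical selector for per-platform native JARs that all sit on the classpath. */
public final class PlatformJarSelector {
    private PlatformJarSelector() {}

    /** Returns the first candidate accepted by isAvailable, in list order. */
    static Optional<String> firstAvailable(List<String> candidates, Predicate<String> isAvailable) {
        for (String candidate : candidates) {
            if (isAvailable.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** True if some JAR on the classpath contains the given resource. */
    static boolean onClasspath(String resource) {
        return PlatformJarSelector.class.getResource(resource) != null;
    }
}
```

For example, firstAvailable(Arrays.asList("/native/linux-x86_64/libxgboost4j.so"), PlatformJarSelector::onClasspath) finds the Linux library only if the Linux JAR was added as a dependency; an empty result means no suitable platform JAR is present. Ordering the list from most to least specific gives a simple fallback chain.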
We can also have a look at jni-loader (https://github.com/mrburrito/jni-loader)
This is how it looks for MTJ:
We could start by selecting one platform, e.g. 64-bit Linux, and see how it goes.