[jvm-packages] Publishing xgboost4j and others to Maven Central #1807

alexeygrigorev · 2016-11-23T19:37:54Z

Many users would like to see xgboost4j published to maven central (see #935)

I think we can follow the approach similar to MTJ (https://github.com/fommil/matrix-toolkits-java), which depends on netlib binaries - and this is, probably, what @Javelinjs suggested in his comment about mxnet

In essence, the idea is to have separate JAR files for each platform and publish them all to Maven Central. Then we add all of them as dependencies to xgboost4j and during the execution time decide which one to load.
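As a rough illustration of the "decide at execution time" part, here is a minimal sketch that maps the JVM's `os.name` property to a per-platform module name. The class `NativePlatform` and its methods are hypothetical, and the `xgboost4j-native-<os>` naming is only an assumption about how such modules might be called; none of this is existing XGBoost4J code.

```java
import java.util.Locale;

// Hypothetical helper: picks a per-platform native module name at run time.
public final class NativePlatform {
    private NativePlatform() {}

    /** Maps an os.name value to a short platform key, or "unknown". */
    public static String osKey(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) {
            return "linux";
        }
        // Check "darwin" before "win": the string "darwin" contains "win".
        if (os.contains("mac") || os.contains("darwin")) {
            return "osx";
        }
        if (os.contains("win")) {
            return "windows";
        }
        return "unknown";
    }

    /** Assumed naming scheme for the per-platform artifacts. */
    public static String nativeArtifactId(String osName) {
        return "xgboost4j-native-" + osKey(osName);
    }

    public static void main(String[] args) {
        // Prints the module name that would be selected on this machine.
        System.out.println(nativeArtifactId(System.getProperty("os.name")));
    }
}
```

A real loader would probably also need to look at `os.arch`, which this sketch ignores because only 64-bit targets are under discussion.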

We can also have a look at jni-loader (https://github.com/mrburrito/jni-loader)

This is how it looks for MTJ:

We could start from selecting one platform, e.g. 64bit linux, and see how it goes.

CodingCat · 2016-11-23T19:48:14Z

As for platforms: since XGBoost does not work on 32-bit systems, we only need to care about 64-bit Linux/Windows/OS X.

CodingCat · 2016-11-23T19:51:47Z

My personally preferred way to publish to Maven is to contain everything in a single jar.

http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.2.6/

You can download snappy-java-1.1.2.6.jar and look at the structure of their native libs.

alexeygrigorev · 2016-11-23T20:59:23Z

I'll have a look, thanks. What I don't get is how the build process is organized in this case: it may mean that they have some internal repository with binaries, then they pull them from there during the building process, and only after that publish the jar.

Having several jars may be an advantage because there will be no need for that: we can use the maven central as such repository.

But I'll need to have a closer look.

alexeygrigorev · 2016-11-24T15:58:35Z

I am trying the multiple-modules approach - it seems more natural to me and, unlike the one-module-has-them-all approach, I have ideas how to implement it.

The way I think it could work is the following. Suppose there are three people: A with a Linux machine, B with Windows, and C with a Mac.

When the next version is ready to be released to maven, A takes the current version of xgboost4j (e.g. 0.7-SNAPSHOT), and using the maven-release plugin does this:

updates the version to 0.7
releases the linux native lib along with other java modules to maven
commits the change in version to git
updates the version to 0.8-SNAPSHOT, commits the change again

After this is done, B and C can checkout the 0.7 version from git, and then build and publish only the native modules.

Of course, it is possible that B or C do the main release and others just publish the binaries.

I'm experimenting in my fork here: https://github.com/alexeygrigorev/xgboost

What do you think?

CodingCat · 2016-12-04T19:10:33Z

I'll have a look, thanks. What I don't get is how the build process is organized in this case: it may mean that they have some internal repository with binaries, then they pull them from there during the building process, and only after that publish the jar.

Having several jars may be an advantage because there will be no need for that: we can use the maven central as such repository.

They have pre-built native libraries https://github.com/xerial/snappy-java/tree/7650aa29fb52c3ba467e9c906cf22a3dab536861/src/main/resources/org/xerial/snappy/native

and

load them with https://github.com/xerial/snappy-java/blob/7650aa29fb52c3ba467e9c906cf22a3dab536861/src/main/java/org/xerial/snappy/SnappyLoader.java

there will be only one library in central maven
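The general single-jar technique (bundle the native library as a classpath resource, copy it to a temporary file, then load it) can be sketched as follows. This is a simplified illustration, not the actual SnappyLoader code; the class name `ResourceNativeLoader` and the resource path passed in are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical loader: extracts a native library bundled in the jar and loads it.
public final class ResourceNativeLoader {
    private ResourceNativeLoader() {}

    /** Copies a classpath resource to a temporary file and returns its path. */
    public static Path extract(String resourcePath) throws IOException {
        try (InputStream in = ResourceNativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Resource not found in jar: " + resourcePath);
            }
            // Keep the original file name as a suffix so the extension survives.
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            Path tmp = Files.createTempFile("native-", "-" + fileName);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }

    /** Extracts the library, then loads it; System.load needs an absolute path. */
    public static void load(String resourcePath) throws IOException {
        System.load(extract(resourcePath).toAbsolutePath().toString());
    }
}
```

With a layout like this, the jar can carry one library per platform under different resource paths, and the caller picks the path for the current OS before calling `load`.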

alexeygrigorev · 2016-12-04T20:24:01Z

OK so it means they store the binaries in git? I am not sure it's a good idea.

Anyways, my experiments with multi-module build seem to have worked: I managed to deploy the binaries and the jars to sonatype's snapshot nexus. Here it is: https://oss.sonatype.org/content/repositories/snapshots/ml/dmlc/xgboost/

I only have linux and windows machines, so I tried only these two.

Right now using the snapshot versions should be possible this way:

<project>
...
  <repositories>
    <repository>
      <id>sonatype-snapshot</id>
      <name>Sonatype Snapshot Repository</name>
      <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>ml.dmlc.xgboost</groupId>
      <artifactId>xgboost4j</artifactId>
      <version>0.7-SNAPSHOT</version>
    </dependency>
    ...
  </dependencies>
</project>

This should automatically download the appropriate native version depending on the platform.

For linux it seems to work well, but for windows it needs extra libraries - so I may need to try this with a clean virtual machine with only java and maven installed and see if it works.

Also, I needed to turn off building the jar-with-dependencies - sonatype's nexus doesn't allow uploading large files. These jars can be built with a special profile.

Once we agree on everything, then I can create a pull request and we can publish XGBoost to Sonatype Release repository, which synchronizes with maven central.

CodingCat · 2016-12-04T20:45:14Z

It does not mean that we need to store binaries in git. The reason they have prebuilt native libraries saved there is that they plan to support many platforms, including those with hard-to-use toolchains.

Our goal is only to support 64-bit Linux/Mac/Windows. We only need to do what we are already doing: compile native libs -> copy to resource dir -> build jar.

I still don't see why uploading many jars to the Maven Central repo is necessary...

alexeygrigorev · 2016-12-04T20:54:06Z

It may not be necessary but I don't know how to organize the build process without it.

As I wrote earlier, in my opinion the limitation of one-jar-rules-them-all approach is that we first need to build the code for each target platform, store the binaries somewhere, and then during the publication to maven pull the binaries from there and include in the final jar. I don't know how to do it.

When it comes to multiple modules, it is still not ideal, but solves this problem, and the build process is organized as I wrote earlier.

So I suggest following the approach I proposed and getting the binaries into Maven Central sooner rather than later; then maybe someone with better knowledge of Maven can modify it and do it better.

CodingCat · 2016-12-04T22:22:17Z

As I wrote earlier, in my opinion the limitation of one-jar-rules-them-all approach is that we first need to build the code for each target platform, store the binaries somewhere, and then during the publication to maven pull the binaries from there and include in the final jar. I don't know how to do it.

Why store binaries somewhere? How about putting all native libraries on local disk (in the resource directory), including them in the jar when building, and finally publishing the jar to Maven?

alexeygrigorev · 2016-12-05T11:01:53Z

Ok, so how would you do this? Somebody builds binaries for windows and then sends them over email to the person with linux?

CodingCat · 2016-12-05T12:45:08Z

That's another thing I do not understand...

Why do we have to involve more than one person for cross-building? It's hard to imagine a release process that requires two people...

In RocksDB, they use Vagrant to cross-build for Ubuntu and Mac. XGBoost does not have those system calls or anything like that, so these two platforms can share the same native lib file in most cases.

For Windows, I am not an expert in Windows programming, but even if Vagrant does not work, a manual build within a VM will achieve the same goal.

CodingCat · 2016-12-05T13:01:20Z

The next question to discuss is: can we skip Windows when releasing to Maven? The main reason is that we do not have enough (zero?) tests for xgboost4j on Windows...

alexeygrigorev · 2016-12-05T13:01:25Z

Well, we probably don't need to involve more than one user, but I'm not an expert in Vagrant either, sorry.

But what I suggest does require three users:

user with linux builds xgb and runs mvn deploy. This publishes only the linux version.
users with windows and mac build xgb and run mvn --projects xgboost4j-native-windows deploy and mvn --projects xgboost4j-native-osx deploy respectively.

This is for publishing the snapshot version; a release build would be a bit more complicated, but I outlined it above. As I am not familiar with Vagrant and other virtualization tools, I don't know how to organize it better.

Let me know if my proposal is interesting for you, otherwise I'm putting my current efforts on hold.

CodingCat · 2016-12-05T13:05:50Z

I will talk with the mxnet guys to understand if there is any other reason for them to have many jars in Maven Central.

Craigacp · 2016-12-14T21:01:20Z

I successfully made a jar with a Windows dll, a Mac OSX macports dylib and a Linux so, and stored it in our Artifactory, which works pretty well, except when someone who uses brew tries to use the macports dylib and gets an odd library-not-found error.

Widerstehen · 2017-03-18T02:28:11Z

@alexeygrigorev I am looking forward to getting a Windows xgboost-spark JAR from the Maven Central repository or elsewhere. I often write code in IntelliJ IDEA on Windows and then run the project on a Linux production system, because that makes debugging convenient. In my experience, it is easy to compile xgboost on Linux, but on Windows I have never been successful. So if you have done it, please tell me. Thank you very much.

algorithmdog · 2017-03-20T07:27:44Z

I have the same problem as @frank111.

virl · 2017-06-15T17:58:19Z

Please publish xgboost to Maven with bundled native libraries for all architectures.

CodingCat · 2017-06-15T18:26:14Z

Even for non-x86 architecture?

virl · 2017-06-16T09:16:51Z

@CodingCat Yes, for all architectures that XGBoost4J supports.

Please bundle the native libraries into the Maven package and load them at runtime depending on which architecture the app is running on.

Or at least allow selecting the native architecture by linking with different Maven packages at the app's build time (not your library's build time!), like DeepLearning4J does.

Anyway, building from source just to select a multithreading backend should not be required. Maven packages should be enough to use the library.

mjakobus · 2017-06-16T11:54:08Z

I would also appreciate it very much if at least the major releases of XGBoost4J were available via Maven.

I'm also using DeepLearning4J, which is very comfortable to use compared to XGBoost4J. Meanwhile, dl4j is even offering nightly builds on Maven.

In my opinion, the lack of reliable builds of XGBoost4J is a major bummer for more serious use cases of this great library. Especially on Windows, building XGBoost4J is quite an adventure ;)

virl · 2017-06-16T12:03:08Z

@mjakobus Yes, I feel the same: XGBoost4J is missing regular major releases, and especially Maven-released packages with the native backend selected at runtime.

anshbansal · 2017-11-07T11:23:43Z

How do people use this in production if it is not in maven central? Manually create the JAR files?

superbobry · 2017-11-07T11:55:48Z

At Criteo we build XGBoost JARs on Travis/Appveyor. In theory, the same scripts can be reused to publish the official JARs for XGBoost, but I didn't have the time to do that.

alexeygrigorev · 2017-11-07T13:15:17Z

We just manually put them to our nexus
(By "manually" I mean via maven, but not in a CI-configured way)

anshbansal · 2017-11-07T13:40:42Z

So the pom works to generate the artifact via standard Maven jar-building commands? And has this been tested in a Linux environment?

alexeygrigorev · 2017-11-07T14:01:32Z

In our case - yes, and we do it only for linux machines

Craigacp · 2017-11-07T15:41:04Z

I've built a multi-jar with Linux, Windows and Mac libraries, and put it in an artifactory. Works fine from there.

edumucelli · 2017-11-07T16:16:19Z

At BlaBlaCar we build it and then publish to an internal Nexus. Apps then fetch it from the Nexus. It is not a multi-jar, so we have the Linux and Mac libraries separately. Apps then get the right dependency, e.g., using an Os.isFamily(Os.FAMILY_MAC) check. It would be great to have a multi-jar out of the box, though. @Craigacp is your multi-jar available somewhere?

Craigacp · 2017-11-07T16:59:23Z

Unfortunately my version isn't available, but the logic in XGBoost4J causes it to load the correct binary based on the platform, so all you need to do is unzip each jar, copy the dll, so and dylib into the same resources directory and rejar it. If you require multiple linux versions, this approach won't work, as the loading logic isn't complicated enough (similarly it fails if you have multiple so files for different platforms e.g. Linux & Solaris).
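The unzip, copy and rejar steps can also be done programmatically. Below is a minimal sketch using only `java.util.zip`; the class `JarMerger` is hypothetical, it requires Java 9+ for `InputStream.transferTo`, and it assumes that when two jars contain an entry with the same name (for example the manifest or the shared class files), keeping the first copy is acceptable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: merges several platform-specific jars into one jar.
public final class JarMerger {
    private JarMerger() {}

    /**
     * Copies every entry of the input jars into the output jar.
     * For duplicate entry names, the first occurrence wins.
     * Returns the number of entries written.
     */
    public static int merge(List<Path> inputs, Path output) throws IOException {
        Set<String> seen = new HashSet<>();
        try (OutputStream fileOut = Files.newOutputStream(output);
             ZipOutputStream zipOut = new ZipOutputStream(fileOut)) {
            for (Path input : inputs) {
                try (InputStream fileIn = Files.newInputStream(input);
                     ZipInputStream zipIn = new ZipInputStream(fileIn)) {
                    ZipEntry entry;
                    while ((entry = zipIn.getNextEntry()) != null) {
                        if (!seen.add(entry.getName())) {
                            continue; // already copied from an earlier jar
                        }
                        zipOut.putNextEntry(new ZipEntry(entry.getName()));
                        zipIn.transferTo(zipOut); // copies only the current entry
                        zipOut.closeEntry();
                    }
                }
            }
        }
        return seen.size();
    }
}
```

The same limitation applies as with the manual procedure: if two inputs ship a native library under the same entry name, only one survives, so this does not help with multiple Linux builds.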

superbobry · 2017-11-07T22:01:33Z

@edumucelli you can assemble a multi JAR by running download_latest_release.py from here.

It is built for an admittedly ancient CentOS6, so should work on CentOS7 as well as more recent Linux distributions.

edumucelli · 2017-11-08T21:00:15Z

@superbobry, that is great! Thank you for sharing it!

Obarros · 2018-02-13T00:44:29Z

@alexeygrigorev @CodingCat @edumucelli what was the outcome of this?
Is there a solution in place to automatically build JARs for xgboost and publish them somewhere?

alexeygrigorev · 2018-02-13T07:04:57Z

There is, yes. Right now it is possible to run mvn deploy and it will deploy it to your local Nexus repository.

edumucelli · 2018-06-26T10:59:28Z

@Obarros I am using @superbobry's multi JAR on Debian-based containers in production.

CodingCat · 2018-06-26T21:10:24Z

For anyone who wants to use a pre-built version of xgboost, please check the README file in https://github.com/dmlc/xgboost/tree/master/jvm-packages; we have published artifacts to Maven Central.

Obarros · 2018-06-26T21:42:10Z

@CodingCat, @edumucelli Thanks!

bluelu · 2018-06-28T09:53:28Z

@CodingCat could you please also push the windows artifacts as well? The published artifact only contains the linux version. thanks

edumucelli · 2018-06-28T10:23:30Z

@bluelu, it contains both Linux and MacOS.

superbobry · 2018-06-28T15:34:09Z

@edumucelli this has been discussed in #3276. tl;dr @CodingCat decided not to support Windows for the Maven Central JARs.

We have some prebuilt JARs over at criteo-forks/xgboost-jars which come with a Windows DLL, though.

bluelu · 2018-06-28T15:59:35Z

Hi, it's fine for me. I have built my own version; however, it would certainly help others if it were readily available without having to compile it yourself.

edumucelli · 2018-06-28T17:17:11Z

@superbobry thanks for the link to that thread. That's not an issue, I was just complementing @bluelu's comment about the linux-only jar, which in fact has a MacOS dylib too.

Widerstehen mentioned this issue Mar 29, 2017

Couldn't see xgboost4j.scala.spark #2137

Closed

pommedeterresautee added Java-Scala and removed Java-Scala labels Jun 1, 2017

This was referenced Jul 31, 2017

[jvm-packages] Added baseMargin to ml.dmlc.xgboost4j.LabeledPoint #2532

Merged

[jvm-packages] Publishing JARs #2553

Closed

amaya382 mentioned this issue Aug 2, 2017

[WIP][HIVEMALL-99] Cross-compilation of XGBoost using Docker apache/incubator-hivemall#80

Closed

7 tasks

superbobry mentioned this issue Aug 21, 2017

make xgboost easy to use #2622

Closed

reckart mentioned this issue Mar 15, 2018

Add XGBoost as MLA dkpro/dkpro-tc#460

Closed

7 tasks

thesuperzapper mentioned this issue Jun 20, 2018

[jvm-packages] Distribute Spark jars for common architectures. #3396

Closed

CodingCat closed this as completed Jun 26, 2018

lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018
