[RFC] XGBoost 1.0.0 Release Candidate #5253

Closed
12 tasks done
hcho3 opened this issue Jan 31, 2020 · 31 comments


@hcho3
Collaborator

hcho3 commented Jan 31, 2020

The last release (0.90) came out on May 20, 2019, and after 8 months of effort, we proudly announce the 1.0.0 release. In the next two weeks, the community is invited to try out the release candidate (RC).

Feedback period: until the end of February 17, 2020 (extended from February 14, 2020). No new features will be added to the 1.0.0 release; only critical bug fixes will be added.

@dmlc/xgboost-committer

Now available

  • Python package. RC2 available from PyPI. Run
pip3 install xgboost==1.0.0rc2
  • R package. RC2 available from the Releases section. Download the tarball file xgboost_1.0.0.1.tar.gz and run
R CMD INSTALL xgboost_1.0.0.1.tar.gz
  • JVM packages. RC2 available from the Releases section. Download the JAR files xgboost4j_2.12-1.0.0-RC2.jar and xgboost4j-spark_2.12-1.0.0-RC2.jar and run
mvn install:install-file -Dfile=./xgboost4j_2.12-1.0.0-RC2.jar -DgroupId=ml.dmlc \
    -DartifactId=xgboost4j_2.12 -Dversion=1.0.0-RC2 -Dpackaging=jar
mvn install:install-file -Dfile=./xgboost4j-spark_2.12-1.0.0-RC2.jar -DgroupId=ml.dmlc \
    -DartifactId=xgboost4j-spark_2.12 -Dversion=1.0.0-RC2 -Dpackaging=jar

to install the JARs into your local Maven repository. Now you should be able to add XGBoost4J and XGBoost4J-Spark as Maven dependencies:

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.12</artifactId>
    <version>1.0.0-RC2</version>
</dependency>
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.12</artifactId>
    <version>1.0.0-RC2</version>
</dependency>

TODOs

  • Create a new branch release_1.0.0.
  • Create Python wheels and upload to the Releases section and PyPI (use pre-release mechanism)
  • Create JAR files for the JVM packages and upload to the Releases section
  • Create a tarball for the R package and upload to the Releases section
  • Write a summary of 1.0.0 release

Outstanding patches that should make it into the 1.0.0 release:

Merged after RC1:

Merged after RC2

Known limitation

  • When training parameter reg_lambda is set to zero, some leaf nodes may be assigned a NaN value. (See discussion) For now, please set reg_lambda to a nonzero value.
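
As a hedged illustration of the workaround above, a training script could check the parameter before calling XGBoost. This is only a sketch: the helper name `check_reg_lambda` is hypothetical and not part of XGBoost, and it assumes the parameters are passed as a plain dict in which `lambda` and `reg_lambda` are aliases with a default of 1.

```python
def check_reg_lambda(params):
    """Return the effective L2 regularization value, raising if it is zero.

    `params` is assumed to be the dict of training parameters.
    """
    # `lambda` is the native parameter name and `reg_lambda` its alias;
    # when neither is given, the documented default of 1 applies.
    value = float(params.get("reg_lambda", params.get("lambda", 1.0)))
    if value == 0.0:
        raise ValueError(
            "reg_lambda is zero; set it to a nonzero value to avoid "
            "NaN leaf values"
        )
    return value


print(check_reg_lambda({"max_depth": 6, "reg_lambda": 0.5}))
```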
@hcho3 hcho3 pinned this issue Jan 31, 2020
@hcho3
Collaborator Author

hcho3 commented Jan 31, 2020

@RAMitchell @trivialfis For some reason, the Linux binary wheel is now 208 MB, exceeding the 200 MB size limit. Let us find ways to reduce it. The last stable release was only 142.8 MB.

Update. libxgboost.so is copied twice into the wheel:

$ zipinfo -1 xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl | grep libxgboost.so
xgboost/lib/libxgboost.so
xgboost-1.0.0rc1.data/data/xgboost/libxgboost.so

For now, I am removing one copy manually, but ideally CI should automatically de-duplicate the .so file.
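
A minimal sketch of what an automated de-duplication step might look like, assuming it is acceptable to keep whichever copy appears first in the archive (the thread does not say which copy was removed by hand). The function name `drop_duplicate_member` is hypothetical and not part of the XGBoost build scripts; it relies only on Python's standard `zipfile` module.

```python
import posixpath
import zipfile


def drop_duplicate_member(src_whl, dst_whl, basename="libxgboost.so"):
    """Copy src_whl to dst_whl, keeping only the first member named `basename`.

    Returns the list of archive paths that were dropped.
    """
    dropped = []
    seen = False
    with zipfile.ZipFile(src_whl) as src, \
            zipfile.ZipFile(dst_whl, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Archive paths always use "/" as the separator.
            if posixpath.basename(info.filename) == basename:
                if seen:
                    dropped.append(info.filename)
                    continue
                seen = True
            # Re-use the original ZipInfo so names and timestamps are preserved.
            dst.writestr(info, src.read(info.filename))
    return dropped
```

Note that this sketch does not update the wheel's RECORD file, which would still list the dropped path.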

@trivialfis
Member

We can remove the duplication. I consider it a bug.

@hcho3
Collaborator Author

hcho3 commented Jan 31, 2020

@trivialfis Should I try to fix CI so that libxgboost.so is not duplicated? I modified (by hand) the whl file and uploaded the modified version to PyPI: https://pypi.org/project/xgboost/1.0.0rc1/#files

@trivialfis
Member

For this release we might have to make do with manually removing it. But I will refactor the setup script so it can be more friendly.

@hcho3
Collaborator Author

hcho3 commented Jan 31, 2020

The R package is producing a warning about the use of std::cout in the codebase:

* checking compiled code ... NOTE
File ‘xgboost/libs/xgboost.so’:
  Found ‘_ZSt4cout’, possibly from ‘std::cout’ (C++)
    Object: ‘./amalgamation/xgboost-all0.o’

Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs.

See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.

CRAN may not accept XGBoost if this warning persists.

There are two places where std::cout is used:

  • src/common/timer.cc: it should be sufficient to replace std::cout with LOG(CONSOLE).
  • src/common/observer.h: since the Observer is a debugging feature, I suggest we disable it for the R package.

@trivialfis
Member

Agreed! Just replace/disable it.

@hcho3
Collaborator Author

hcho3 commented Jan 31, 2020

@trams @CodingCat This is my first time creating JAR artifacts for XGBoost4J. I used a CentOS 6 Docker image to compile the native lib. I'd love to get your feedback and learn the best practices for making releases in the Java world.

@ankane
Contributor

ankane commented Feb 1, 2020

This is great! Happy to report it works well with:

(both commits in branches)

@hcho3
Collaborator Author

hcho3 commented Feb 1, 2020

@ankane That's good to know. Thanks! One tidbit: XGBoost now requires CMake 3.16+ for the Mac target, so the extra CMake flags are no longer needed. We should be able to remove the following lines:

      args << "-DOpenMP_C_FLAGS=\"-Xpreprocessor -fopenmp -I#{libomp.opt_include}\""
      args << "-DOpenMP_C_LIB_NAMES=omp"
      args << "-DOpenMP_CXX_FLAGS=\"-Xpreprocessor -fopenmp -I#{libomp.opt_include}\""
      args << "-DOpenMP_CXX_LIB_NAMES=omp"
      args << "-DOpenMP_omp_LIBRARY=#{libomp.opt_lib}/libomp.dylib"

https://github.com/ankane/homebrew-core/blob/1f39811c129a2a36d391368d5791fdff257b5f94/Formula/xgboost.rb#L22-L26

That is, cmake .. will just work, as long as CMake 3.16+ is installed.

@ankane
Contributor

ankane commented Feb 1, 2020

It worked without the flags outside of Homebrew (https://github.com/ankane/ml-builds/compare/xgboost-1-0), but needed them inside the Homebrew environment for some reason. I believe Homebrew modifies some of the build flags in their compiler shim.

@hcho3
Collaborator Author

hcho3 commented Feb 1, 2020

@ankane Ah I see. Glad to hear that we finally have OpenMP-enabled XGBoost in Homebrew!

@trivialfis
Member

@hcho3 Out of curiosity:

xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl

Does py2 in the file name imply we support Python2?

@hcho3
Collaborator Author

hcho3 commented Feb 1, 2020

@trivialfis No, it does not, since we have this line

python_requires='>=3.5',

However, let me rename the wheel to use py3 exclusively.
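
To make the distinction concrete: the `py2.py3` part is the Python compatibility tag in the wheel filename, which is separate from the `python_requires` metadata. Below is a rough sketch of reading that tag, assuming the standard `name-version[-build]-python-abi-platform.whl` layout. The helper `wheel_python_tags` is hypothetical, and the second filename is only a guess at what the renamed wheel might look like.

```python
def wheel_python_tags(filename):
    """Return the Python tags encoded in a wheel filename, e.g. ['py2', 'py3']."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(suffix)].split("-")
    # Five fields, or six when the optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename layout: %r" % filename)
    # The last three fields are always python tag, ABI tag, platform tag.
    return parts[-3].split(".")


print(wheel_python_tags("xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl"))
# Hypothetical name after the rename discussed above:
print(wheel_python_tags("xgboost-1.0.0rc1-py3-none-manylinux1_x86_64.whl"))
```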

@trams
Contributor

trams commented Feb 1, 2020

@akimboyko, do you guys want to take a look at this RC?

@trams
Contributor

trams commented Feb 1, 2020

@hcho3 You did it right!
Building JARs is like building wheels. The problematic part is not the Java code but the native libxgboost.so library. You generally want this library to depend on the oldest possible version of libc, because libc is backward compatible at the binary level: anything compiled against libc version X will work with any version Y as long as Y >= X.

What you do not want to do is build the native library on a new Ubuntu (say, 19.10) and then start distributed training on a cluster running an older Ubuntu (for example, 18.04 LTS). The dynamic linker won't find the newer libc symbols that your libxgboost.so references. I've done this at least once by accident :)

You can use the nm command to list all undefined symbols in a .so file.
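
As a rough sketch of how that check could be scripted: the hypothetical helper below scans the text produced by something like `nm -D --undefined-only libxgboost.so` and reports the newest glibc symbol version it references. It assumes the nm output carries `@GLIBC_x.y` version suffixes, which depends on the binutils version; the sample lines are made up for illustration and are not taken from a real libxgboost.so.

```python
import re

# Matches "@GLIBC_2.14" or "@@GLIBC_2.2.5", but not "@GLIBCXX_3.4.21".
_GLIBC_RE = re.compile(r"@+GLIBC_(\d+(?:\.\d+)*)")


def max_glibc_version(nm_output):
    """Return the highest GLIBC version referenced in nm output, or None."""
    versions = [
        tuple(int(part) for part in match.group(1).split("."))
        for match in _GLIBC_RE.finditer(nm_output)
    ]
    # Tuples compare element by element, so (2, 14) > (2, 2, 5).
    return max(versions) if versions else None


# Illustrative input only.
sample = """\
                 U memcpy@GLIBC_2.14
                 U printf@GLIBC_2.2.5
                 U _ZSt4cout@GLIBCXX_3.4
"""
print(max_glibc_version(sample))
```

If the reported version is newer than the glibc on the oldest machine in the cluster, the library would be expected to fail to load there.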

@hcho3
Collaborator Author

hcho3 commented Feb 1, 2020

Adding a known limitation about needing a nonzero reg_lambda parameter. cc @trivialfis

@trivialfis
Member

@hcho3 I don't think that's a very serious problem, at least not among machine learning libraries. Just state that if NaN is encountered, users should try adjusting lambda.

@hcho3
Collaborator Author

hcho3 commented Feb 1, 2020

"Known limitations" will be part of the Release Note.

@trivialfis
Member

Got it. Thanks!

@hcho3
Collaborator Author

hcho3 commented Feb 1, 2020

@trams That's good to hear. In the future, I may want to spend more time on the distributed portion of XGBoost.

@trivialfis
Member

trivialfis commented Feb 2, 2020

@hcho3 Can we add Python 3.8 support by adding this to setup.py?

'Programming Language :: Python :: 3.8'

@hcho3
Collaborator Author

hcho3 commented Feb 14, 2020

It looks like all blockers are resolved. I will push out the 1.0 release by the end of this week. I am putting up RC2 tonight so that users can try it. #5281 made a non-trivial change to the serialization logic, so I am extending the feedback period to February 17, 2020 to ensure that nothing is broken.

@hcho3
Collaborator Author

hcho3 commented Feb 14, 2020

RC2 is now available.

@hcho3
Collaborator Author

hcho3 commented Feb 18, 2020

As promised, I will now commence my work on the 1.0.0 release. I am currently going through the 311 commits that have been made since 0.90 and summarizing what they are about.

@hcho3
Collaborator Author

hcho3 commented Feb 19, 2020

@trivialfis @RAMitchell It would help me tremendously if you could give me a list of some "highlights" of your work. (A brief explanation would also be nice.)
For example:

  • Cleaning up configuration and serialization
  • Dask integration

I want to make sure that I am not leaving out anything substantial :)

EDIT. For now, I'll use the 1.0.0 roadmap (#4680) to organize the release note.

@JohnZed
Contributor

JohnZed commented Feb 19, 2020

a couple of highlights:

@hcho3
Collaborator Author

hcho3 commented Feb 19, 2020

@JohnZed Wow, thanks so much!

@trivialfis
Member

Updated.

@hcho3
Collaborator Author

hcho3 commented Feb 20, 2020

The 1.0.0 release is now out. I'm about to finish up the full release note for 1.0.0, but for now I am uploading the Python wheels first.

@hcho3 hcho3 closed this as completed Feb 20, 2020
@trivialfis
Member

trivialfis commented Feb 20, 2020

Great! Let me know when I can look into the release note. There are quite a few changes this time.

@ankane
Contributor

ankane commented Feb 20, 2020

The Ruby gem is out and a Homebrew PR has been submitted: Homebrew/homebrew-core#50467 (it still needed the OpenMP flags; I may file a separate issue with Homebrew about it).

@hcho3 hcho3 unpinned this issue Feb 21, 2020