Release xgboost 1.2 with GPU support #133

edwardjkim · 2020-09-04T22:45:29Z

Description of changes:

This CR upgrades XGBoost to 1.2 and enables GPU support.

When the image is built with xgboost 1.1, many integration tests fail with an xgboost error on feature mismatch, e.g.,

xgboost.core.XGBoostError: [16:47:33] /workspace/src/learner.cc:1062: Check failed: learner_model_param_.num_feature == p_fmat->Info().num_col_ (9 vs. 8) : Number of columns does not match number of features in booster.

This is due to a bug in 1.1 (GitHub issues: "xgboost 1.1.1 pred failed, while 0.90 pred success", dmlc/xgboost#5841; "Regression demo is broken", dmlc/xgboost#5709). It has been fixed in 1.2 ("Fix prediction heuristic", dmlc/xgboost#5955). From the 1.2 release notes:

Restore capability to run prediction when the test input has fewer features than the training data (#5955). This capability is necessary to support predicting with LIBSVM inputs. The previous release (1.1) had broken this capability, so we restore it in this version with better tests.

Because upstream XGBoost does not appear likely to backport this fix to 1.1, we release 1.2 in this CR.
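To illustrate why a LIBSVM test input can end up with fewer columns than the training data, here is a minimal sketch. It is not XGBoost's parser: the helper name `infer_num_columns`, the sample rows, and the zero-based index convention are assumptions made for illustration. Because LIBSVM rows list only non-zero entries, the column count is inferred from the largest index seen, so a test file that never mentions the last feature looks one column narrower.

```python
def infer_num_columns(libsvm_lines):
    """Return the column count implied by LIBSVM-formatted rows.

    Assumes zero-based feature indices, so the count is max index + 1.
    (Hypothetical helper for illustration; not part of XGBoost.)
    """
    max_index = -1
    for line in libsvm_lines:
        # Each row looks like "<label> <index>:<value> <index>:<value> ..."
        tokens = line.split()
        for token in tokens[1:]:
            index = int(token.split(":", 1)[0])
            max_index = max(max_index, index)
    return max_index + 1


# Made-up sample data: feature 8 is non-zero somewhere in training,
# but happens to be zero (and therefore omitted) in every test row.
train_rows = ["1 0:0.5 3:1.2 8:2.0", "0 1:0.7 8:0.1"]
test_rows = ["1 0:0.4 2:1.0 7:0.3"]

num_train_cols = infer_num_columns(train_rows)
num_test_cols = infer_num_columns(test_rows)
print(num_train_cols, num_test_cols)
```

With these sample rows the inferred widths differ by one, which is the same shape of mismatch as the "(9 vs. 8)" check in the error above.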

New in XGBoost 1.1 & 1.2
- The silent parameter has been removed in 1.1 in favor of verbosity.
- A new objective survival:aft is added to support survival analysis: https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html

MLIO needs to be upgraded. The latest version of MLIO is v0.6. However, the conda packages for v0.5 and v0.6 add ~3 GB uncompressed (~1 GB compressed) to the Docker image, mainly because of a long list of dependencies for the image reader (e.g., ffmpeg and opencv) that were newly added in v0.5, and this increases training time by ~1 minute. The Dockerfile is therefore optimized and rewritten to install mlio from source. The final image size is 1326.14 MB (compressed) with XGBoost 1.2, the MLIO upgrade, and GPU support, compared to 1225.65 MB (compressed) for 1.0-1-cpu-py3.

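For reference, the net growth implied by the two compressed sizes quoted above can be worked out as follows. This is only arithmetic on the numbers in this description; the variable names are illustrative.

```python
# Compressed image sizes quoted in this PR description (MB).
old_image_mb = 1225.65  # 1.0-1-cpu-py3
new_image_mb = 1326.14  # XGBoost 1.2 + MLIO upgrade + GPU support

# Absolute and relative growth of the compressed image.
growth_mb = round(new_image_mb - old_image_mb, 2)
growth_pct = round(100 * growth_mb / old_image_mb, 1)

print(f"{growth_mb} MB larger ({growth_pct}%)")
```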
GPU support
- Installing the full CUDA Toolkit would increase the image size by around 700 MB (compressed). The proposed base image nvidia/cuda:${CUDA_VERSION}-base-ubuntu${UBUNTU_VERSION} is instead a small image that contains a minimal set of CUDA runtime files.
- Customers will have to specify the parameter tree_method: gpu_hist (and use the correct instance type, e.g., p3.xlarge, p3.2xlarge) to enable GPU training.
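A minimal sketch of how a customer might opt in to GPU training through hyperparameters. The helper name `build_hyperparameters` and the `objective` and `num_round` values are assumptions for illustration; only the `tree_method: gpu_hist` setting comes from this description.

```python
def build_hyperparameters(use_gpu):
    """Return a hyperparameter dict, adding tree_method only for GPU training.

    Hypothetical helper for illustration; not part of the container code.
    """
    hyperparameters = {
        "objective": "reg:squarederror",  # illustrative choice, not from the PR
        "num_round": 100,                 # illustrative value, not from the PR
    }
    if use_gpu:
        # Per the PR description, GPU training is enabled via tree_method.
        hyperparameters["tree_method"] = "gpu_hist"
    return hyperparameters


gpu_params = build_hyperparameters(use_gpu=True)
cpu_params = build_hyperparameters(use_gpu=False)
print(gpu_params)
print(cpu_params)
```

Leaving `tree_method` unset in the CPU case keeps existing CPU jobs on their current defaults, so the same image serves both cases.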

With GPU support in the same image as the CPU image, it is no longer necessary to append the architecture to the image tag. Since we dropped Python 2 support, the -cpu-py3 in the framework version is also redundant, so this CR proposes dropping the -<architecture>-<python version> suffix. (However, we will keep the old tag format in the deployment pipelines for backwards compatibility; that is, we will tag the same image with two tags: 1.2-1 and 1.2-1-cpu-py3.)
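The dual-tagging scheme described above could be sketched roughly as follows. The function name `image_tags` and its `legacy_suffix` parameter are hypothetical; the actual deployment pipeline is not shown in this PR.

```python
def image_tags(framework_version, legacy_suffix="-cpu-py3"):
    """Return the new short tag plus the legacy-format tag for one image.

    Hypothetical helper; both tags are meant to point at the same image.
    """
    return [framework_version, framework_version + legacy_suffix]


tags = image_tags("1.2-1")
print(tags)
```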

Testing: tox, integration tests

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

edwardjkim · 2020-09-08T21:25:03Z

Codebuild failed, but I'm sending this PR out for review while I work on fixing codebuild.

edwardjkim · 2020-09-10T21:11:12Z

Closing this PR since we had to reset codebuild. New PR: #134

edwardjkim force-pushed the 1.2-1-development branch from be59f5e to 4bc90a7 on September 8, 2020 17:22

edwardjkim mentioned this pull request Sep 8, 2020

Add single GPU support #127

Closed

edwardjkim force-pushed the 1.2-1-development branch from 4bc90a7 to afe579b on September 8, 2020 20:35

edwardjkim requested review from balajitummala, aws-patlin and ericangelokim September 8, 2020 21:24

Release xgboost 1.2 with GPU support

56e606c

edwardjkim force-pushed the 1.2-1-development branch from afe579b to 56e606c on September 9, 2020 01:18

edwardjkim changed the title ~~[WIP] Release xgboost 1.2 with GPU support~~ Release xgboost 1.2 with GPU support Sep 9, 2020

edwardjkim closed this Sep 10, 2020

edwardjkim reopened this Sep 10, 2020

edwardjkim added 4 commits September 10, 2020 20:27

Rename dockerfiles back to Dockerfile.cpu

5a8c687

Revert "Rename dockerfiles back to Dockerfile.cpu"

aa8197d

This reverts commit 5a8c687.

Rename dockerfiles back to Dockerfile.cpu

6197477

Add back -cpu-py3 tag in Dockerfile

a85c226

edwardjkim closed this Sep 10, 2020
