Release xgboost 1.2 with GPU support #133

Closed
edwardjkim wants to merge 5 commits

edwardjkim
Contributor

@edwardjkim edwardjkim commented Sep 4, 2020

Description of changes:

This CR upgrades XGBoost to 1.2 and enables GPU support.

  • When the image is built with xgboost 1.1, many integration tests fail with an xgboost error on feature mismatch, e.g.,
    xgboost.core.XGBoostError: [16:47:33] /workspace/src/learner.cc:1062: Check failed: learner_model_param_.num_feature == p_fmat->Info().num_col_ (9 vs. 8) : Number of columns does not match number of features in booster.
    
    This is due to a bug in 1.1 (GitHub issues: "xgboost 1.1.1 pred failed, while 0.90 pred success", dmlc/xgboost#5841, and "Regression demo is broken", dmlc/xgboost#5709). It has been fixed in 1.2 ("Fix prediction heuristic", dmlc/xgboost#5955). From the 1.2 release notes:
    Restore capability to run prediction when the test input has fewer features than the training data (#5955). This capability is necessary to support predicting with LIBSVM inputs. The previous release (1.1) had broken this capability, so we restore it in this version with better tests.
    
    Since it doesn't sound like upstream XGBoost will backport this fix to 1.1, we release 1.2 in this CR.
  • New in XGBoost 1.1 & 1.2
  • MLIO needs to be upgraded. The latest version of MLIO is v0.6. However, the conda packages for v0.5 and v0.6 add ~3 GB uncompressed (~1 GB compressed) to the Docker image, mainly due to a large list of dependencies for the image reader (e.g., ffmpeg, opencv) that were newly added in v0.5. This increases training time by ~1 minute. Thus, the Dockerfile is optimized and rewritten to install MLIO from source. The final image size is 1326.14 MB (compressed) with XGBoost 1.2, the MLIO upgrade, and GPU support, compared to 1225.65 MB (compressed) for 1.0-1-cpu-py3.
  • GPU support
    • We could install the full CUDA Toolkit, but doing so would increase the image size by around 700 MB (compressed). The proposed base image nvidia/cuda:${CUDA_VERSION}-base-ubuntu${UBUNTU_VERSION} is a small image that contains only a minimal set of CUDA runtime files.
    • Customers will have to specify the parameter tree_method: gpu_hist (and use the correct instance type, e.g., p3.xlarge, p3.2xlarge) to enable GPU training.
  • With GPU support in the same image as the CPU image, it is no longer necessary to append the architecture to the image tag. Since we dropped Python 2 support, the -cpu-py3 in the framework version is also redundant, so this CR proposes to drop the -<architecture>-<python version> suffix. (However, we will keep the old tag format in the deployment pipelines for backwards compatibility. That is, we will tag the same image with two tags: 1.2-1 and 1.2-1-cpu-py3.)
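
For reviewers, the dual-tagging scheme in the last bullet can be sketched roughly as follows. This is only an illustration: the helper name `image_tags` and the `LEGACY_SUFFIX` constant are hypothetical and are not part of this CR or the deployment pipelines. It just shows how one image would receive both the new short tag and the old-format tag.

```python
from typing import List

# Hypothetical constant: the legacy "-<architecture>-<python version>" suffix
# that this CR proposes to drop from the primary tag.
LEGACY_SUFFIX = "-cpu-py3"


def image_tags(framework_version: str, legacy_suffix: str = LEGACY_SUFFIX) -> List[str]:
    """Return [new short tag, legacy tag] for a single image (illustrative only)."""
    if not framework_version:
        raise ValueError("framework_version must be non-empty")
    if framework_version.endswith(legacy_suffix):
        # Accept an old-style version string and normalize it to the short form.
        framework_version = framework_version[: -len(legacy_suffix)]
    return [framework_version, framework_version + legacy_suffix]


if __name__ == "__main__":
    # Both tags would point at the same image.
    print(image_tags("1.2-1"))
```

Passing either `1.2-1` or `1.2-1-cpu-py3` yields the same pair of tags, which matches the intent of keeping the old format only as a backwards-compatible alias.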

Testing: tox, integration tests

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@edwardjkim
Contributor Author

Codebuild failed, but I'm sending this PR out for review while I work on fixing codebuild.

@edwardjkim edwardjkim changed the title from "[WIP] Release xgboost 1.2 with GPU support" to "Release xgboost 1.2 with GPU support" Sep 9, 2020
@edwardjkim edwardjkim closed this Sep 10, 2020
@edwardjkim edwardjkim reopened this Sep 10, 2020
@edwardjkim
Contributor Author

edwardjkim commented Sep 10, 2020

Closing this PR since we had to reset codebuild. New PR: #134

@edwardjkim edwardjkim closed this Sep 10, 2020