Release xgboost 1.2 with GPU support
edwardjkim committed Sep 9, 2020
1 parent cd4fb0d commit 56e606c
Showing 19 changed files with 188 additions and 114 deletions.
18 changes: 9 additions & 9 deletions README.rst
@@ -61,7 +61,7 @@ Base Images
The "base" Dockerfile encompasses the installation of the framework and all of the dependencies needed.

Tagging scheme is based on <SageMaker-XGBoost-version>-cpu-py3 (e.g. |XGBoostLatestVersion|-cpu-py3), where
Tagging scheme is based on <SageMaker-XGBoost-version> (e.g. |XGBoostLatestVersion|), where
<SageMaker-XGBoost-version> is comprised of <XGBoost-version>-<SageMaker-version>.
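For illustration, the two components of a tag can be split back apart with a small helper (hypothetical, not part of the container code):

```python
def parse_sagemaker_xgboost_tag(tag):
    """Split a <SageMaker-XGBoost-version> tag such as '1.2-1' into its
    <XGBoost-version> and <SageMaker-version> parts at the last hyphen."""
    xgboost_version, sagemaker_version = tag.rsplit("-", 1)
    return xgboost_version, sagemaker_version

print(parse_sagemaker_xgboost_tag("1.2-1"))  # ('1.2', '1')
```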

All "final" Dockerfiles build images using base images that use the tagging scheme
@@ -74,14 +74,14 @@ If you want to build your base docker image, then use:
# All build instructions assume you're building from the root directory of the sagemaker-xgboost-container.

# CPU
docker build -t xgboost-container-base:<SageMaker-XGBoost-version>-cpu-py3 -f docker/<SageMaker-XGBoost-version>/base/Dockerfile.cpu .
docker build -t xgboost-container-base:<SageMaker-XGBoost-version> -f docker/<SageMaker-XGBoost-version>/base/Dockerfile .

.. parsed-literal::
# Example
# CPU
docker build -t xgboost-container-base:|XGBoostLatestVersion|-cpu-py3 -f docker/|XGBoostLatestVersion|/base/Dockerfile.cpu .
docker build -t xgboost-container-base:|XGBoostLatestVersion| -f docker/|XGBoostLatestVersion|/base/Dockerfile .
Final Images
@@ -92,7 +92,7 @@ The "final" Dockerfiles encompass the installation of the SageMaker specific sup
All "final" Dockerfiles use base images for building.

These "base" images are specified with the naming convention of
xgboost-container-base:<SageMaker-XGBoost-version>-cpu-py3.
xgboost-container-base:<SageMaker-XGBoost-version>.

Before building "final" images:

@@ -103,7 +103,7 @@ Dockerfile.

# Create the SageMaker XGBoost Container Python package.
cd sagemaker-xgboost-container
python setup.py bdist_wheel --universal
python setup.py bdist_wheel

If you want to build "final" Docker images, then use:

@@ -112,14 +112,14 @@ If you want to build "final" Docker images, then use:
# All build instructions assume you're building from the root directory of the sagemaker-xgboost-container.

# CPU
docker build -t <image_name>:<tag> -f docker/<xgboost-version>/final/Dockerfile.cpu .
docker build -t <image_name>:<tag> -f docker/<xgboost-version>/final/Dockerfile .

.. parsed-literal::
# Example
# CPU
docker build -t preprod-xgboost-container:|XGBoostLatestVersion|-cpu-py3 -f docker/|XGBoostLatestVersion|/final/Dockerfile.cpu .
docker build -t preprod-xgboost-container:|XGBoostLatestVersion| -f docker/|XGBoostLatestVersion|/final/Dockerfile .
Running the tests
-----------------
@@ -195,7 +195,7 @@ If you want to run local integration tests, then use:
# Example
pytest test/integration/local --docker-base-name preprod-xgboost-container ``\``
--tag |XGBoostLatestVersion|-cpu-py3 ``\``
--tag |XGBoostLatestVersion| ``\``
--py-version 3 ``\``
--framework-version |XGBoostLatestVersion|
@@ -252,4 +252,4 @@ SageMaker XGboost Framework Container is licensed under the Apache 2.0 License.
.com, Inc. or its affiliates. All Rights Reserved. The license is available at:
http://aws.amazon.com/apache2.0/

.. |XGBoostLatestVersion| replace:: 1.0-1
.. |XGBoostLatestVersion| replace:: 1.2-1
39 changes: 0 additions & 39 deletions docker/1.0-1/base/Dockerfile.cpu

This file was deleted.

98 changes: 98 additions & 0 deletions docker/1.2-1/base/Dockerfile
@@ -0,0 +1,98 @@
ARG UBUNTU_VERSION=18.04
ARG CUDA_VERSION=10.2

FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu${UBUNTU_VERSION}

ARG PYTHON_VERSION=3.7
ARG PYARROW_VERSION=0.16.0
ARG MLIO_VERSION=0.6.0
ARG XGBOOST_VERSION=1.2.0

ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8

# Python won’t try to write .pyc or .pyo files on the import of source modules
# Force stdin, stdout and stderr to be totally unbuffered. Good for logging
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PYTHONIOENCODING='utf-8'

RUN apt-get update && \
apt-get -y install --no-install-recommends \
build-essential \
curl \
git \
jq \
libatlas-base-dev \
nginx \
openjdk-8-jdk-headless \
unzip \
wget \
&& \
# MLIO build dependencies
# Official Ubuntu APT repositories do not contain an up-to-date version of CMake required to build MLIO.
# Kitware contains the latest version of CMake.
apt-get -y install --no-install-recommends \
apt-transport-https \
ca-certificates \
gnupg \
software-properties-common \
&& \
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | \
gpg --dearmor - | \
tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null && \
apt-add-repository 'deb https://apt.kitware.com/ubuntu/ bionic main' && \
apt-get update && \
apt-get install -y --no-install-recommends \
autoconf \
automake \
build-essential \
cmake \
doxygen \
libcurl4-openssl-dev \
libssl-dev \
libtool \
ninja-build \
python3-dev \
python3-distutils \
python3-pip \
zlib1g-dev \
&& \
rm -rf /var/lib/apt/lists/*

# Install conda
RUN echo 'installing miniconda' && \
curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash Miniconda3-latest-Linux-x86_64.sh -bfp /miniconda3 && \
rm Miniconda3-latest-Linux-x86_64.sh

ENV PATH=/miniconda3/bin:${PATH}

# Install MLIO with Apache Arrow integration
# We could install mlio-py from conda, but it comes with extra support such as image reader that increases image size
# which increases training time. We build from source to minimize the image size.
RUN conda install python=${PYTHON_VERSION} && \
conda update -y conda && \
conda install -c conda-forge pyarrow=${PYARROW_VERSION} && \
cd /tmp && \
git clone --branch v${MLIO_VERSION} https://github.com/awslabs/ml-io.git mlio && \
cd mlio && \
build-tools/build-dependency build/third-party all && \
mkdir -p build/release && \
cd build/release && \
cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH="$(pwd)/../third-party" ../.. && \
cmake --build . && \
cmake --build . --target install && \
cmake -DMLIO_INCLUDE_PYTHON_EXTENSION=ON -DMLIO_INCLUDE_ARROW_INTEGRATION=ON ../.. && \
cmake --build . --target mlio-py && \
cmake --build . --target mlio-arrow && \
cd ../../src/mlio-py && \
python3 setup.py bdist_wheel && \
python3 -m pip install dist/*.whl && \
cp -r /tmp/mlio/build/third-party/lib/intel64/gcc4.7/* /usr/local/lib/ && \
ldconfig && \
rm -rf /tmp/mlio

# Install latest version of XGBoost
RUN python3 -m pip install --no-cache -I xgboost==${XGBOOST_VERSION}
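The `PYTHONDONTWRITEBYTECODE` and `PYTHONUNBUFFERED` variables set near the top of this Dockerfile can be sanity-checked by spawning a child interpreter with the same environment; a minimal sketch (run outside the container, using whatever `sys.executable` points at):

```python
import os
import subprocess
import sys

# Launch a child interpreter with the same flags the Dockerfile sets and
# confirm that bytecode (.pyc) writing is disabled in that process.
env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1", PYTHONUNBUFFERED="1")
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # True
```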
24 changes: 14 additions & 10 deletions docker/1.0-1/final/Dockerfile.cpu → docker/1.2-1/final/Dockerfile
@@ -1,28 +1,32 @@
FROM xgboost-container-base:1.0-1-cpu-py3
ENV SAGEMAKER_XGBOOST_VERSION 1.0-1
ARG SAGEMAKER_XGBOOST_VERSION=1.2-1

FROM xgboost-container-base:${SAGEMAKER_XGBOOST_VERSION}

ARG PYTHON_VERSION
ARG SAGEMAKER_XGBOOST_VERSION

########################
# Install dependencies #
########################
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt && rm /requirements.txt
RUN python3 -m pip install -r /requirements.txt && rm /requirements.txt

###########################
# Copy wheel to container #
###########################
COPY dist/sagemaker_xgboost_container-2.0-py2.py3-none-any.whl /sagemaker_xgboost_container-1.0-py2.py3-none-any.whl
RUN pip install --no-cache /sagemaker_xgboost_container-1.0-py2.py3-none-any.whl && \
rm /sagemaker_xgboost_container-1.0-py2.py3-none-any.whl
COPY dist/sagemaker_xgboost_container-2.0-py3-none-any.whl /sagemaker_xgboost_container-2.0-py3-none-any.whl
RUN python3 -m pip install --no-cache /sagemaker_xgboost_container-2.0-py3-none-any.whl && \
rm /sagemaker_xgboost_container-2.0-py3-none-any.whl

##############
# DMLC PATCH #
##############
# TODO: remove after making contributions back to xgboost for tracker.py
COPY src/sagemaker_xgboost_container/dmlc_patch/tracker.py \
/miniconda3/lib/python3.6/site-packages/xgboost/dmlc-core/tracker/dmlc_tracker/tracker.py
/miniconda3/lib/python${PYTHON_VERSION}/site-packages/xgboost/dmlc-core/tracker/dmlc_tracker/tracker.py

# Include DMLC python code in PYTHONPATH to use RabitTracker
ENV PYTHONPATH=$PYTHONPATH:/miniconda3/lib/python3.6/site-packages/xgboost/dmlc-core/tracker
ENV PYTHONPATH=$PYTHONPATH:/miniconda3/lib/python${PYTHON_VERSION}/site-packages/xgboost/dmlc-core/tracker

#######
# MMS #
Expand All @@ -32,12 +36,12 @@ RUN useradd -m model-server
RUN mkdir -p /home/model-server/tmp && chown -R model-server /home/model-server

# Copy MMS configs
COPY docker/$SAGEMAKER_XGBOOST_VERSION/resources/mms/config.properties.tmp /home/model-server
COPY docker/${SAGEMAKER_XGBOOST_VERSION}/resources/mms/config.properties.tmp /home/model-server
ENV XGBOOST_MMS_CONFIG=/home/model-server/config.properties

# Copy execution parameters endpoint plugin for MMS
RUN mkdir -p /tmp/plugins
COPY docker/$SAGEMAKER_XGBOOST_VERSION/resources/mms/endpoints-1.0.jar /tmp/plugins
COPY docker/${SAGEMAKER_XGBOOST_VERSION}/resources/mms/endpoints-1.0.jar /tmp/plugins
RUN chmod +x /tmp/plugins/endpoints-1.0.jar

# Create directory for models
8 changes: 4 additions & 4 deletions requirements.txt
@@ -2,16 +2,16 @@ PyYAML<4.3
gunicorn<20.0.0
matplotlib
multi-model-server==1.1.1
numpy
numpy==1.19.1
pandas>=0.24.0
psutil==5.6.7 # sagemaker-containers requires psutil 5.6.7
python-dateutil==2.8.0
requests<2.21
retrying==1.3.3
sagemaker-inference==1.2.0
sagemaker-containers>=2.8.3
scikit-learn
scipy==1.2.2
smdebug==0.4.13
scikit-learn==0.23.2
scipy==1.5.2
smdebug==0.9.2
urllib3<1.25
wheel
1 change: 1 addition & 0 deletions setup.py
@@ -29,6 +29,7 @@ def read(fname):
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python",
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
],

install_requires=read("requirements.txt"),
@@ -17,10 +17,8 @@


def initialize(metrics):
@hpv.range_validator(["auto", "exact", "approx", "hist"])
@hpv.range_validator(["auto", "exact", "approx", "hist", "gpu_hist"])
def tree_method_range_validator(CATEGORIES, value):
if "gpu" in value:
raise exc.UserError("GPU training is not supported yet.")
return value in CATEGORIES
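The effect of the change above is that `gpu_hist` now passes validation instead of raising a `UserError`. A standalone sketch of the assumed `range_validator` semantics (simple membership in the allowed category list) without the `hpv` machinery:

```python
# Category list taken from the updated validator above.
TREE_METHOD_CATEGORIES = ["auto", "exact", "approx", "hist", "gpu_hist"]

def tree_method_is_valid(value, categories=TREE_METHOD_CATEGORIES):
    """Mimic the categorical range check: valid iff value is a known category."""
    return value in categories

print(tree_method_is_valid("gpu_hist"))   # True
print(tree_method_is_valid("gpu_exact"))  # False
```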

@hpv.dependencies_validator(["booster", "process_type"])
@@ -57,10 +55,8 @@ def updater_validator(value, dependencies):
"following: 'grow_colmaker', 'distcol', 'grow_histmaker', "
"'grow_local_histmaker', 'grow_skmaker'")

@hpv.range_validator(["cpu_predictor"])
@hpv.range_validator(["auto", "cpu_predictor", "gpu_predictor"])
def predictor_validator(CATEGORIES, value):
if "gpu" in value:
raise exc.UserError("GPU training is not supported yet.")
return value in CATEGORIES

@hpv.dependencies_validator(["num_class"])
@@ -94,10 +90,13 @@ def eval_metric_range_validator(SUPPORTED_METRIC, metric):

@hpv.dependencies_validator(["objective"])
def eval_metric_dep_validator(value, dependencies):
objective = dependencies["objective"]
if "auc" in value:
if not any(dependencies["objective"].startswith(metric_type) for metric_type in [
'binary:', 'rank:']):
if not any(objective.startswith(metric_type) for metric_type in ['binary:', 'rank:']):
raise exc.UserError("Metric 'auc' can only be applied for classification and ranking problems.")
if "aft-nloglik" in value:
if objective not in ["survival:aft"]:
raise exc.UserError("Metric 'aft-nloglik' can only be applied for 'survival:aft' objective.")
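The new `aft-nloglik` branch mirrors the existing `auc` check. The dependency logic can be sketched in isolation, with a plain `ValueError` standing in for the container's `exc.UserError`:

```python
def check_eval_metric(metrics, objective):
    """Sketch of the objective/eval_metric dependency rules added above."""
    if "auc" in metrics and not any(
        objective.startswith(prefix) for prefix in ("binary:", "rank:")
    ):
        raise ValueError(
            "Metric 'auc' can only be applied for classification and ranking problems.")
    if "aft-nloglik" in metrics and objective != "survival:aft":
        raise ValueError(
            "Metric 'aft-nloglik' can only be applied for 'survival:aft' objective.")

check_eval_metric(["aft-nloglik"], "survival:aft")  # passes silently
```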

@hpv.dependencies_validator(["tree_method"])
def monotone_constraints_validator(value, dependencies):
@@ -123,7 +122,6 @@ def interaction_constraints_validator(value, dependencies):
hpv.IntegerHyperparameter(name="csv_weights", range=hpv.Interval(min_closed=0, max_closed=1), required=False),
hpv.IntegerHyperparameter(name="early_stopping_rounds", range=hpv.Interval(min_closed=1), required=False),
hpv.CategoricalHyperparameter(name="booster", range=["gbtree", "gblinear", "dart"], required=False),
hpv.IntegerHyperparameter(name="silent", range=hpv.Interval(min_closed=0, max_closed=1), required=False),
hpv.IntegerHyperparameter(name="verbosity", range=hpv.Interval(min_closed=0, max_closed=3), required=False),
hpv.IntegerHyperparameter(name="nthread", range=hpv.Interval(min_closed=1), required=False),
hpv.ContinuousHyperparameter(name="eta", range=hpv.Interval(min_closed=0, max_closed=1), required=False,
@@ -204,11 +202,11 @@ def interaction_constraints_validator(value, dependencies):
hpv.ContinuousHyperparameter(name="tweedie_variance_power", range=hpv.Interval(min_open=1, max_open=2),
required=False),
hpv.CategoricalHyperparameter(name="objective",
range=["binary:logistic", "binary:logitraw", "binary:hinge",
"count:poisson", "multi:softmax", "multi:softprob",
range=["aft_loss_distribution", "binary:logistic", "binary:logitraw",
"binary:hinge", "count:poisson", "multi:softmax", "multi:softprob",
"rank:pairwise", "rank:ndcg", "rank:map", "reg:linear",
"reg:squarederror", "reg:logistic", "reg:gamma",
"reg:squaredlogerror", "reg:tweedie", "survival:cox"],
"reg:squarederror", "reg:logistic", "reg:gamma", "reg:pseudohubererror",
"reg:squaredlogerror", "reg:tweedie", "survival:aft", "survival:cox"],
dependencies=objective_validator,
required=False),
hpv.IntegerHyperparameter(name="num_class",
@@ -223,7 +221,13 @@ def interaction_constraints_validator(value, dependencies):
hpv.IntegerHyperparameter(name="seed", range=hpv.Interval(min_open=-2**31, max_open=2**31-1),
required=False),
hpv.IntegerHyperparameter(name="num_parallel_tree", range=hpv.Interval(min_closed=1), required=False),
hpv.CategoricalHyperparameter(name="save_model_on_termination", range=["true", "false"], required=False)
hpv.CategoricalHyperparameter(name="save_model_on_termination", range=["true", "false"], required=False),
hpv.CategoricalHyperparameter(name="aft_loss_distribution", range=["normal", "logistic", "extreme"],
required=False),
hpv.ContinuousHyperparameter(name="aft_loss_distribution_scale", range=hpv.Interval(min_closed=0),
required=False),
hpv.CategoricalHyperparameter(name="single_precision_histogram", range=["true", "false"], required=False),
hpv.CategoricalHyperparameter(name="deterministic_histogram", range=["true", "false"], required=False),
)
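The new AFT hyperparameters reuse the same `Interval` ranges as the rest of the list. A minimal sketch of the assumed open/closed-bound semantics (the real `hpv.Interval` lives in the container's validation code, so this is only an illustration):

```python
def in_interval(value, min_closed=None, min_open=None, max_closed=None, max_open=None):
    """True iff value satisfies the given bounds, e.g. aft_loss_distribution_scale's
    range of [0, inf) expressed as min_closed=0."""
    if min_closed is not None and value < min_closed:
        return False
    if min_open is not None and value <= min_open:
        return False
    if max_closed is not None and value > max_closed:
        return False
    if max_open is not None and value >= max_open:
        return False
    return True

print(in_interval(0.0, min_closed=0))   # True  (closed bound includes 0)
print(in_interval(-0.1, min_closed=0))  # False
```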

hyperparameters.declare_alias("eta", "learning_rate")
2 changes: 1 addition & 1 deletion src/sagemaker_xgboost_container/algorithm_mode/train.py
@@ -179,7 +179,7 @@ def train_job(train_cfg, train_dmatrix, val_dmatrix, model_dir, checkpoint_dir,

# Parse arguments for train() API
early_stopping_rounds = train_cfg.get('early_stopping_rounds')
num_round = train_cfg["num_round"]
num_round = train_cfg.pop("num_round")

# Evaluation metrics to use with train() API
tuning_objective_metric_param = train_cfg.get("_tuning_objective_metric")
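Switching from `get` to `pop` removes `num_round` from the config dict before the remaining keys are forwarded as booster parameters, so `num_round` is passed to the train call on its own rather than left in the parameter dict. A minimal sketch with a hypothetical config:

```python
# Hypothetical training config; only the dict-handling pattern is illustrated.
train_cfg = {"max_depth": 5, "eta": 0.2, "num_round": 100}

num_round = train_cfg.pop("num_round")  # consume it out of the dict

print(num_round)  # 100
print(train_cfg)  # {'max_depth': 5, 'eta': 0.2} -- ready to pass as booster params
```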