XGBoost 1.1.0 SNAPSHOT gpu_hist still numerically unstable #5632
I used a slightly modified version of the script with the syntax error fixed, adding `import click`:
```python
import click
import xgboost as xgb
import numpy as np
from sklearn.model_selection import train_test_split
import time
import random
import os


def fetch_covtype():
    # Expects covtype.data to be present; caches it as a .npy file for reuse
    if not os.path.exists('covertype.npy'):
        Xy = np.genfromtxt('covtype.data', delimiter=',')
        np.save('covertype', Xy)
    Xy = np.load('covertype.npy')
    X = Xy[:, :-1]
    y = Xy[:, -1].astype(np.int32, copy=False)
    return X, y


def random_seed(seed, param):
    os.environ['PYTHONHASHSEED'] = str(seed)  # Python general
    np.random.seed(seed)  # numpy
    random.seed(seed)  # Python random
    param['seed'] = seed  # XGBoost


@click.command()
@click.option('--seed', type=int, default=0)
@click.option('--epochs', type=int, default=999)
@click.option('--no-cuda', type=bool, default=False)
def train(seed, epochs, no_cuda):
    # Fetch the covertype dataset
    X, y = fetch_covtype()
    param = {
        'objective': 'multi:softmax',
        'num_class': 8,
        'single_precision_histogram': True
    }
    # Create 0.75/0.25 train/test split
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=0.25,
                                                        train_size=0.75,
                                                        random_state=0)
    # Set random seeds
    random_seed(seed, param)
    param['subsample'] = 0.5
    param['colsample_bytree'] = 0.5
    param['colsample_bylevel'] = 0.5
    # Convert input data from numpy to XGBoost format
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)
    # Set CPU or GPU as training device
    if no_cuda:
        param['tree_method'] = 'hist'
    else:
        param['tree_method'] = 'gpu_hist'
    # Train on the chosen device
    results = {}
    gpu_runtime = time.time()
    booster = xgb.train(param,
                        dtrain,
                        epochs,
                        evals=[(dtest, 'test')],
                        evals_result=results)
    # Save the model under the first unused numbered file name
    model = 'model.json'
    i = 0
    path = str(i) + '-' + model
    while os.path.exists(path):
        i += 1
        path = str(i) + '-' + model
    booster.save_model(path)
    if not no_cuda:
        print(f'GPU Run Time: {time.time() - gpu_runtime} seconds')
    else:
        print(f'CPU Run Time: {time.time() - gpu_runtime} seconds')


if __name__ == '__main__':
    train()
```
Hash values from training for 2 runs with single precision:
Hash values from training for 2 runs with double precision:
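(A minimal sketch, not from the thread, of how such hashes can be produced from the model files the script above saves under its `N-model.json` naming scheme:)

```python
# Sketch: hash the saved model files to compare two runs byte-for-byte.
import hashlib

def model_hash(path):
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()

# Two runs of the script produce '0-model.json' and '1-model.json';
# identical hashes indicate deterministic training.
print(model_hash('0-model.json'))
print(model_hash('1-model.json'))
```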
Could you try using the RC build: #5593?
@trivialfis Yes, I will try and update you. Sorry for the syntax errors; I basically copied it from my stuff and modified it inside the markdown editor here, haha.
@Zethson No problem. Let us know the result. ;-)
Ain't working for me!
Edit: same with single_precision_histogram:
Corresponding repository: https://github.com/Zethson/nextflow_machine_learning. The Python script is under /bin. Run command:
Running on an NVIDIA 1050M.
@Zethson Are you using Dask?
No.
The same issue appears on my system with a Tesla K80.
Let me try running on different cards tomorrow.
If this doesn't help, then you may need to try out Docker with the NVIDIA Container Toolkit (and maybe even Nextflow).
@Zethson Can you confirm the result on CPU is reproducible?
I can.
Will try other cards. Tested with a 1080 and a 1080 Ti; couldn't reproduce.
Please double-check your installed XGBoost version with
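(The exact command is elided above; a standard way to check the installed version, offered as an assumption of what was meant:)

```python
# Print the installed XGBoost version (sketch; the command referenced
# above was elided from the thread).
import xgboost
print(xgboost.__version__)
```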
The Conda environment
Tried 3 other cards today, with 1 mobile card (960M) included; still couldn't reproduce. The XGBoost binary pip distribution is built with CUDA 9, so I also tried that; still no luck.
@trivialfis Could you maybe try it with Docker and the NVIDIA Container Toolkit using my containers? I can give you more detailed instructions if you require them. I can reproduce it on 2 systems, both of which are running the models inside Docker via Nextflow...
I will try to run it outside of Docker and see what happens on my system. Edit: Well, 25 epochs are already enough to show that I get different results between runs on my system.
I can offer to give you access to one of my machines if you send me your SSH key.
Noticed that when compiling on 9.0 there's a warning that might be relevant:
Trying 9.2 again.
@hcho3 @RAMitchell Can we upgrade the default build to 9.2? I think there's an issue during compilation. Thus far I can only reproduce it with the 9.0 build.
Seems reasonable. We could do 9.2 for this release and move to 10 after.
@trivialfis 9.2 seems reasonable. Did you verify the reproducibility with 9.2?
@hcho3 Yes, tried 9.0 all the way to 10.2.
@trivialfis So the bug disappeared when you upgraded the CUDA version to 9.2? Okay, I will file a PR to upgrade CUDA to 9.2.
I am using CUDA 10.2 by the way, not 9.2. Why do you expect this to solve my reproducibility issue?
@Zethson Our server was using CUDA 9.0 to build the binary wheel (that's what you get when you run
Ahh, I see. Cool! Looking forward to trying RC3 out :)
@Zethson Here is a new wheel built with CUDA 10.0: https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/PR-5649/xgboost-1.1.0rc2%2B65895f677a23de0de74ad8e5d0dbab6220bfbf10-py3-none-manylinux2010_x86_64.whl. See if it works for you.
I tried the repro script on a new EC2 instance (g4dn.12xlarge), which has NVIDIA T4 GPUs. The new wheel runs deterministically, whereas the current RC2 does not. Two runs with the current RC2:
Two runs with the new wheel:
I can confirm your finding, and it is now indeed reproducible! Thank you very much for your very swift and competent help. I will also conduct some experiments on reproducibility with Dask soon, so be prepared for more issues related to reproducibility :) (Let's hope for the best!)
Dask data partitioning is not reproducible in some cases. You are more than welcome to join the investigation.
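(As an aside, a rough sketch, not from this thread, of how one might fingerprint Dask partitions to check whether partitioning is stable across runs; the file name is hypothetical:)

```python
# Sketch (assumption, not from the thread): hash each Dask partition so the
# partitioning of two runs of the same pipeline can be compared.
import hashlib
import dask.dataframe as dd

def partition_fingerprints(path):
    df = dd.read_csv(path)
    # df.partitions yields one lazy frame per partition; hash its contents.
    return [hashlib.sha256(part.compute().to_csv().encode()).hexdigest()
            for part in df.partitions]

# Identical lists across two runs => the partitioning was reproducible.
print(partition_fingerprints('covtype.csv'))
```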
@trivialfis Do you by chance have any links where I could read up on those cases?
@Zethson No, I made the PR for single-node deterministic GPU training. Dask is the next step, but these are baby steps...
Dear everyone,
according to #5023, a model trained with gpu_hist should be reproducible with 1.1.0.
The version that I was using for these experiments is:
https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/xgboost-1.1.0_SNAPSHOT%2B86beb68ce8ffa73a90def6a9862f0e8a917b58c3-py2.py3-none-manylinux1_x86_64.whl
The code that I am using is:
When training on the covertype dataset for 1000 epochs, I get the following results:
GPU Run Time: 470.45096254348755 seconds
GPU Run Time: 493.8201196193695 seconds
GPU Run Time: 484.21098017692566 seconds
GPU Run Time: 477.36872577667236 seconds
GPU Run Time: 487.09354853630066 seconds
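(A minimal sketch, not part of the original report, of how two runs could be compared exactly via their recorded eval metrics, assuming each run's `results` dict from the script above was dumped to JSON; file names are hypothetical:)

```python
# Sketch (assumption): compare the per-epoch eval metrics of two runs that
# were saved as JSON, e.g. via json.dump(results, f) at the end of train().
import json

with open('run0_results.json') as f0, open('run1_results.json') as f1:
    r0, r1 = json.load(f0), json.load(f1)

# 'merror' is the default eval metric for multi:softmax; an exact match on
# every epoch indicates deterministic training.
print(r0['test']['merror'] == r1['test']['merror'])
```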
Is this still to be expected, or why do I not get perfectly reproducible results?
Thank you very much!