
Multi-gpu terminate called after throwing an instance of 'thrust::system::system_error' parallel_for failed: out of memory #3756

Closed
pablete opened this issue Oct 5, 2018 · 4 comments

pablete commented Oct 5, 2018

Building xgboost for multi-GPU support from commit c6b5df6,
with CUDA 9.0
and NCCL 2.2.13-1+cuda9.0,
on an Amazon p2.8xlarge (488 GB RAM, 32 vCPUs, 8 Tesla K80 GPUs).

CODE

import xgboost as xgb

# X, y and the Data train/validation splitter come from my own data pipeline (not shown).
print(X.count())
print(X.info(memory_usage='deep'))

data = Data(X, y)
dtrain = xgb.DMatrix(data.X_train, data.y_train)
dval = xgb.DMatrix(data.X_val, data.y_val)
p = {'min_child_weight': 50,
     'subsample': 1.0,
     'max_depth': 11,
     'learning_rate': 0.03,
     'colsample_bytree': 0.4,
     'sketch_eps': 0.01,
     'n_gpus': 8,
     'alpha': 0,
     'lambda': 1,
     'max_bin': 64,
     'tree_method': 'gpu_hist',
     'objective': 'gpu:reg:logistic'}
bst = xgb.train(p, dtrain, 10, [(dtrain, "train"), (dval, "val")])

DATAFRAME has ~125M rows and 108 features (float32)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 124646054 entries, 0 to 139271
Columns: 108 entries, xxxxx to yyyyyyy
dtypes: float32(108)
memory usage: 51.1 GB
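
As a sanity check, the reported footprint is consistent with the shape above: 108 float32 columns plus the Int64Index come to roughly 51 GiB. A minimal sketch of the arithmetic, using only the numbers shown above:

# Back-of-the-envelope recomputation of the pandas figure above.
rows, cols = 124646054, 108
data_bytes = rows * cols * 4          # float32 columns: 4 bytes per value
index_bytes = rows * 8                # Int64Index: 8 bytes per entry
print((data_bytes + index_bytes) / float(1024 ** 3))   # ~51.1, matching df.info(memory_usage='deep')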

Traceback (most recent call last):
  File "train_now.py", line 120, in <module>
    main()
  File "train_now.py", line 87, in main
    bst = xgb.train(params, dtrain, args.num_rounds, [(dtrain, "train"), (dval, "val")])
  File "/mnt/xgboost/venv/local/lib/python2.7/site-packages/xgboost/training.py", line 216, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "/mnt/xgboost/venv/local/lib/python2.7/site-packages/xgboost/training.py", line 74, in _train_internal
    bst.update(dtrain, i, obj)
  File "/mnt/xgboost/venv/local/lib/python2.7/site-packages/xgboost/core.py", line 1035, in update
    dtrain.handle))
  File "/mnt/xgboost/venv/local/lib/python2.7/site-packages/xgboost/core.py", line 165, in _check_call
    raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [19:26:13] /mnt/dmlc/xgboost/include/xgboost/../../src/common/common.h:41: /mnt/dmlc/xgboost/src/predictor/../common/device_helpers.cuh: 409: out of memory

Stack trace returned 10 entries:
[bt] (0) /mnt/xgboost/venv/xgboost/libxgboost.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7fe98b5dba6b]
[bt] (1) /mnt/xgboost/venv/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7fe98b5dc2e8]
[bt] (2) /mnt/xgboost/venv/xgboost/libxgboost.so(dh::ThrowOnCudaError(cudaError, char const*, int)+0x22f) [0x7fe98b814e4f]
[bt] (3) /mnt/xgboost/venv/xgboost/libxgboost.so(xgboost::predictor::DeviceMatrix::DeviceMatrix(xgboost::DMatrix*, int, bool)+0x16f) [0x7fe98b84ed4f]
[bt] (4) /mnt/xgboost/venv/xgboost/libxgboost.so(xgboost::predictor::GPUPredictor::DevicePredictInternal(xgboost::DMatrix*, xgboost::HostDeviceVector, xgboost::gbm::GBTreeModel const&, unsigned long, unsigned long)+0xbc7) [0x7fe98b850d47]
[bt] (5) /mnt/xgboost/venv/xgboost/libxgboost.so(xgboost::predictor::GPUPredictor::UpdatePredictionCache(xgboost::gbm::GBTreeModel const&, std::vector<std::unique_ptr<xgboost::TreeUpdater, std::default_delete<xgboost::TreeUpdater> >, std::allocator<std::unique_ptr<xgboost::TreeUpdater, std::default_delete<xgboost::TreeUpdater> > > >, int)+0x75) [0x7fe98b851475]
[bt] (6) /mnt/xgboost/venv/xgboost/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal >, xgboost::ObjFunction)+0x6d5) [0x7fe98b661ef5]
[bt] (7) /mnt/xgboost/venv/xgboost/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x361) [0x7fe98b66f351]
[bt] (8) /mnt/xgboost/venv/xgboost/libxgboost.so(XGBoosterUpdateOneIter+0x48) [0x7fe98b5cfb58]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fe9fd287e40]

terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: out of memory
Aborted

nvidia-smi snapshot before the crash

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:17.0 Off |                    0 |
| N/A   57C    P0    69W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 00000000:00:18.0 Off |                    0 |
| N/A   53C    P0    82W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 00000000:00:19.0 Off |                    0 |
| N/A   57C    P0    65W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 00000000:00:1A.0 Off |                    0 |
| N/A   53C    P0    81W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   59C    P0    70W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   53C    P0    82W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   59C    P0    73W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   54C    P0    87W / 149W |   2095MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     35738      C   -                                           2082MiB |
|    1     35738      C   -                                           2082MiB |
|    2     35738      C   -                                           2082MiB |
|    3     35738      C   -                                           2082MiB |
|    4     35738      C   -                                           2082MiB |
|    5     35738      C   -                                           2082MiB |
|    6     35738      C   -                                           2082MiB |
|    7     35738      C   -                                           2082MiB |
+-----------------------------------------------------------------------------+

I cannot publish the dataset, but creating a similar one should not be a problem.

Thanks

trivialfis (Member) commented

Currently the predictor doesn't support multi-GPU yet, but it should soon. :) See #3738.

So there is only one device being used for prediction, which means the memory capacity is limited to that of a single device.
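
In other words, the dataset (~51 GiB in host memory) is far larger than the ~11 GiB a single K80 exposes, so building the prediction DeviceMatrix on one device runs out of memory. A rough sketch of that comparison, using the figures reported above (the on-device layout is not identical to the pandas footprint, so this is only an upper-bound check):

# Rough single-device feasibility check, using numbers reported in this issue.
dataset_gib = 51.1                 # host-side DataFrame size from df.info
per_gpu_mib = 11441                # total memory of one K80, from nvidia-smi
print(dataset_gib > per_gpu_mib / 1024.0)   # True: one GPU cannot hold the data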

pablete (Author) commented Oct 6, 2018

Thanks, I combined it with predictor: 'cpu_predictor' and it works. I will wait until the multi-GPU predictor is available and will test the PR early on.

Thanks for the pointer!
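
For anyone hitting the same error before #3738 lands, a minimal sketch of the workaround described above, reusing the p, dtrain and dval objects from the issue body (a sketch, not an official recommendation):

# Keep gpu_hist for training but force prediction onto the CPU until
# multi-GPU prediction (#3738) is available.
p['predictor'] = 'cpu_predictor'
bst = xgb.train(p, dtrain, 10, [(dtrain, "train"), (dval, "val")])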

ch11y commented Nov 19, 2018

I am also running XGBoost on GPU. My data is also large: 900 features and 30M samples.
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: out of memory

But when I check GPU utilization, it only uses 14 GB out of the GPU's 16 GB. I have also tried with and without predictor: 'cpu_predictor'; both end up with the same error. Any idea why, @hcho3?

hcho3 (Collaborator) commented Nov 19, 2018

@ch11y Make sure to compile from the latest master. This issue has been resolved by #3738. If you keep seeing the out-of-memory error, open a new issue.

dmlc locked as resolved and limited conversation to collaborators Nov 19, 2018