Python predict() does not work with multiprocessing #4246

Closed

hcho3 opened this issue Mar 11, 2019 · 15 comments

@hcho3 (Collaborator) commented Mar 11, 2019

It has been reported that the predict() function in the Python interface does not work well with multiprocessing. We should find a way to allow multiple processes to predict with the same model simultaneously.
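For context, a minimal sketch of the reported failure mode. This is illustrative only, not from the issue itself: the model, data, and predict_chunk helper are hypothetical, and the sketch assumes Linux's fork start method, where children inherit the parent's XGBoost state. In affected setups the pool call hangs or raises dmlc::Error.

import pickle
from multiprocessing import Pool

import numpy as np
import xgboost as xgb

# Train a small model in the parent process, before any fork, then round-trip
# it through pickle to mimic loading a saved model.
X = np.random.rand(100, 10)
y = np.random.randint(2, size=100)
booster = xgb.train({"objective": "binary:logistic"}, xgb.DMatrix(X, label=y))
booster = pickle.loads(pickle.dumps(booster))

def predict_chunk(features):
    # Runs in a forked child that inherited the parent's booster (and, on GPU
    # builds, its CUDA context) -- this is where the reported failure shows up.
    return booster.predict(xgb.DMatrix(features))

if __name__ == "__main__":
    chunks = np.array_split(np.random.rand(1000, 10), 4)
    with Pool(processes=4) as pool:
        preds = pool.map(predict_chunk, chunks)  # may hang or raise dmlc::Error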

@hcho3 hcho3 changed the title Python predict() should work with multiprocessing Python predict() does not work with multiprocessing Mar 11, 2019
@andreieuganox

Is there any update on this? It seems that this is a complete blocker to using XGBoost in production...

@xEcEz commented Jul 25, 2019

Any update? I am just discovering this now. This is indeed a problem...

> It has been reported that the predict() function in the Python interface does not work well with multiprocessing. We should find a way to allow multiple processes to predict with the same model simultaneously.

What do you mean exactly?

In my context, I have a pool of processes that each load a pickled model and then try to make predictions, which is where I get the dmlc::Error.
Note that I also tried with a single process in the pool and still got the same error.

Here is the error stack:

terminate called after throwing an instance of 'dmlc::Error'
  what():  [13:08:08] /workspace/include/xgboost/./../../src/common/common.h:41: /workspace/src/common/host_device_vector.cu: 150: initialization error

Stack trace returned 10 entries:
[bt] (0) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::StackTrace(unsigned long)+0x47) [0x7f14b4c0ffc7]
[bt] (1) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1d) [0x7f14b4c1042d]
[bt] (2) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dh::ThrowOnCudaError(cudaError, char const*, int)+0x123) [0x7f14b4de2153]
[bt] (3) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<float>::DeviceShard::Init(xgboost::HostDeviceVectorImpl<float>*, int)+0x278) [0x7f14b4e3fb68]
[bt] (4) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(+0x33b261) [0x7f14b4e17261]
[bt] (5) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<float>::Reshard(xgboost::GPUDistribution const&)+0x1b6) [0x7f14b4e40d26]
[bt] (6) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::obj::RegLossObj<xgboost::obj::LinearSquareLoss>::PredTransform(xgboost::HostDeviceVector<float>*)+0xf9) [0x7f14b4e0d239]
[bt] (7) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterPredict+0x107) [0x7f14b4c08be7]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f14f3b21dae]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f14f3b2171f]

It seems that CUDA is somehow involved in this. If that helps, I have CUDA v10.0.130 installed on my machine.

I tried to run it on a machine in the cloud that doesn't have any GPU and it seems to work as intended.

@teopapad92

I ran into the same problem recently.

I noticed that if you use an older version of XGBoost (0.72.1), the problem of it hanging and not doing anything seems to disappear, but the process takes far too long.

Just for comparison, I used multithreading (which is slower than multiprocessing) on the latest version (0.90).
Results:
- Multiprocessing on v0.72.1: 672 sec
- Multithreading on v0.90: 164 sec

@trivialfis (Member)

Some related thoughts: nthread is a runtime parameter, so pickling (which is what Python does when spawning a new process) cannot include nthread in the pickle. This can be resolved once #4855 materializes.
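To illustrate the consequence, a minimal sketch (an assumption for illustration, not from the comment): since the pickle carries no runtime parameters, nthread can be re-set on the unpickled Booster. The path model.pkl is hypothetical.

import pickle
import xgboost as xgb  # needed so the pickled Booster class can be reconstructed

with open("model.pkl", "rb") as f:  # hypothetical path
    booster = pickle.load(f)
# Runtime parameters such as nthread are not part of the pickle, so set
# them again explicitly after loading.
booster.set_param({"nthread": 4})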

@mayanxin commented Sep 27, 2019

I had the same problem when I tried to run it on a machine that has GPUs

@owenljn commented Nov 15, 2019

Any update on this? I have the same issue here

@trivialfis (Member)

Thanks for the reminder. Let's see if I can get to this over the weekend.

@owenljn commented Nov 15, 2019

I implemented a workaround using a ZMQ load balancer.

I cut the code where the XGBoost models are initialized and loaded out of my master script, put it into an independent Python script, and implemented a worker routine that uses ZMQ load-balancing techniques to serve the XGBoost models in the backend.

Due to system memory limits, I only started 4 workers, so there are 4 independent XGBoost models as backend workers. The frontend is still the multiprocessing part of the original master script, but instead of using the XGBoost models to make predictions directly, it now sends requests to the backend XGBoost workers and receives the predictions from them. No more dmlc errors.

Still, it would be awesome if XGBoost eventually made predict() work with multiprocessing.
Link to the ZMQ load balancer that inspired my workaround
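A minimal sketch of the worker side of this approach, under assumptions not stated in the comment: pyzmq is installed, a broker runs zmq.proxy() with a ROUTER frontend and a DEALER backend on tcp://localhost:5560, requests arrive as pickled NumPy matrices, and model.pkl and prediction_worker are hypothetical names. The key property is that each worker owns its own Booster, so no XGBoost state ever crosses a fork.

import pickle

import xgboost as xgb
import zmq

def prediction_worker(model_path="model.pkl", backend="tcp://localhost:5560"):
    # Load the pickled model inside this worker process, so no XGBoost/CUDA
    # state is shared with the frontend processes.
    with open(model_path, "rb") as f:
        booster = pickle.load(f)

    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.connect(backend)  # the broker's DEALER side hands out requests

    while True:
        features = pickle.loads(sock.recv())        # pickled NumPy feature matrix
        preds = booster.predict(xgb.DMatrix(features))
        sock.send(pickle.dumps(preds))              # reply with the predictions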

@owenljn commented Nov 18, 2019

Hi, I implemented a demo that shows how a ZMQ load balancer can help with this issue:
Link to the demo

@trivialfis (Member)

Right now another workaround is to not initialize XGBoost before forking (e.g., load the pickle only after the fork; a sketch follows below). Maybe we could use a low-level driver API to maintain the CUDA context ourselves, but simply using a distributed framework like Dask seems much simpler.
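A minimal sketch of that fork-then-load workaround, assuming a hypothetical pickled model at model.pkl: the pickle is loaded inside the worker function, after the fork, so each child builds its own fresh XGBoost (and CUDA) state.

import pickle
from multiprocessing import Pool

import numpy as np
import xgboost as xgb

MODEL_PATH = "model.pkl"  # hypothetical path

def predict_chunk(features):
    # Load the model here, in the worker, not in the parent before the fork.
    with open(MODEL_PATH, "rb") as f:
        booster = pickle.load(f)
    return booster.predict(xgb.DMatrix(features))

if __name__ == "__main__":
    chunks = np.array_split(np.random.rand(1000, 10), 4)
    with Pool(processes=4) as pool:
        preds = pool.map(predict_chunk, chunks)

Loading the pickle once per chunk is wasteful; a Pool initializer that loads the model once per worker process would follow the same principle.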

@trivialfis (Member)

A quick update on this: thread-safe prediction and in-place prediction are now supported.

marco-c added a commit to mozilla/bugbug that referenced this issue Jun 15, 2020
… in parallel with multiple processes"

This reverts commit 34de11c.

We can't run the evaluation right after training otherwise, because
of dmlc/xgboost#4246.
Performing test selection in parallel doesn't buy us that much anyway
as XGBoost already works in parallel (only the generation of the
elements to pass to XGBoost would be parallel).
@hcho3 hcho3 closed this as completed Jun 18, 2020
@colin-zhou

Hi @trivialfis, has this problem been fixed now or not?

@hcho3 (Collaborator, Author) commented Dec 21, 2020

@colin-zhou You can now use inplace_predict() for thread-safe prediction.
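A minimal sketch of what that looks like, with a toy model trained in-process (the names and data here are illustrative, not from the thread): inplace_predict() accepts a NumPy array directly, with no intermediate DMatrix, and can be called concurrently from multiple threads.

from concurrent.futures import ThreadPoolExecutor

import numpy as np
import xgboost as xgb

# Train a small model; inplace_predict then takes raw NumPy input and is
# safe to call from several threads sharing the same Booster.
X = np.random.rand(100, 10)
y = np.random.randint(2, size=100)
booster = xgb.train({"objective": "binary:logistic"}, xgb.DMatrix(X, label=y))

chunks = np.array_split(np.random.rand(1000, 10), 8)
with ThreadPoolExecutor(max_workers=8) as pool:
    preds = list(pool.map(booster.inplace_predict, chunks))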

@sangaline

> You can now use inplace_predict() for thread-safe prediction.

@trivialfis @hcho3 I'm still experiencing this issue with the latest v1.7.1 release and model.inplace_predict(). When loading a pickled model before forking, any call to XGBClassifier.predict() after forking will hang. The predictor is set to auto on a machine with no GPU or CUDA installed, and model._can_use_inplace_predict() returns True. The hang occurs here:

predts = self.get_booster().inplace_predict(
    data=X,
    iteration_range=iteration_range,
    predict_type="margin" if output_margin else "value",
    missing=self.missing,
    base_margin=base_margin,
    validate_features=validate_features,
)

@trivialfis (Member) commented Nov 20, 2022 via email
