Fix quantile tests running on multi-gpus #8775
@trivialfis @hcho3 I'm surprised these tests are not run in the mgpu buildkite pipeline. Do we need to enable them? |
We run all gtests using a single GPU only. This was done to save CI cost. |
Ah I see. So what does the multi-gpu pipeline do? :) |
@rongou It runs Python tests with |
Can we also run the multi-gpu c++ tests there? For example, we can name the two tests here something like |
@rongou That's one option. Alternatively, we can create a separate gtest binary for the multi-GPU tests. |
The C++ MGPU tests were removed once we had the dask tests. However, it seems we might need to bring them back, since some tests have moved from dask back to C++ with the development of federated learning. |
Opened an issue for tracking: #8782. |
See my latest commit. It will run the quantile tests using 4 GPUs.
@hcho3 is this a transient error or something wrong with the configuration?
|
@rongou The error is expected. It means that the Docker cache already has the Docker image for this CI pipeline. Can you take a look and find out why the gtest binary is crashing? |
@hcho3 is there any way to get more information from the run? Right now it just says
I can't reproduce this on my local machine with 2 GPUs. |
Is it using NCCL? |
Yes. |
I'll try to debug on my end |
87d06c5 to 4e9e6c5
As reported in #8710 (comment).
Closes #8782.