[Temporary] Multi-GPU predictor #3819
- GPUPredictor is multi-GPU
- removed DeviceMatrix, as it has been made obsolete by using HostDeviceVector in DMatrix
@trivialfis FYI, now that you are a committer, you have the ability to interrupt any CI jobs as well. This may be helpful when you run into executor starvation on Jenkins. |
@hcho3 I looked around the Jenkins interface; the only relevant button I saw is "Restart Jenkins: Build & Test", which doesn't do anything. |
@trivialfis Look for the square stop icon on the top right corner: |
Also, current setup involves 3 re-tries, so you may have to press the stop button multiple times. |
@trivialfis Ugh, I thought all committers had admin rights on Jenkins, but it doesn't seem to be the case. Let me fix this. |
@hcho3 Thanks, that will be a great help! |
@trivialfis Looks like I had to manually add you to the admin list. Try it again. |
Actually, with a recent Jenkins update, the Stop button now properly works with the retry block. Pressing it once should stop all jobs. |
@hcho3 I tried it again, it's "HTTP ERROR 404" now. |
Can you access the page https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/PR-3819/4/pipeline/ ? |
@hcho3 Yes, this one works. |
@hcho3 But still no access permission (the stop button isn't there). |
Try stopping it and see if it works. |
@hcho3 Oh, after a refresh now I see it. It works. Thanks! :) |
Good to hear that. You are now able to stop and re-start any CI jobs. Now let's see if we can figure out why multi-GPU test is failing. |
@hcho3 I'm able to stop it, but not restart it. You can see a Java exception in the log: "java.lang.NullPointerException: Cannot invoke method getBuildName() on null object". Thanks a lot! Let me try to figure something out. |
@trivialfis That's curious. For now, let's focus on the multi-GPU test. Jenkins setup is relatively new, and it can be improved in the future. |
@trivialfis FYI, there are two buttons for restart. Make sure to use the one on the top (without text), not the one at bottom with text. I don't know why, but the bottom one never worked for me. |
8b23040 to 48a3fae
@hcho3 Got it. |
@hcho3 Can AppVeyor be cancelled? |
@trivialfis Yes. Do you see this button on the top right? |
In the long run, I'm looking to migrate all Windows tests to Jenkins. You can run many tests in parallel, and each worker can be customized. |
@hcho3 After logging out and back in, I can cancel AppVeyor now. :) That's a good plan. I am also thinking about whether we can reduce or combine some expensive tests. Will discuss it later on. |
@hcho3 The last commit in this PR solved the problem. I will push it to the original PR tomorrow. Thanks for all the help! |
@trivialfis No problem. BTW, does your fix work when |
@hcho3 Good point. Let me keep looking. |
@trivialfis I found out that Jenkins was showing 404 for anonymous users. I fixed the configuration to address this. |
@hcho3 Glad to hear that. :) |
Please ignore this PR. I'm attempting to resolve the bug in #3738. Creating a temporary PR allows me to peek at Jenkins without polluting @canonizer's branch.