CI CUDA job #3402
Comments
It is exclusive, feel free to use it.
I think we can go the following way.
I've made some progress with this in
@guolinke Could you please help install the NVIDIA drivers on the machine? I'm not sure, but this might help automate the process: https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/hpccompute-gpu-linux.
I think one machine will be enough.
@StrikerRUS These VMs are allocated on the fly, so I am not sure whether we can install the driver on them.
Let's set the maximum number of workers to 2, in case of concurrent jobs.
It looks like the driver extension didn't help: there is no … Also, I found an experimental option that avoids installing the driver on the host machine and uses driver containers instead.
Unfortunately, driver containers also require rebooting:
So I have no idea how to configure CUDA jobs other than by renting a regular, permanent Azure GPU machine.
Thanks @StrikerRUS. Maybe we can use self-hosted GitHub Actions agents. I have used them before; they can use a permanent VM for CI jobs.
You can give it a try. The driver and Docker are installed. Also, I fixed setup-python according to https://github.com/actions/setup-python#linux
Amazing! Just got it working! I will read more about GitHub Actions self-hosted runners and get back with new proposals in a few days. https://github.com/microsoft/LightGBM/runs/1173237869?check_suite_focus=true
Hmm, it seems possible to allocate new VMs on demand with each trigger: … Or perhaps it would be good (or at least easier) to have one permanent VM with drivers and Docker installed, and turn it on and off automatically with new builds.
@guolinke I'm afraid we cannot run tests with NVIDIA Tesla M60.
https://en.wikipedia.org/wiki/CUDA#GPUs_supported Line 159 in 79d288a
I see. I will change it to a P100 or P40.
Now it is a P100.
Thank you! Is there anything similar to AWS G4 machines in Azure? It would probably cost less:
The only other option is the P40, which provides more GPU memory but is slightly slower. The cost is the same, so I chose the P100.
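For context on why the M60 is ruled out while the P100 and P40 are fine: according to the "GPUs supported" table in the Wikipedia CUDA article linked above, the Tesla M60 has compute capability 5.2, the Tesla P100 has 6.0, and the Tesla P40 has 6.1. Below is a minimal Python sketch of that comparison, assuming the CUDA build requires compute capability 6.0 or newer. The 6.0 threshold is an assumption (the code at line 159 in 79d288a is not reproduced here), and `is_supported` and `ASSUMED_MIN_CAPABILITY` are hypothetical names used only for illustration.

```python
# Hypothetical helper: checks whether a GPU model meets an ASSUMED minimum
# CUDA compute capability. The 6.0 threshold is an illustrative assumption,
# not a value read from LightGBM's build files.
from typing import Dict, Tuple

# Compute capabilities (major, minor) of the GPUs mentioned in this thread.
COMPUTE_CAPABILITY: Dict[str, Tuple[int, int]] = {
    "Tesla M60": (5, 2),
    "Tesla P100": (6, 0),
    "Tesla P40": (6, 1),
}

# Assumed minimum compute capability required by the CUDA build.
ASSUMED_MIN_CAPABILITY: Tuple[int, int] = (6, 0)


def is_supported(gpu: str, minimum: Tuple[int, int] = ASSUMED_MIN_CAPABILITY) -> bool:
    """Return True if the GPU's compute capability is at least `minimum`.

    Tuples compare element-wise (major first, then minor), so (5, 2) < (6, 0).
    """
    return COMPUTE_CAPABILITY[gpu] >= minimum


if __name__ == "__main__":
    for name, (major, minor) in COMPUTE_CAPABILITY.items():
        status = "ok" if is_supported(name) else "too old"
        print(f"{name}: compute capability {major}.{minor} -> {status}")
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings or floats (for example, a hypothetical "6.10" would sort incorrectly as a float).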
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Opening a separate issue to discuss enabling the CUDA CI job on demand, as the original PR with initial CUDA support has 400+ comments. Refer to #3160 (comment).
@guolinke Will linux-gpu-pool be used exclusively for LightGBM (CUDA) CI jobs, or is this machine used for other purposes as well?