[CI] gpu test with pytorch nightly #3543
Borda
left a comment
let's write it as jsonnet script because just one parameter changes, right?
Yes, only the Docker image changes.

I think we would then need to coordinate: when you add it, ping me, because I would need to temporarily change the Drone config file. So let's keep the old one and add this jsonnet as a new file...

@ydcjeff we can move forward...

@Borda btw, are we going to run them one after another or in parallel?

Well, let's wait with this one, as we are sometimes reaching computational limits on Drone. Right now we have a queue of about 2 hours, and with this one running it would be double that... so let's prepare it but wait with merging it...

Yeah, I am fine either way. I have now set up Drone in jsonnet to run them one after another.
---
kind: pipeline
type: docker
name: torch-GPU
platform:
  os: linux
  arch: amd64
steps:
- name: testing
  image: pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6
  commands:
  - export PATH=$PATH:/root/.local/bin
  - python --version
  - pip install pip -U
  - pip --version
  - nvidia-smi
  - apt-get update && apt-get install -y cmake
  - pip install -r ./requirements/base.txt -q --upgrade-strategy only-if-needed
  - pip install -r ./requirements/devel.txt -q --upgrade-strategy only-if-needed
  - pip install -r ./requirements/examples.txt -q --upgrade-strategy only-if-needed
  - pip list
  - python -c "import torch ; print(' & '.join([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]) if torch.cuda.is_available() else 'only CPU')"
  - coverage run --source pytorch_lightning -m py.test pytorch_lightning tests -v --durations=25
  - python -m py.test benchmarks pl_examples -v --maxfail=2 --durations=0
  - coverage report
  - codecov --token $CODECOV_TOKEN --flags=gpu,pytest --name='GPU-coverage' --env=linux --build $DRONE_BUILD_NUMBER --commit $DRONE_COMMIT
  - python tests/collect_env_details.py
  environment:
    CODECOV_TOKEN:
      from_secret: codecov_token
    HOROVOD_GPU_OPERATIONS: NCCL
    HOROVOD_WITHOUT_MPI: 1
    HOROVOD_WITHOUT_MXNET: 1
    HOROVOD_WITHOUT_TENSORFLOW: 1
    HOROVOD_WITH_GLOO: 1
    HOROVOD_WITH_PYTORCH: 1
    MKL_THREADING_LAYER: GNU
    SLURM_LOCALID: 0
trigger:
  branch:
  - master
  event:
  - push
  - pull_request
---
kind: pipeline
type: docker
name: torch-GPU-nightly
platform:
  os: linux
  arch: amd64
steps:
- name: testing
  image: pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.7
  commands:
  - export PATH=$PATH:/root/.local/bin
  - python --version
  - pip install pip -U
  - pip --version
  - nvidia-smi
  - apt-get update && apt-get install -y cmake
  - pip install -r ./requirements/base.txt -q --upgrade-strategy only-if-needed
  - pip install -r ./requirements/devel.txt -q --upgrade-strategy only-if-needed
  - pip install -r ./requirements/examples.txt -q --upgrade-strategy only-if-needed
  - pip list
  - python -c "import torch ; print(' & '.join([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]) if torch.cuda.is_available() else 'only CPU')"
  - coverage run --source pytorch_lightning -m py.test pytorch_lightning tests -v --durations=25
  - python -m py.test benchmarks pl_examples -v --maxfail=2 --durations=0
  - coverage report
  - codecov --token $CODECOV_TOKEN --flags=gpu,pytest --name='GPU-coverage' --env=linux --build $DRONE_BUILD_NUMBER --commit $DRONE_COMMIT
  - python tests/collect_env_details.py
  environment:
    CODECOV_TOKEN:
      from_secret: codecov_token
    HOROVOD_GPU_OPERATIONS: NCCL
    HOROVOD_WITHOUT_MPI: 1
    HOROVOD_WITHOUT_MXNET: 1
    HOROVOD_WITHOUT_TENSORFLOW: 1
    HOROVOD_WITH_GLOO: 1
    HOROVOD_WITH_PYTORCH: 1
    MKL_THREADING_LAYER: GNU
    SLURM_LOCALID: 0
trigger:
  branch:
  - master
  event:
  - push
  - pull_request
depends_on:
- torch-GPU
...
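
For readers following along: the two rendered pipelines differ only in name, image, and the `depends_on` entry, so a jsonnet source along the following lines could plausibly produce them. This is a minimal sketch, not the PR's actual `.drone.jsonnet`. The helper name `pipeline` and its parameters are hypothetical, and the command and environment lists are shortened for illustration. It assumes Drone renders a top-level jsonnet array as one pipeline per element.

```jsonnet
// Hypothetical helper: `pipeline`, `name`, `image`, and `depends_on` are
// illustrative names, not taken from the PR.
local pipeline(name, image, depends_on=[]) = {
  kind: 'pipeline',
  type: 'docker',
  name: name,
  platform: { os: 'linux', arch: 'amd64' },
  steps: [
    {
      name: 'testing',
      image: image,
      // Shortened: the remaining commands would match the rendered YAML.
      commands: [
        'python --version',
        'nvidia-smi',
        'coverage run --source pytorch_lightning -m py.test pytorch_lightning tests -v --durations=25',
      ],
      // Shortened: the other environment variables are omitted here.
      environment: {
        CODECOV_TOKEN: { from_secret: 'codecov_token' },
      },
    },
  ],
  trigger: { branch: ['master'], event: ['push', 'pull_request'] },
  // The field is emitted only when a dependency is given; a null
  // computed field name makes jsonnet omit the field entirely.
  [if std.length(depends_on) > 0 then 'depends_on']: depends_on,
};

[
  pipeline('torch-GPU', 'pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.6'),
  // Depends on the first pipeline, so the two run one after another.
  pipeline(
    'torch-GPU-nightly',
    'pytorchlightning/pytorch_lightning:base-cuda-py3.7-torch1.7',
    ['torch-GPU'],
  ),
]
```

Dropping the `['torch-GPU']` argument from the second call would remove `depends_on`, which should let Drone schedule both pipelines in parallel, at the cost of the extra queue load discussed above.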
@Borda, regarding nightly testing: would it be better to start supporting the PyTorch nightly version one month ahead of the stable release? There have already been 2 PRs for PyTorch nightly, so I started thinking about that... What do you think?
Codecov Report
@@          Coverage Diff          @@
##          master   #3543   +/-   ##
======================================
  Coverage     93%     93%
======================================
  Files        118     118
  Lines       9018    9018
======================================
  Hits        8389    8389
  Misses       629     629

Hey @ydcjeff, how's it going? Any way we can help?

@edenlightning we can merge it as it is now, but to activate it (switch the source in the Drone config) we need to get scalable Drone testing running, because this basically doubles the number of tests performed and we are already at capacity...

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://pytorch-lightning.readthedocs.io/en/latest/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Slack. Thank you for your contributions.
Borda
left a comment
Update after #3658

This is not tested, but I would still merge it to use as a baseline and debug it later...

Okay 👌

I messed up my forked repo's master and re-forked it.

@ydcjeff it seems this PR is no longer valid since you lost the original fork. Mind creating it again and referring to this already approved PR? 🐰
What does this PR do?
Fixes #2090
Creates .drone.jsonnet for multiple testing
DO NOT MERGE THIS UNTIL WE SCALE UP DRONE