Call _cuda_clearCublasWorkspaces on teardown #16907

Merged (4 commits) on Mar 6, 2023

carmocca (Contributor) commented on Feb 28, 2023

What does this PR do?

Fixes a GPU memory leak that appeared with PyTorch 2.0 by calling _cuda_clearCublasWorkspaces when the CUDA accelerator is torn down.

Closes pytorch/pytorch#95668

cc @Borda @carmocca @justusschock @awaelchli
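
As a rough illustration of the change named in the title, a teardown helper along these lines could release the cuBLAS workspaces before emptying the CUDA cache. This is a sketch, not the code merged in this PR: the function name `clear_cuda_memory`, its boolean return value, and the `getattr` guard are assumptions made for the example. `torch._C._cuda_clearCublasWorkspaces` is a private PyTorch binding, so it may change or be absent in other versions.

```python
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def clear_cuda_memory() -> bool:
    """Free cached CUDA memory, including cuBLAS workspaces when possible.

    Returns True if clearing was attempted, False if there was nothing to do.
    (Hypothetical helper; the name and return value are illustrative.)
    """
    if torch is None or not torch.cuda.is_available():
        return False

    # Private binding, so look it up defensively instead of assuming it exists.
    clear_workspaces = getattr(torch._C, "_cuda_clearCublasWorkspaces", None)
    if clear_workspaces is not None:
        clear_workspaces()

    # Return unused cached blocks from the caching allocator to the driver.
    torch.cuda.empty_cache()
    return True
```

An accelerator's teardown hook could call `clear_cuda_memory()` once training or evaluation has finished; on a CPU-only machine the call simply returns `False`.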

carmocca added the bug, accelerator: cuda, and torch.compile labels on Feb 28, 2023
carmocca added this to the 2.0 milestone on Feb 28, 2023
carmocca self-assigned this on Feb 28, 2023
github-actions bot added the fabric and pl labels on Feb 28, 2023
carmocca marked this pull request as ready for review on March 1, 2023 02:14

github-actions bot (Contributor) commented on Mar 1, 2023

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow
Check ID Status
pl-cpu (macOS-11, lightning, 3.8, 1.11) success
pl-cpu (macOS-11, lightning, 3.9, 1.12) success
pl-cpu (macOS-11, lightning, 3.10, 1.13) success
pl-cpu (macOS-11, lightning, 3.8, 1.11, oldest) success
pl-cpu (macOS-11, lightning, 3.9, 2.0, pre) success
pl-cpu (ubuntu-20.04, lightning, 3.9, 1.11) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.12) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.13) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest) success
pl-cpu (ubuntu-20.04, lightning, 3.9, 2.0, pre) success
pl-cpu (windows-2022, lightning, 3.9, 1.11) success
pl-cpu (windows-2022, lightning, 3.10, 1.12) success
pl-cpu (windows-2022, lightning, 3.10, 1.13) success
pl-cpu (windows-2022, lightning, 3.8, 1.11, oldest) success
pl-cpu (windows-2022, lightning, 3.9, 2.0, pre) success
pl-cpu (macOS-11, pytorch, 3.8, 1.13) success
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.13) success
pl-cpu (windows-2022, pytorch, 3.8, 1.13) success

These checks are required after the changes to src/lightning/fabric/accelerators/cuda.py, src/lightning/pytorch/accelerators/cuda.py, tests/tests_pytorch/loops/test_evaluation_loop.py, tests/tests_pytorch/strategies/test_ddp.py, tests/tests_pytorch/strategies/test_ddp_strategy.py.

🟢 pytorch_lightning: Azure GPU
Check ID Status
pytorch-lightning (GPUs) success

These checks are required after the changes to src/lightning/pytorch/accelerators/cuda.py, tests/tests_pytorch/loops/test_evaluation_loop.py, tests/tests_pytorch/strategies/test_ddp.py, tests/tests_pytorch/strategies/test_ddp_strategy.py, src/lightning/fabric/accelerators/cuda.py.

🟢 pytorch_lightning: Azure HPU
Check ID Status
pytorch-lightning (HPUs) success

These checks are required after the changes to src/lightning/fabric/accelerators/cuda.py, src/lightning/pytorch/accelerators/cuda.py, tests/tests_pytorch/loops/test_evaluation_loop.py, tests/tests_pytorch/strategies/test_ddp.py, tests/tests_pytorch/strategies/test_ddp_strategy.py.

🟢 fabric: Docs
Check ID Status
make-doctest (fabric) success
make-html (fabric) success

These checks are required after the changes to src/lightning/fabric/accelerators/cuda.py.

🟢 pytorch_lightning: Docs
Check ID Status
make-doctest (pytorch) success
make-html (pytorch) success

These checks are required after the changes to src/lightning/pytorch/accelerators/cuda.py.

🟢 lightning_fabric: CPU workflow
Check ID Status
fabric-cpu (macOS-11, lightning, 3.8, 1.11) success
fabric-cpu (macOS-11, lightning, 3.9, 1.12) success
fabric-cpu (macOS-11, lightning, 3.10, 1.13) success
fabric-cpu (macOS-11, lightning, 3.8, 1.11, oldest) success
fabric-cpu (macOS-11, lightning, 3.9, 2.0, pre) success
fabric-cpu (ubuntu-20.04, lightning, 3.9, 1.11) success
fabric-cpu (ubuntu-20.04, lightning, 3.10, 1.12) success
fabric-cpu (ubuntu-20.04, lightning, 3.10, 1.13) success
fabric-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest) success
fabric-cpu (ubuntu-20.04, lightning, 3.9, 2.0, pre) success
fabric-cpu (windows-2022, lightning, 3.9, 1.11) success
fabric-cpu (windows-2022, lightning, 3.10, 1.12) success
fabric-cpu (windows-2022, lightning, 3.10, 1.13) success
fabric-cpu (windows-2022, lightning, 3.8, 1.11, oldest) success
fabric-cpu (windows-2022, lightning, 3.9, 2.0, pre) success
fabric-cpu (macOS-11, fabric, 3.8, 1.13) success
fabric-cpu (ubuntu-20.04, fabric, 3.8, 1.13) success
fabric-cpu (windows-2022, fabric, 3.8, 1.13) success

These checks are required after the changes to src/lightning/fabric/accelerators/cuda.py.

🟢 lightning_fabric: Azure GPU
Check ID Status
lightning-fabric (GPUs) success

These checks are required after the changes to src/lightning/fabric/accelerators/cuda.py.

🟢 mypy
Check ID Status
mypy success

These checks are required after the changes to src/lightning/fabric/accelerators/cuda.py, src/lightning/pytorch/accelerators/cuda.py.

🟢 install
Check ID Status
install-pkg (ubuntu-22.04, app, 3.8) success
install-pkg (ubuntu-22.04, app, 3.10) success
install-pkg (ubuntu-22.04, fabric, 3.8) success
install-pkg (ubuntu-22.04, fabric, 3.10) success
install-pkg (ubuntu-22.04, pytorch, 3.8) success
install-pkg (ubuntu-22.04, pytorch, 3.10) success
install-pkg (ubuntu-22.04, lightning, 3.8) success
install-pkg (ubuntu-22.04, lightning, 3.10) success
install-pkg (ubuntu-22.04, notset, 3.8) success
install-pkg (ubuntu-22.04, notset, 3.10) success
install-pkg (macOS-12, app, 3.8) success
install-pkg (macOS-12, app, 3.10) success
install-pkg (macOS-12, fabric, 3.8) success
install-pkg (macOS-12, fabric, 3.10) success
install-pkg (macOS-12, pytorch, 3.8) success
install-pkg (macOS-12, pytorch, 3.10) success
install-pkg (macOS-12, lightning, 3.8) success
install-pkg (macOS-12, lightning, 3.10) success
install-pkg (macOS-12, notset, 3.8) success
install-pkg (macOS-12, notset, 3.10) success
install-pkg (windows-2022, app, 3.8) success
install-pkg (windows-2022, app, 3.10) success
install-pkg (windows-2022, fabric, 3.8) success
install-pkg (windows-2022, fabric, 3.10) success
install-pkg (windows-2022, pytorch, 3.8) success
install-pkg (windows-2022, pytorch, 3.10) success
install-pkg (windows-2022, lightning, 3.8) success
install-pkg (windows-2022, lightning, 3.10) success
install-pkg (windows-2022, notset, 3.8) success
install-pkg (windows-2022, notset, 3.10) success

These checks are required after the changes to src/lightning/fabric/accelerators/cuda.py, src/lightning/pytorch/accelerators/cuda.py.


Thank you for your contribution! 💜

Note: This comment is generated automatically and refreshes every 180 seconds for 60 minutes. If you have any other questions, contact carmocca for help.

codecov bot commented on Mar 1, 2023

Codecov Report

Merging #16907 (f4f4611) into master (54147e0) will decrease coverage by 22%.
The diff coverage is 38%.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #16907     +/-   ##
=========================================
- Coverage      82%      59%    -22%     
=========================================
  Files         439      414     -25     
  Lines       31688    31392    -296     
=========================================
- Hits        25917    18632   -7285     
- Misses       5771    12760   +6989     
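
The rounded percentages in this table can look inconsistent, since 82% minus 59% is 23 rather than 22. A quick check using only the Hits and Lines figures from the table suggests the reported 22% comes from rounding the unrounded difference. The helper name `coverage_percent` is made up for this sketch, and it assumes coverage is computed as hits divided by lines.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines (assumed formula)."""
    return 100.0 * hits / lines


# Figures copied from the coverage diff table.
master = coverage_percent(hits=25917, lines=31688)
pr = coverage_percent(hits=18632, lines=31392)
delta = pr - master

print(f"master: {master:.2f}%  PR: {pr:.2f}%  delta: {delta:.2f} points")
```

Under that assumption the unrounded values are roughly 81.8% and 59.4%, a difference of about 22.4 points, which rounds to the 22% in the report.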

mergify bot added the ready label on Mar 6, 2023
carmocca merged commit 4eab1f3 into master on Mar 6, 2023
carmocca deleted the 2.0/cublas-memory branch on March 6, 2023 16:52
Borda pushed a commit that referenced this pull request Mar 30, 2023
lantiga pushed a commit that referenced this pull request Apr 3, 2023
Labels: accelerator: cuda, bug, fabric, pl, ready, torch.compile
Issue this pull request may close: Nightly is leaking memory with a simple Linear layer
3 participants