fix(pt): improve OOM detection #4638
Conversation
See deepmodeling#4594 for more details. Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
PR Overview
This PR improves the out-of-memory (OOM) detection logic by extending error checks to cover additional CUDA error messages.
- Adds a new check for the error message "CUDA error: out of memory".
- Updates the conditional structure to include a direct check for `torch.cuda.OutOfMemoryError` (see the sketch after the review table below).
Reviewed Changes
| File | Description |
|---|---|
| deepmd/pt/utils/auto_batch_size.py | Enhanced error detection for OOM with an added error check |
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
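A minimal sketch of the broadened check described above, assuming the helper keeps the `is_oom_error` name shown in the CodeRabbit diagram below; the real method in `deepmd/pt/utils/auto_batch_size.py` may match additional error messages and use a slightly different signature.

```python
import torch


def is_oom_error(e: Exception) -> bool:
    """Return True if the exception looks like a GPU out-of-memory error."""
    if (
        isinstance(e, RuntimeError) and "CUDA error: out of memory" in str(e)
    ) or isinstance(e, torch.cuda.OutOfMemoryError):
        # Release cached allocator blocks so a retry with a smaller
        # batch size has a chance to succeed.
        torch.cuda.empty_cache()
        return True
    return False
```

Checking the message text in addition to the exception type matters because some CUDA out-of-memory failures surface as a plain `RuntimeError` rather than as `torch.cuda.OutOfMemoryError`.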
📝 Walkthrough
The changes update the out-of-memory detection in `deepmd/pt/utils/auto_batch_size.py`: a `RuntimeError` whose message contains "CUDA error: out of memory" and any `torch.cuda.OutOfMemoryError` are now both treated as OOM conditions.
Sequence Diagram(s)

    sequenceDiagram
        participant Caller
        participant AutoBatchSize
        participant CUDA
        Caller->>AutoBatchSize: Call is_oom_error(exception)
        Note right of AutoBatchSize: Check if exception is a RuntimeError containing<br/>"CUDA error: out of memory"<br/>or an instance of torch.cuda.OutOfMemoryError
        alt OOM Error Detected
            AutoBatchSize->>CUDA: torch.cuda.empty_cache()
            AutoBatchSize-->>Caller: Return True
        else Not OOM Error
            AutoBatchSize-->>Caller: Return False
        end
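For context, an illustrative caller-side retry loop (not part of this PR) that consumes the flow in the diagram; `run_inference` and the halving back-off are hypothetical stand-ins, not the actual `AutoBatchSize` policy.

```python
def run_with_auto_batch_size(run_inference, batch_size: int, min_batch_size: int = 1):
    """Shrink the batch size until the workload fits in GPU memory."""
    while batch_size >= min_batch_size:
        try:
            return run_inference(batch_size)
        except Exception as e:
            if is_oom_error(e):
                # The cache was already emptied inside is_oom_error;
                # halve the batch size and try again.
                batch_size //= 2
            else:
                raise  # not an OOM condition; propagate unchanged
    raise RuntimeError("No batch size >= min_batch_size fits in GPU memory.")


# Example usage (hypothetical workload):
# results = run_with_auto_batch_size(lambda bs: model(batch[:bs]), batch_size=1024)
```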
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

    @@            Coverage Diff             @@
    ##            devel    #4638      +/-   ##
    ==========================================
    - Coverage   84.78%   84.77%   -0.01%
    ==========================================
      Files         688      688
      Lines       66091    66090       -1
      Branches     3539     3538       -1
    ==========================================
    - Hits        56033    56031       -2
    + Misses       8918     8917       -1
    - Partials     1140     1142       +2

☔ View full report in Codecov by Sentry.