Pass `LOCAL_RANK` to `torch_distributed_zero_first()` #5114

qiningonline · 2021-10-10T00:39:37Z

Resolves #5111.

The change proposed in this pull request:

Pass `LOCAL_RANK` to `torch_distributed_zero_first()` to avoid a CUDA device index error.

Details of the issue are listed in the issue description and follow-up comment.
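
For readers without the linked issue at hand, the idea can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the YOLOv5 source: `zero_first` and `make_checked_barrier` are names invented here, and the injected `barrier` callable stands in for a real distributed barrier that receives a list of CUDA device indices. The sketch assumes that a device index must be valid on the local node, so a global rank coming from a second node can fall out of range while a local rank cannot.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def zero_first(local_rank: int, barrier: Callable[[List[int]], None]) -> Iterator[None]:
    """Let the process with local rank 0 run the body first (hypothetical sketch)."""
    if local_rank not in (-1, 0):
        # Non-zero local ranks wait at the barrier before running the body.
        barrier([local_rank])
    yield
    if local_rank == 0:
        # Local rank 0 reaches the barrier only after finishing the body.
        barrier([0])


def make_checked_barrier(gpus_on_node: int, log: list) -> Callable[[List[int]], None]:
    """Build a stand-in barrier that rejects device indices not present on the node."""

    def barrier(device_ids: List[int]) -> None:
        for index in device_ids:
            if not 0 <= index < gpus_on_node:
                raise ValueError(
                    f"invalid device index {index} on a node with {gpus_on_node} GPUs"
                )
            log.append(index)

    return barrier


if __name__ == "__main__":
    calls: list = []
    checked = make_checked_barrier(gpus_on_node=2, log=calls)

    # Second GPU on a 2-GPU node: the local rank 1 is a valid device index.
    with zero_first(1, checked):
        pass

    # The same process on a second node may have global rank 3, which is not
    # a valid device index on a 2-GPU node and is rejected by the stand-in.
    try:
        with zero_first(3, checked):
            pass
    except ValueError as err:
        print(err)
```

In this sketch the only difference between the two calls is which rank is passed in, which is the substitution the pull request title describes.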

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhancement of multi-GPU training support in YOLOv5.

📊 Key Changes

Replaced `RANK` with `LOCAL_RANK` in the call to `torch_distributed_zero_first()` to better support distributed training on multiple GPUs.
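
To illustrate the difference between the two values: launchers such as `torchrun` export `RANK` (the index of the process across all nodes) and `LOCAL_RANK` (the index of the process within its node) as environment variables. The helpers below are hypothetical (`read_ranks` and `device_index_is_valid` are not YOLOv5 functions), and the default of -1 for a non-distributed run is an assumption of this sketch.

```python
import os
from typing import Mapping, Tuple


def read_ranks(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (local_rank, rank); -1 is the assumed default for a single-process run."""
    local_rank = int(env.get("LOCAL_RANK", -1))
    rank = int(env.get("RANK", -1))
    return local_rank, rank


def device_index_is_valid(index: int, gpus_on_node: int) -> bool:
    """A CUDA device index must refer to a GPU that exists on the local node."""
    return 0 <= index < gpus_on_node


if __name__ == "__main__":
    # Simulated environment for the second GPU on the second of two 2-GPU nodes.
    fake_env = {"RANK": "3", "LOCAL_RANK": "1"}
    local_rank, rank = read_ranks(fake_env)
    print("LOCAL_RANK usable as device index:", device_index_is_valid(local_rank, 2))
    print("RANK usable as device index:", device_index_is_valid(rank, 2))
```

On a single machine the two values coincide, which may be why a mismatch only tends to surface in multi-node runs.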

🎯 Purpose & Impact

🤝 Improved Compatibility: These changes enhance the compatibility with distributed training frameworks, allowing for more efficient utilization of multiple GPUs.
⚡ Increased Efficiency: Better support for local GPU rank handling potentially leads to improved parallel processing performance and reduced training times.
👨‍💻 Developer Experience: For developers using YOLOv5 for training on multiple GPUs, this update simplifies processes and could help avoid common issues related to device ranking in distributed environments.

github-actions

👋 Hello @qiningonline, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, an automatic GitHub Actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:

git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git merge upstream/master
git push -u origin -f

✅ Verify all Continuous Integration (CI) checks are passing.
✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

glenn-jocher · 2021-10-10T01:42:02Z

@qiningonline PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐

Co-authored-by: qiningonline <qiningonline@gmail.com>

ultralytics#5111, pass local rank to torch_distributed_zero_first.

4206414

github-actions bot reviewed Oct 10, 2021


qiningonline mentioned this pull request Oct 10, 2021

CUDA device index error in distributed training #5111

Closed

glenn-jocher changed the title ~~#5111, pass local rank to torch_distributed_zero_first.~~ Pass LOCAL_RANK to torch_distributed_zero_first() Oct 10, 2021

glenn-jocher assigned qiningonline Oct 10, 2021

glenn-jocher merged commit 4a6dfff into ultralytics:master Oct 10, 2021

glenn-jocher mentioned this pull request Oct 12, 2021

YOLOv5 release v6.0 #5141

Merged

glenn-jocher mentioned this pull request Nov 7, 2021

YOLOv5 v6.0 compatibility update (draft) ultralytics/yolov3#1855

Closed

glenn-jocher mentioned this pull request Nov 14, 2021

YOLOv5 v6.0 compatibility update ultralytics/yolov3#1857

Merged

BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022

Pass LOCAL_RANK to torch_distributed_zero_first() (ultralytics#5114)

8fc4641

Co-authored-by: qiningonline <qiningonline@gmail.com>
