Use multi-threading in cache_labels #3505
👋 Hello @deanmark, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
I just realized someone already uploaded a similar PR. I think my PR still has merit because the implementation is better: I use imap_unordered rather than starmap, which is a better fit for this use case. Additionally, the code is more concise. |
@deanmark we have an existing PR for multithreaded label caching in #3385 by @vslaykovsky, though there are a few outstanding problems that we were not able to resolve in that PR:
|
@deanmark one way to test your PR is to run COCO128 after corrupting one of the labels, i.e. add a 6th column to a row and you should see a report to screen identifying the problem image/label pair:
|
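The corruption test described above can be illustrated with a small check. This is only a sketch, not the code in datasets.py: `check_label_text` is a hypothetical helper, and it assumes each label row has the form `class x_center y_center width height` with the four box values normalized to [0, 1].

```python
from typing import Tuple


def check_label_text(text: str) -> Tuple[bool, str]:
    """Return (ok, message) for the contents of one label file.

    Hypothetical helper: assumes 5 whitespace-separated numeric columns
    per row, with the 4 box values normalized to [0, 1].
    """
    for i, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            return False, f"row {i}: expected 5 columns, found {len(fields)}"
        try:
            values = [float(f) for f in fields]
        except ValueError:
            return False, f"row {i}: non-numeric value"
        if values[0] < 0:
            return False, f"row {i}: negative class index"
        if any(v < 0 or v > 1 for v in values[1:]):
            return False, f"row {i}: box values outside [0, 1]"
    return True, "ok"
```

Adding a 6th column to any row, as suggested in the comment, makes the check fail with a message naming the offending row, which is the kind of report the scan should surface for the image/label pair.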
@deanmark on the surface of things I think imap_unordered should allow for proper tqdm progress bar and error handling integration that was not possible with starmap, so that's a good sign. Do you have any profiling results before and after? |
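To make the imap_unordered point concrete, here is a minimal sketch of the pattern, assuming a hypothetical per-image function `verify_pair` and a hypothetical driver `scan` (neither is the PR's actual code). Because imap_unordered yields each result as soon as a worker finishes it, the running counts can be updated inside the loop, which is where a progress bar would also be refreshed; starmap only returns once every item is done.

```python
from multiprocessing.pool import Pool, ThreadPool


def verify_pair(pair):
    """Hypothetical stand-in for per-image verification.

    Takes (image_name, label_text) and returns (image_name, status),
    where status is 'found', 'missing', 'empty' or 'corrupted'.
    """
    name, label_text = pair
    if label_text is None:
        return name, "missing"
    rows = [r.split() for r in label_text.splitlines() if r.strip()]
    if not rows:
        return name, "empty"
    if any(len(r) != 5 for r in rows):
        return name, "corrupted"
    return name, "found"


def scan(pairs, pool_cls=Pool, num_threads=8):
    """Count label statuses, consuming results as workers finish them."""
    counts = {"found": 0, "missing": 0, "empty": 0, "corrupted": 0}
    corrupted = []
    with pool_cls(num_threads) as pool:
        # Results arrive in completion order, not input order.
        for name, status in pool.imap_unordered(verify_pair, pairs):
            counts[status] += 1
            if status == "corrupted":
                corrupted.append(name)
            # A progress bar description could be refreshed here.
    return counts, sorted(corrupted)
```

Passing `pool_cls=ThreadPool` swaps processes for threads without touching the loop, which makes it easy to compare the two. When using the process-based `Pool` from a script, the call to `scan` should sit under an `if __name__ == "__main__":` guard.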
@deanmark looks like tqdm pbar output is all good, including corruption display. Profiling results on n1-standard-8 GCP instance:
Current results
train: Scanning '../coco/train2017' images and labels... 117266 found, 1021 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████| 118287/118287 [01:21<00:00, 1443.50it/s]
train: New cache created: ../coco/train2017.cache
val: Scanning '../coco/val2017' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 1911.45it/s]
val: New cache created: ../coco/val2017.cache
PR results
train: Scanning '../coco/train2017' images and labels... 117266 found, 1021 missing, 0 empty, 0 corrupted: 100%|█████████████████████████████████████████| 118287/118287 [02:07<00:00, 930.79it/s]
train: New cache created: ../coco/train2017.cache
val: Scanning '../coco/val2017' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1035.35it/s]
val: New cache created: ../coco/val2017.cache
Hmm, well unfortunately the PR took longer than the current code to cache COCO on our GCP instance. This might actually coincide with the results from my earlier experiments in #3385 (comment) that showed a slowdown vs default with Perhaps until we can get a better solution we can update this PR for normal for-loop operation, still using the refactored function you created to allow for users to do easier multiprocessing experiments going forward. |
The issues you raised work properly in this PR. Provided are some profiling results:
The processor used was a Xeon E5-2690, with the images and labels residing on a network drive. Interestingly, the best results are achieved using 4 threads. I also added several other methods to the comparison: ThreadPool.imap, Pool.map and Pool.imap_unordered. All these methods work more or less the same on my system. |
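A comparison like the one described above (serial vs. several pool methods and worker counts) can be reproduced with a small timing harness. This is a rough sketch under stated assumptions: `run_serial`, `run_pool`, `thread_configs` and `benchmark` are hypothetical names, and the measured function is whatever per-item work you want to profile.

```python
import time
from functools import partial
from multiprocessing.pool import ThreadPool


def run_serial(func, items):
    """Baseline: plain for-loop over the items."""
    return [func(x) for x in items]


def run_pool(func, items, pool_cls, n, method="imap_unordered"):
    """Run func over items with pool_cls(n), using map/imap/imap_unordered."""
    with pool_cls(n) as pool:
        return list(getattr(pool, method)(func, items))


def thread_configs(counts=(1, 2, 4, 8), pool_cls=ThreadPool,
                   method="imap_unordered"):
    """Build a {label: runner} dict: serial plus one entry per worker count."""
    configs = {"serial": run_serial}
    for n in counts:
        label = f"{pool_cls.__name__}({n}).{method}"
        configs[label] = partial(run_pool, pool_cls=pool_cls, n=n, method=method)
    return configs


def benchmark(func, items, configs):
    """Return {label: elapsed_seconds} for each runner in configs."""
    timings = {}
    for label, runner in configs.items():
        start = time.perf_counter()
        results = runner(func, items)
        timings[label] = time.perf_counter() - start
        if len(results) != len(items):
            raise ValueError(f"{label} returned {len(results)} results")
    return timings
```

Passing `pool_cls=multiprocessing.pool.Pool` to `thread_configs` times the process-based variant instead; in that case `func` must be a module-level (picklable) function. Absolute numbers depend heavily on where the data lives, so results from a network drive and a local disk are not comparable.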
@glenn-jocher if Pool(8).starmap worked well on your setup, maybe try using Pool.imap/imap_unordered. |
@deanmark hmm interesting. The network is likely your bottleneck then. We always recommend training with local data, not a mounted bucket or network drive. If I repeat with VOC on the same n1-standard-8 instance (datasets are on a 500GB SSD) I get these:
Current VOC (5s)
train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:05<00:00, 3264.42it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:02<00:00, 2331.41it/s]
val: New cache created: ../VOC/labels/val.cache
PR VOC ThreadPool(8).imap_unordered (15s)
train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:15<00:00, 1046.35it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:04<00:00, 1201.11it/s]
val: New cache created: ../VOC/labels/val.cache
PR VOC Pool(8).imap_unordered (1s) 🚀
train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|███████████| 16551/16551 [00:01<00:00, 9023.54it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:01<00:00, 4321.09it/s]
val: New cache created: ../VOC/labels/val.cache |
@glenn-jocher I moved the VOC dataset locally, and could recreate your timings.
These results were obtained using the same Xeon E5-2690 processor, but this time the data was local. |
refactor initial desc
@deanmark I've tested your latest updates and the speeds are much improved! Results on VOC now show about 3x speedup vs current default in #3505 (comment). PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐! |
@glenn-jocher My pleasure! Keep up the good work with this amazing code. |
Minor updates to #3505, inplace accumulation.
* Use multi threading in cache_labels
* PEP8 reformat
* Add num_threads
* changed ThreadPool.imap_unordered to Pool.imap_unordered
* Remove inplace additions
* Update datasets.py
refactor initial desc
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
(cherry picked from commit 28bff22)
Minor updates to ultralytics#3505, inplace accumulation. (cherry picked from commit 8d52c1c)
Use multi-threading in cache_labels function. Saves time when loading large datasets.
🛠️ PR Summary
🌟 Summary
Improved threading for image processing and label verification.
📊 Key Changes
🎯 Purpose & Impact