Speed up loading large YOLO datasets #26
Codecov Report

| Coverage Diff | main | #26 | +/- |
|---|---|---|---|
| Coverage | 74.98% | 75.28% | +0.29% |
| Files | 18 | 18 | |
| Lines | 1659 | 1679 | +20 |
| Hits | 1244 | 1264 | +20 |
| Misses | 415 | 415 | |

Continue to review full report at Codecov.
The docs build is failing because of a jinja2 issue; it goes away after updating … Update: fixed it by updating the versions in …
Thanks for the PR. Looks good to me.
Thanks for the PR.
Currently, loading a large YOLO dataset (~150k) takes prohibitively long compared to loading the same dataset in COCO format.

I think this is mostly because the YOLO annotation format does not store the actual image width and height. So we read the width and height of each image with `imagesize.get`, since these sizes are required when converting to other formats.

The change itself is small: it parallelizes that for loop with `joblib.Parallel`, using half the available threads by default. I also updated the copyright year in the docs.

Speed test on the ~150k dataset:

- before `joblib.Parallel`: >180 mins
- after `joblib.Parallel`: ~12 mins
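
To make the description concrete, here is a minimal Python sketch, not the code in this PR. So that it runs without extra dependencies, it swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` for `joblib.Parallel`, and a small PNG-only header reader (`read_png_size`, hypothetical) for `imagesize.get`. `get_image_sizes` and `yolo_to_coco_bbox` are also illustrative names, not this project's API. The conversion helper shows why the sizes matter: YOLO labels store box coordinates normalized to [0, 1], so producing absolute pixel boxes (as COCO uses) needs each image's width and height.

```python
import os
import struct
from concurrent.futures import ThreadPoolExecutor

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_size(path):
    """Hypothetical stand-in for imagesize.get: return (width, height) of a PNG
    by reading only its 24-byte header, without decoding any pixels."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"not a PNG file: {path}")
    # The IHDR chunk stores width and height as big-endian unsigned 32-bit ints.
    return struct.unpack(">II", header[16:24])


def get_image_sizes(paths, n_workers=None):
    """Look up (width, height) for every image path concurrently.

    Defaults to half the available CPUs, mirroring the default described above."""
    if n_workers is None:
        n_workers = max(1, (os.cpu_count() or 2) // 2)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in input order, so sizes[i] belongs to paths[i].
        return list(pool.map(read_png_size, paths))


def yolo_to_coco_bbox(yolo_box, img_w, img_h):
    """Convert a normalized YOLO box (x_center, y_center, w, h) into an absolute
    COCO-style box [x_min, y_min, width, height] using the image size."""
    xc, yc, w, h = yolo_box
    abs_w, abs_h = w * img_w, h * img_h
    return [xc * img_w - abs_w / 2, yc * img_h - abs_h / 2, abs_w, abs_h]
```

With joblib, the same fan-out would look roughly like `Parallel(n_jobs=n)(delayed(imagesize.get)(p) for p in paths)`. Threads are a reasonable fit for this sketch because reading a small header is I/O-bound; note that joblib's default `loky` backend uses worker processes instead.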