Add Weighted Sampler for highly imbalanced datasets #8766
for more information, see https://pre-commit.ci
@pourmand1376 hey this is a great idea. I've seen this being used for other tasks like segmentation. Do you have a before/after study of the effect of this change on any dataset?
@AyushExel
@pourmand1376 very cool! Does this overlap or complement the train.py --image-weights argument which also introduces weighted sampling based on image contents and the previous epoch's per-class validation results? Line 466 in e309a85
@glenn-jocher, I found no documentation for this option. I do not know what it does. Do we have any documentation about how to test it? From the code, it seems that it does nothing serious. Am I right?
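For readers unfamiliar with the option, the idea described above (weighting images by their contents and by the previous epoch's per-class validation results) can be sketched roughly as follows. This is a simplified illustration, not YOLOv5's actual code: the helper names `class_weights_from_maps` and `image_weights`, the toy data, and the `(1 - mAP)^2` weighting are assumptions made for the example.

```python
import random


def class_weights_from_maps(maps):
    """Give poorly detected classes a larger weight (assumed rule: (1 - mAP)^2)."""
    return [(1.0 - m) ** 2 for m in maps]


def image_weights(labels_per_image, class_weights):
    """Weight each image by the summed class weights of its labelled objects."""
    return [sum(class_weights[c] for c in classes) for classes in labels_per_image]


# Toy data (hypothetical): class ids of the objects in each of four images.
labels_per_image = [[0, 0], [0], [1], [0, 1]]
# Per-class mAP from the previous epoch (hypothetical values).
maps = [0.9, 0.2]

cw = class_weights_from_maps(maps)
iw = image_weights(labels_per_image, cw)

# Draw the image indices for the next epoch in proportion to the image weights.
rng = random.Random(0)
indices = rng.choices(range(len(labels_per_image)), weights=iw, k=len(labels_per_image))
```

In this sketch, images containing the weakly detected class 1 are drawn more often than images containing only class 0, and the weights change every epoch as the validation results change.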
cd14353 to 9929f04
I tested on coco128, and there is something wrong in
@triple-Mu Thanks for testing! Can you explain more? How did you determine that something is wrong?
I found this public dataset to test my imbalanced sampling strategy. Although I have tested this on my custom dataset, testing should be done on public datasets to make it reproducible. I will do an analysis soon.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 🚀 and Vision AI ⭐.
Signed-off-by: Amir Pourmand <pourmand1376@gmail.com>
👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap. We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved. For additional resources and information, please see the links below:
Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
You're welcome, @triple-Mu! If you encounter any more issues or have further questions, feel free to reach out. Happy training! 😊🚀
label_classes = np.unique(label[:, 0]).tolist()
values = []
for cls_ in label_classes:
    values.append(weight_dict[_cls])
Shouldn't this be [cls_] here? I assume it is a typo. Maybe worth noting.
Yep. Well noticed.
But this is not merged and it is not going to be since yolov5 is not maintained that much anymore.
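With the typo discussed in this thread fixed, the reviewed fragment could be completed into a small helper like the one below. This is only a sketch under stated assumptions: the function name `image_weight`, the use of plain lists instead of a NumPy array, the choice of `max` to combine the per-class weights, and the handling of unlabelled images are not taken from the PR.

```python
def image_weight(label_rows, weight_dict):
    """Return a sampling weight for one image.

    label_rows: list of label rows, each starting with the class id
                (for example [cls, x, y, w, h]).
    weight_dict: mapping from class id to class weight.
    """
    # Unique class ids present in this image (pure-Python analogue of np.unique).
    label_classes = sorted({int(row[0]) for row in label_rows})

    values = []
    for cls_ in label_classes:
        values.append(weight_dict[cls_])  # corrected: cls_, not _cls

    # Assumption: an image is as important as its rarest class, and an image
    # without labels gets the smallest available weight.
    return max(values) if values else min(weight_dict.values())


weight_dict = {0: 0.1, 1: 0.9}  # hypothetical class weights
w_mixed = image_weight([[1, 0.5, 0.5, 0.2, 0.2], [0, 0.3, 0.3, 0.1, 0.1]], weight_dict)
w_empty = image_weight([], weight_dict)
```

As written in the diff, `weight_dict[_cls]` would raise a `NameError` on the first iteration, because only `cls_` is defined by the loop.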
Related Issues:
This PR adds a weighted sampler for datasets with highly imbalanced data. The idea is taken from here. As you know, in medical images the data is mostly highly imbalanced and there is nothing we can do to increase the amount of data. If you train YOLO using the default sampler, you would get 0, 0, 0 for precision, recall and mAP. This is why this is a must for certain datasets.
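To make the effect of weighted sampling on an imbalanced dataset concrete, here is a minimal, standard-library-only sketch. It is not the PR's implementation: the inverse-frequency class weights, the helper names, and the toy dataset (95 images of class 0, 5 images of class 1) are assumptions chosen for illustration, and every image is assumed to carry at least one label.

```python
import random
from collections import Counter


def inverse_frequency_weights(image_classes):
    """Map each class id to 1 / (number of images containing that class)."""
    counts = Counter(c for classes in image_classes for c in set(classes))
    return {c: 1.0 / n for c, n in counts.items()}


def sample_indices(image_classes, num_samples, seed=0):
    """Draw image indices with replacement, favouring images with rare classes."""
    weight_dict = inverse_frequency_weights(image_classes)
    # Assumption: an image takes the weight of its rarest class.
    weights = [max(weight_dict[c] for c in classes) for classes in image_classes]
    rng = random.Random(seed)
    return rng.choices(range(len(image_classes)), weights=weights, k=num_samples)


# Toy imbalanced dataset: indices 0..94 hold class 0, indices 95..99 hold class 1.
image_classes = [[0]] * 95 + [[1]] * 5
drawn = sample_indices(image_classes, num_samples=10_000)
rare_fraction = sum(1 for i in drawn if i >= 95) / len(drawn)
```

With uniform sampling the rare class would make up about 5% of each epoch; with these weights it makes up roughly half. In a PyTorch training loop, per-image weights of this kind are typically passed to `torch.utils.data.WeightedRandomSampler`, which is then given to the `DataLoader` through its `sampler` argument.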
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Added Weighted Sampler option for handling imbalanced datasets in YOLOv5 training.
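A boolean training option such as the `weighted_sampler` argument listed under Key Changes is commonly exposed through `argparse` as a store-true flag. The sketch below is an assumption about how that could look; the exact flag spelling, the help text, and the `parse_opt` signature are illustrative and not confirmed by the PR.

```python
import argparse


def parse_opt(argv=None):
    """Parse training options; only the hypothetical sampler flag is shown."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--weighted_sampler",
        action="store_true",  # False unless the flag is present
        help="use a weighted sampler for highly imbalanced datasets",
    )
    return parser.parse_args(argv)


opt_default = parse_opt([])
opt_enabled = parse_opt(["--weighted_sampler"])
```

Because the flag defaults to `False`, existing training commands would keep the default sampler unless the option is passed explicitly.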
📊 Key Changes
weighted_sampler as a boolean argument in the training script to enable the use of the sampler.
🎯 Purpose & Impact