Add Weighted Sampler for highly imbalanced datasets #8766

Closed
wants to merge 48 commits into from

Conversation

pourmand1376
Contributor

@pourmand1376 pourmand1376 commented Jul 28, 2022

Related Issues:

This PR adds a weighted sampler for highly imbalanced datasets. The idea is taken from here.

As you know, medical image data is mostly highly imbalanced, and there is nothing we can do to obtain more data. If you train YOLO with the default sampler, you would get 0, 0, 0 for precision, recall, and mAP. This is why a weighted sampler is a must for certain datasets.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Added Weighted Sampler option for handling imbalanced datasets in YOLOv5 training.

📊 Key Changes

  • Integrated Weighted Random Sampler within the training process.
  • Added weighted_sampler as a boolean argument in the training script to enable the use of the sampler.
  • Created a utility function to initialize the Weighted Random Sampler based on label distribution.
  • Ensured the Weighted Sampler is not used during validation for accurate results.
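The idea behind the changes listed above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: the helper names, the inverse-frequency weighting, and the choice of taking the maximum class weight per image are all assumptions made for illustration. In PyTorch, per-image weights like these would typically be handed to torch.utils.data.WeightedRandomSampler rather than to random.choices.

```python
import random
from collections import Counter


def class_weights_from_labels(labels_per_image):
    """Map each class id to an inverse-frequency weight (hypothetical helper)."""
    counts = Counter(cls for labels in labels_per_image for cls in labels)
    return {cls: 1.0 / n for cls, n in counts.items()}


def image_weights(labels_per_image, class_weights, background_weight=0.0):
    """Return one sampling weight per image.

    Assumption: an image gets the largest class weight among its labels.
    A sum or a mean would be equally plausible choices.
    """
    weights = []
    for labels in labels_per_image:
        if labels:
            weights.append(max(class_weights[cls] for cls in set(labels)))
        else:
            weights.append(background_weight)  # image without any labels
    return weights


# Toy dataset: class 0 is common, class 1 appears in a single image.
labels_per_image = [[0], [0, 0], [0], [0], [0], [0], [0], [1]]
cw = class_weights_from_labels(labels_per_image)
iw = image_weights(labels_per_image, cw)

# Draw one "epoch" of image indices with replacement, proportional to the weights.
rng = random.Random(0)
epoch_indices = rng.choices(range(len(labels_per_image)), weights=iw, k=1000)
rare_share = epoch_indices.count(7) / len(epoch_indices)
```

With uniform sampling the single rare-class image would make up one eighth of the draws; with these weights it makes up roughly half of them. As the summary notes, such a sampler belongs on the training loader only, so that validation still sees the true class distribution.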

🎯 Purpose & Impact

  • 🎯 Purpose: To improve model performance on imbalanced datasets where some classes appear much more frequently than others.
  • 💡 Benefits:
    • Helps to prevent model bias towards more frequent classes.
    • Aimed at boosting the accuracy for rare classes.
  • ⚠️ Potential Impact:
    • Users training models on imbalanced datasets may see better performance.
    • The Weighted Sampler feature is not compatible with multi-GPU training setups at the moment.

@pourmand1376 pourmand1376 changed the title Add Weighted Sampler Add Weighted Sampler for highly imbalanced datasets Jul 28, 2022
@AyushExel
Contributor

@pourmand1376 hey this is a great idea. I've seen this being used for other tasks like segmentation. Do you have a before/after study of the effect of this change on any dataset?

@pourmand1376
Contributor Author

@AyushExel
I am working on a paper which studies this effect on a custom dataset. In the meantime, I will do an analysis on a public dataset and report here. Stay tuned.

@glenn-jocher
Member

@pourmand1376 very cool! Does this overlap or complement the train.py --image-weights argument which also introduces weighted sampling based on image contents and the previous epoch's per-class validation results?

yolov5/train.py

Line 466 in e309a85

parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
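
For readers unfamiliar with the option, the mechanism described in the comment above (weights derived from image contents and from the previous epoch's per-class validation results) can be sketched roughly as follows. This is a simplified illustration and not the repository's code: the function names, the (1 - mAP)^2 scaling, and the toy numbers are assumptions that should be checked against train.py at the referenced commit.

```python
import random


def class_weights_from_map(base_class_weights, per_class_map):
    """Scale class weights by (1 - mAP)^2 so poorly detected classes count more.

    Assumption: the exact scaling used by the real option may differ.
    """
    nc = len(base_class_weights)
    return [w * (1.0 - m) ** 2 / nc for w, m in zip(base_class_weights, per_class_map)]


def image_weights(labels_per_image, class_weights):
    """Weight of an image = sum of the class weights of the labels it contains."""
    return [sum(class_weights[c] for c in labels) for labels in labels_per_image]


# Toy data: class 0 is already detected well (mAP 0.9), class 1 poorly (mAP 0.1).
labels_per_image = [[0, 0], [0], [1], [0, 1]]
base = [1.0, 1.0]
maps = [0.9, 0.1]

cw = class_weights_from_map(base, maps)
iw = image_weights(labels_per_image, cw)

# Re-draw the image indices for the next epoch, favouring images with weak classes.
rng = random.Random(0)
indices = rng.choices(range(len(labels_per_image)), weights=iw, k=len(labels_per_image))
```

The difference from a frequency-based sampler is that these weights change every epoch, because they depend on the latest validation results rather than only on how often each class occurs.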

@pourmand1376
Contributor Author

pourmand1376 commented Jul 31, 2022

@glenn-jocher, I found no documentation for this option. I do not know what it does. Do we have any documentation about how to test it?

From the code, it seems that it does nothing serious. Am I right?

@triple-Mu
Contributor

I tested on coco128, and there is something wrong in this line: unique_classes, counts = np.unique(labels_per_class, return_counts=True)

@pourmand1376
Contributor Author

@triple-Mu Thanks for testing!

Can you explain more? How did you determine that something is wrong?
Do you have any errors?

@pourmand1376
Contributor Author

I found this public dataset to test my imbalanced sampling strategy. Although I have tested this on my custom dataset, testing should be done on public datasets to make it reproducible.

I will do an analysis soon.

@pourmand1376 pourmand1376 marked this pull request as ready for review August 30, 2022 09:52
@github-actions
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 🚀 and Vision AI ⭐.

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Mar 22, 2023
@github-actions github-actions bot removed the Stale Stale and schedule for closing soon label Apr 10, 2023
@github-actions
Contributor

github-actions bot commented Oct 3, 2023

👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap.

We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved.

For additional resources and information, please see the links below:

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

@github-actions github-actions bot added the Stale Stale and schedule for closing soon label Oct 3, 2023
@github-actions github-actions bot closed this Nov 3, 2023
@glenn-jocher
Member

You're welcome, @triple-Mu! If you encounter any more issues or have further questions, feel free to reach out. Happy training! 😊🚀

label_classes = np.unique(label[:, 0]).tolist()
values = []
for cls_ in label_classes:
    values.append(weight_dict[_cls])

Shouldn't this be [cls_]? I assume it is a typo. Maybe worth noting.
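
A minimal sketch of the loop with a consistent variable name is shown here. The wrapper function, its name, and the sample data are hypothetical, and plain Python sets stand in for np.unique so that the example runs without NumPy.

```python
def label_class_weights(label, weight_dict):
    """Collect the weight of every distinct class in one image's label rows.

    Assumption: `label` is a list of rows whose first column is the class id.
    """
    label_classes = sorted({int(row[0]) for row in label})
    values = []
    for cls_ in label_classes:
        values.append(weight_dict[cls_])  # index with the loop variable itself
    return values


# Two boxes of class 2 and one box of class 0 (class, x, y, w, h).
label = [
    [2, 0.5, 0.5, 0.1, 0.1],
    [0, 0.3, 0.3, 0.2, 0.2],
    [2, 0.7, 0.7, 0.1, 0.1],
]
weight_dict = {0: 0.1, 1: 0.5, 2: 0.9}
values = label_class_weights(label, weight_dict)
```

With the original name mismatch, the loop would raise a NameError on its first iteration unless a variable called _cls happened to exist in the enclosing scope.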

Contributor Author


Yep. Well noticed.

But this PR is not merged, and it is not going to be, since YOLOv5 is not maintained that much anymore.
