Only use annotated images to create dataset in autosplit() #2331
Closed
Sync with master branch
* EMA bug fix 2 * update
* Resume with custom anchors fix * Update train.py
* faster random index generator for mosaic augmentation. We don't need to access the list to generate a random index; it makes augmentation slower. * Update datasets.py Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Set HOME environment variable per Binder requirements. https://github.com/binder-examples/minimal-dockerfile
…augmentation (#2383) image weights compatible faster random index generator v2 for mosaic augmentation
* option for skip last layer and cuda export support * added parameter device * fix import * cleanup 1 * cleanup 2 * opt-in grid --grid will export with grid computation, default export will skip grid (same as current) * default --device cpu GPU export causes ONNX and CoreML errors. Co-authored-by: Jan Hajek <jan.hajek@gmail.com> Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* GCP sudo docker * cleanup
* added argoverse-download ability * bugfix * add support for Argoverse dataset * Refactored code * renamed to argoverse-HD * unzip -q and YOLOv5 small cleanup items * add image counts Co-authored-by: Kartikeya Sharma <kartikes@trinity.vision.cs.cmu.edu> Co-authored-by: Kartikeya Sharma <kartikes@trinity-0-32.eth> Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
* Integer printout * test.py 'Labels' * Update train.py
* Update test.py --task train val study * update argparser --task
* labels.png class names * fontsize=10
curl preferred over wget for slightly better cross platform compatibility (i.e. out of the box macos compatible).
* Add autoShape() speed profiling * Update common.py * Create README.md * Update hubconf.py * cleanup
kinoute changed the title from "Add background images and annotated only features to autosplit()" to "Only use annotated images to create dataset in autosplit()" on Mar 14, 2021
After reading your message in #2313 (comment) on getting the best results from YOLOv5, where you talk about background images, and after our discussion about `autosplit()` and annotated files in #2228, I decided to add two features to `autosplit()`:

The first is the ability to create a dataset/splits only from images that have an associated annotation file, i.e. a .txt file. As we discussed, the absence of a .txt file could mean two things.

While it's easy to create small datasets, when you have to create datasets with thousands of images (and more coming), it's hard to keep track of where you are, and you don't want to wait until all of them are annotated before starting to train. This means some images would lack .txt files and annotations, resulting in label inconsistency, as you say in Hunt for the highest MAP #2313. With the `annotated_only` argument, people can, if they want, create datasets/splits only from images that are definitely labelled.

The second feature is the ability to fill the dataset with background images automatically. We provide a path in the `bg_imgs_path` argument, and some images from this folder are picked and added to the splits. The number, or more precisely the ratio, of background images can be configured with `bg_imgs_ratio`. I followed your advice in the other issue and limited it to a float between 0 and 0.1 (0% and 10%). This ratio is used to calculate how many background images should be added to each split. If my training split has 1000 images and I set `bg_imgs_ratio` to 0.1, then, provided I have enough background images to meet the demand, around 100 background images will be added to the training split.

🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Updated YOLOv5 Dockerfile, README, and various performance improvements.
📊 Key Changes
`HOME` set for Docker.
🎯 Purpose & Impact
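For readers who want to see the idea behind the two `autosplit()` options from the pull request description, here is a minimal sketch. It is not the code from this PR or from YOLOv5. The helper names `annotated_images` and `pick_backgrounds` are hypothetical, as is the extension set `IMG_EXTS`. The sketch also assumes each label file sits next to its image under the same name with a `.txt` suffix, which may not match a given dataset layout.

```python
import random
from pathlib import Path

# Assumed set of image extensions, for illustration only.
IMG_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def annotated_images(img_dir):
    """Return the images in img_dir that have a same-named .txt file beside them.

    Hypothetical helper illustrating the `annotated_only` idea.
    """
    img_dir = Path(img_dir)
    return sorted(
        p for p in img_dir.iterdir()
        if p.suffix.lower() in IMG_EXTS and p.with_suffix(".txt").exists()
    )


def pick_backgrounds(bg_imgs_path, n_split_images, bg_imgs_ratio=0.1, seed=0):
    """Pick background images for a split of n_split_images images.

    Hypothetical helper illustrating `bg_imgs_path` / `bg_imgs_ratio`:
    the target count is round(n_split_images * bg_imgs_ratio), capped by
    the number of background images actually available.
    """
    if not 0.0 <= bg_imgs_ratio <= 0.1:
        raise ValueError("bg_imgs_ratio must be between 0 and 0.1")
    candidates = sorted(
        p for p in Path(bg_imgs_path).iterdir() if p.suffix.lower() in IMG_EXTS
    )
    wanted = round(n_split_images * bg_imgs_ratio)
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(candidates, min(wanted, len(candidates)))
```

With a training split of 1000 images and `bg_imgs_ratio=0.1`, `pick_backgrounds` asks for 100 background images and returns fewer only when the folder holds fewer than that.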