Objects365 Dataset AutoDownload #2932
Conversation
👋 Hello @ferdinandl007, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, an automatic GitHub Actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:

```shell
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
```
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
@ferdinandl007 I've reviewed the files and tried to clean them up a bit. I adjusted to a 120-character line length and removed the val set separation, as we have our own autosplit() function for splitting datasets into train/val/test splits (Lines 1044 to 1064 in 33712d6). It seems like Objects365 does not have pre-split train and val sections?
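For readers unfamiliar with it, the autosplit() utility mentioned above can be approximated by the sketch below. This is a simplified stand-in, not the actual YOLOv5 implementation: the function name, the weights tuple, and returning a dict instead of writing autosplit_*.txt files are all assumptions for illustration.

```python
import random


def autosplit_sketch(files, weights=(0.9, 0.1, 0.0), seed=0):
    """Randomly assign image paths to train/val/test splits by weight.

    Simplified stand-in for YOLOv5's autosplit(); returns a dict of lists
    instead of writing autosplit_*.txt files to disk.
    """
    random.seed(seed)
    names = ("train", "val", "test")
    splits = {n: [] for n in names}
    for f in files:
        # random.choices picks one split name with probability proportional to weights
        splits[random.choices(names, weights=weights, k=1)[0]].append(f)
    return splits


# Example: split 1000 synthetic image paths roughly 90/10/0
files = [f"images/img_{i:04d}.jpg" for i in range(1000)]
splits = autosplit_sketch(files, weights=(0.9, 0.1, 0.0))
```

With a zero weight for test, no images land in the test split, matching the default behavior described for autosplit().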
Also, PyCharm picked up a few typos in the class names. Are you sure you copied these class names correctly?
/rebase
@ferdinandl007 I tried to correct the spelling where I was certain of the correct word in 3220e04. One class was 'campel', which I assumed was 'camel'. Can you check my spelling update to verify these class names are correct?
@glenn-jocher Yes, that's correct; there are no discretely separated test and val sets for Objects365 in terms of annotation files. All annotations are stored in zhiyuan_objv2_train.json. Awesome, that splitting function looks a lot better than what I had. Also, the simplification of the code looks great.
The spelling updates look promising. Just double-check them on Grammarly, but I think you fixed them all.
@ferdinandl007 merging PR. Hopefully this should help people get started faster with Objects365 in the future, even if they have to download the dataset themselves. Thank you for your contributions!
@ferdinandl007 looking at the Objects365 download site https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true, I'm a little confused. There are 3 groups of images that say test, validate, train in Chinese. They each have 50 patches of images, but only train has a json annotations file. Does the single train json have labels for the val images? Where do you get your 5570 images for val? Are these all from labels in sample_2020.json.tar.gz?
@ferdinandl007 I've updated objects365.yaml; it now pulls 1742289 images (1.7M) from the 50 train patches and creates 1742292 labels from zhiyuan_objv2_train.json. All good more or less, but I'm confused about where you got your 5570 val images and labels. Could you explain that part? Thanks!
@glenn-jocher Yes, all annotations are included in the zhiyuan_objv2_train.json file. For the validation images, I just took some out of the training set and moved them into the validation set. By the way, have you already started training on it, and what accuracy did you get?
@ferdinandl007 hmm, interesting. No, I haven't started training because I wasn't sure how to handle the val set. When scanning the labels, it looks like some classes are very uncommon, i.e. only 200 instances in the dataset, so we would want the val set to guarantee at least 1 instance per class somehow. I was thinking a 1%/99% val/train split might be a good idea with this dataset, for 17k val images. We really want an official val set, I think, to make our results comparable too; otherwise results will vary by val set and we won't be able to make apples-to-apples comparisons.
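One way to build a val set that guarantees at least one instance per class, as discussed above, is a greedy cover pass followed by a random top-up. This is only an illustrative sketch, not the method used in this PR; the function name and the image-to-classes mapping it consumes are assumptions.

```python
import random


def val_split_with_class_coverage(image_classes, val_fraction=0.01, seed=0):
    """Pick a val set containing at least one image per class.

    image_classes: dict mapping image id -> set of class ids present in it.
    First greedily takes images that add uncovered classes, then tops up
    with random remaining images until val_fraction of the dataset is reached.
    """
    rng = random.Random(seed)
    images = list(image_classes)
    rng.shuffle(images)

    val, covered = [], set()
    all_classes = set().union(*image_classes.values())
    # Greedy pass: take any image that contributes an uncovered class
    for img in images:
        new = image_classes[img] - covered
        if new:
            val.append(img)
            covered |= new
        if covered == all_classes:
            break

    # Top up to the requested fraction with random remaining images
    target = max(len(val), int(len(images) * val_fraction))
    chosen = set(val)
    remaining = [i for i in images if i not in chosen]
    val.extend(remaining[: target - len(val)])
    return val


# Toy example: 100 images, class id = image id mod 5
image_classes = {i: {i % 5} for i in range(100)}
val_ids = val_split_with_class_coverage(image_classes, val_fraction=0.1)
```

For rare classes with only ~200 instances, the greedy pass ensures each one appears in val at least once before the random top-up fills out the 1% target.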
* add object365
* ADD CONVERSION SCRIPT
* fix transcript
* Reformat and simplify
* spelling
* Update get_objects365.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
The val dataset is now available on open.baai.ac.cn, named zhiyuan_objv2_val.json. Download with: wget "https://dorc.ks3-cn-beijing.ksyun.com/data-set/2020Objects365%E6%95%B0%E6%8D%AE%E9%9B%86/val/zhiyuan_objv2_val.json"
(cherry picked from commit dbce1bc)
As @tucker666 mentioned, the official val annotations are now available.
@farleylai hi, thank you for your feature suggestion on how to improve YOLOv5 🚀! The fastest and easiest way to incorporate your ideas into the official codebase is to submit a Pull Request (PR) implementing your idea, and if applicable providing before and after profiling/inference/training results to help us understand the improvement your feature provides. This allows us to directly see the changes in the code and to understand how they affect workflows and performance. Please see our ✅ Contributing Guide to get started. |
@glenn-jocher, there are indeed validation images in v1 and v2. The paths are not as straightforward as the training set. Once registered with WeChat, the direct download paths will be revealed. I will manage to submit a PR after the download is complete. |
@farleylai ah, that would be great! Yes, thank you for your help :) |
Just submitted as #5194 |
From #5194 (comment): I trained a YOLOv5m model on Objects365 following this PR and the other related fixes. Everything works well. mAP@0.5:0.95 was only 18.5 after 30 epochs, but person mAP was similar to COCO, about 55 mAP@0.5:0.95. I'm sure this could also be improved with more epochs and additional tweaks, but at first glance all is good here. DDP train command:
Results:

```
# YOLOv5m v6.0 COCO 300 epochs
Class   Images   Labels      P      R  mAP@.5  mAP@.5:.95
all       5000    36335  0.726  0.569   0.633       0.439
person    5000    10777  0.792  0.735   0.807       0.554

# YOLOv5m v6.0 Objects365 30 epochs
Class   Images   Labels      P      R  mAP@.5  mAP@.5:.95
all      80000  1239576  0.626  0.265   0.273       0.185
Person   80000    80332  0.599  0.765   0.759       0.57
```
This PR contains a conversion script to convert Objects365 to YOLO format, as well as the configuration file and hyperparameters for fine-tuning.
@glenn-jocher
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced YOLOv5 support for Objects365 dataset with dedicated hyperparameters and label conversion scripts.
📊 Key Changes
- New hyp.finetune_objects365.yaml, which includes tailored hyperparameters for fine-tuning on the Objects365 dataset.
- New objects365.yaml, serving as a configuration file for the Objects365 dataset containing paths, class count, and names.
- Updated get_argoverse_hd.sh script to use cleaner naming for class ids in annotations.
- New get_objects365.py provided to convert Objects365 dataset labels from JSON to YOLO format.

🎯 Purpose & Impact
- hyp.finetune_objects365.yaml allows users to fine-tune YOLOv5 models on the Objects365 dataset with optimized hyperparameters, which could improve model performance on this dataset.
- The objects365.yaml config file simplifies the process of using Objects365 with YOLOv5 by setting dataset-specific parameters.
- With get_objects365.py, users have a tool to easily prepare Objects365 data for YOLOv5 training, facilitating wider experimentation and adaptation of the model for diverse datasets.
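The core of a JSON-to-YOLO label conversion like the one get_objects365.py performs is the box-coordinate transform: COCO-style annotations store pixel-space [x_min, y_min, width, height], while YOLO labels store image-normalized [x_center, y_center, width, height]. The sketch below illustrates just that transform; it is not the PR's actual script, and the function name and label-line format are assumptions.

```python
def coco_box_to_yolo(box, img_w, img_h):
    """Convert a COCO-style [x_min, y_min, w, h] pixel box to
    YOLO format: normalized [x_center, y_center, w, h]."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


# Example: a 100x50 px box at (10, 20) in a 200x100 image, class id 0
box = coco_box_to_yolo([10, 20, 100, 50], 200, 100)
label_line = "0 " + " ".join(f"{v:.6f}" for v in box)
```

A full converter would additionally walk the annotations in zhiyuan_objv2_train.json, group them by image, and write one such line per object into a per-image .txt file.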