
Objects365 Dataset AutoDownload #2932

Merged · 7 commits · Apr 29, 2021

Conversation

@ferdinandl007 (Contributor) commented Apr 26, 2021

This PR contains a conversion script for converting Objects365 to YOLO format, as well as the configuration file and hyperparameters for fine-tuning.
@glenn-jocher

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced YOLOv5 support for Objects365 dataset with dedicated hyperparameters and label conversion scripts.

📊 Key Changes

  • Addition of hyp.finetune_objects365.yaml, which includes tailored hyperparameters for fine-tuning on the Objects365 dataset.
  • Addition of objects365.yaml, serving as a configuration file for the Objects365 dataset containing paths, class count, and names.
  • Update to the get_argoverse_hd.sh script to use cleaner naming for class IDs in annotations.
  • New script get_objects365.py provided to convert Objects365 dataset labels from JSON to YOLO format.

🎯 Purpose & Impact

  • 🎯 The introduction of hyp.finetune_objects365.yaml allows users to fine-tune YOLOv5 models on the Objects365 dataset with optimized hyperparameters, which could improve model performance on this dataset.
  • 💡 The objects365.yaml config file simplifies the process of using Objects365 with YOLOv5 by setting dataset-specific parameters.
  • ✅ Updated label processing script improves consistency and readability in annotations handling.
  • 🛠️ With get_objects365.py, users have a tool to easily prepare Objects365 data for YOLOv5 training, facilitating wider experimentation and adaptation of the model for diverse datasets (a minimal sketch of the conversion appears after this summary).
  • 🚀 Overall, these changes make it easier for users to work with the large-scale Objects365 dataset, likely leading to improved object detection capabilities in varied scenarios.
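
As a rough illustration of the conversion step, below is a minimal sketch of a COCO-style JSON to YOLO label conversion. It assumes the standard COCO annotation fields that Objects365 uses; the paths, the convert_labels helper name, and the 1-indexed category offset are illustrative, not taken verbatim from get_objects365.py:

import json
from pathlib import Path

def convert_labels(json_path='zhiyuan_objv2_train.json', out_dir='labels'):
    """Write one YOLO-format *.txt label file per image from a COCO-style JSON."""
    with open(json_path) as f:
        data = json.load(f)
    images = {x['id']: x for x in data['images']}  # image id -> metadata
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for ann in data['annotations']:
        img = images[ann['image_id']]
        w, h = img['width'], img['height']
        x, y, bw, bh = ann['bbox']  # COCO bbox: top-left x, y, width, height (pixels)
        cls = ann['category_id'] - 1  # assumes 1-indexed categories (illustrative)
        # YOLO format: class x_center y_center width height, normalized to [0, 1]
        cx, cy = (x + bw / 2) / w, (y + bh / 2) / h
        txt = Path(out_dir) / (Path(img['file_name']).stem + '.txt')
        with open(txt, 'a') as f:
            f.write(f'{cls} {cx:.6f} {cy:.6f} {bw / w:.6f} {bh / h:.6f}\n')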

@github-actions bot (Contributor) left a comment

👋 Hello @ferdinandl007, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
  • ✅ Verify all Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@glenn-jocher glenn-jocher changed the title add object365 Objects365 Dataset Apr 27, 2021
@glenn-jocher (Member)

@ferdinandl007 I've reviewed the files and tried to clean them up a bit. I adjusted to a 120-character line length and removed the val set separation, as we have our own autosplit() function for splitting datasets into train/val/test splits here (it seems Objects365 does not have pre-split train and val sections?)

yolov5/utils/datasets.py

Lines 1044 to 1064 in 33712d6

def autosplit(path='../coco128', weights=(0.9, 0.1, 0.0), annotated_only=False):
    """ Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files
    Usage: from utils.datasets import *; autosplit('../coco128')
    Arguments
        path:            Path to images directory
        weights:         Train, val, test weights (list)
        annotated_only:  Only use images with an annotated txt file
    """
    path = Path(path)  # images dir
    files = sum([list(path.rglob(f"*.{img_ext}")) for img_ext in img_formats], [])  # image files only
    n = len(files)  # number of files
    indices = random.choices([0, 1, 2], weights=weights, k=n)  # assign each image to a split
    txt = ['autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt']  # 3 txt files
    [(path / x).unlink() for x in txt if (path / x).exists()]  # remove existing
    print(f'Autosplitting images from {path}' + ', using *.txt labeled images only' * annotated_only)
    for i, img in tqdm(zip(indices, files), total=n):
        if not annotated_only or Path(img2label_paths([str(img)])[0]).exists():  # check label
            with open(path / txt[i], 'a') as f:
                f.write(str(img) + '\n')  # add image to txt file
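
A hypothetical call on an Objects365 images directory (the path here is illustrative) might look like:

from utils.datasets import autosplit
autosplit(path='../datasets/objects365/images', weights=(0.9, 0.1, 0.0), annotated_only=True)  # 90/10/0 train/val/test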

Also PyCharm picked up a few typos in the class names. Are you sure you copied these class names correctly?
[Screenshot, 2021-04-28: PyCharm spell-check flagging suspected typos in the class names]

@glenn-jocher (Member)

/rebase

@glenn-jocher (Member)

@ferdinandl007 I tried to correct the spelling where I was certain of the correct word in 3220e04. One class was 'campel', which I assumed was 'camel'. Can you check my spelling update to verify these class names are correct?

@ferdinandl007 (Contributor, Author) commented Apr 29, 2021

@glenn-jocher Yes, that's correct; there are no discretely separated test and val sets for Objects365 in terms of annotation files. All annotations are stored in zhiyuan_objv2_train.json. Awesome, that splitting function looks a lot better than what I had. Also, the simplification of the code looks great.
As for the spelling of the classes, I extracted the class names and IDs/class indices directly from zhiyuan_objv2_train.json, so any spelling mistakes came from the annotators, lol.

@ferdinandl007 (Contributor, Author)

@ferdinandl007 I tried to correct the spelling where I was certain of the correct word in 3220e04. One class was 'campel', which I assumed was 'camel'. Can you check my spelling update to verify these class names are correct?

The spelling updates look promising. I just double-checked them on Grammarly, and I think you fixed them all.
If you have a look at the dataset website you can see all the spelling mistakes there too: https://www.objects365.org/explore.html. If you look in the animal category you will find the 'campel' 😂

@glenn-jocher glenn-jocher added the enhancement New feature or request label Apr 29, 2021
@glenn-jocher (Member)

@ferdinandl007 merging PR. Hopefully this will help people get started faster with Objects365 in the future, even if they have to download the dataset themselves.

Thank you for your contributions!

@glenn-jocher glenn-jocher merged commit dbce1bc into ultralytics:master Apr 29, 2021
@glenn-jocher (Member) commented Apr 30, 2021

@ferdinandl007 looking at the Objects365 download site https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true, I'm a little confused. There are 3 groups of images that say test, validate, train in Chinese. They each have 50 patches of images, but only train has a json annotations file.

Does the single train json have labels for the val images? Where do you get your 5570 images for val? Are these all from labels in sample_2020.json.tar.gz?

@glenn-jocher (Member)

@ferdinandl007 I've updated objects365.yaml, it now pulls 1742289 images (1.7M) from the 50 train patches and creates 1742292 labels from zhiyuan_objv2_train.json.

All good more or less, but I'm confused about where you got your 5570 val images and labels. Could you explain that part? Thanks!
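
For context, the dataset YAML follows the usual YOLOv5 structure. A trimmed, illustrative sketch (the paths, truncated names list, and download stub are placeholders, not the merged file):

path: ../datasets/objects365  # dataset root dir (illustrative)
train: images/train  # train images relative to 'path'
val: images/val  # val images relative to 'path'
nc: 365  # number of classes
names: ['Person', 'Sneakers', 'Chair']  # truncated here; the real file lists all 365 names
download: |
  # Python executed by YOLOv5 on first use to download the 50 train patches
  # and convert zhiyuan_objv2_train.json annotations to YOLO *.txt labels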

@ferdinandl007 (Contributor, Author) commented May 5, 2021

@glenn-jocher Yes, all annotations are included in the zhiyuan_objv2_train.json file. For the validation images, I just took some out of the training set and moved them into the validation set. By the way, have you already started training on it, and what accuracy did you get?

@glenn-jocher (Member)

@ferdinandl007 hmm, interesting. No, I haven't started training because I wasn't sure how to handle the val set. When scanning the labels it looks like some classes are very uncommon, e.g. only 200 instances in the dataset, so we would want the val set to guarantee at least 1 instance per class somehow.

I was thinking a 1%/99% val/train split might be a good idea with this dataset, for about 17k val images.

We really want an official val set, I think, to make our results comparable too; otherwise results will vary by val set and we won't be able to make apples-to-apples comparisons.
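
One way to sanity-check a candidate split would be to count instances per class in the val labels and flag classes with zero coverage, along these lines (a rough sketch; the label path and class_coverage helper are hypothetical):

from collections import Counter
from pathlib import Path

def class_coverage(label_dir, nc=365):
    """Count instances per class in YOLO *.txt labels; return classes with none."""
    counts = Counter()
    for txt in Path(label_dir).glob('*.txt'):
        with open(txt) as f:
            for line in f:
                counts[int(line.split()[0])] += 1
    missing = [c for c in range(nc) if counts[c] == 0]
    return counts, missing

counts, missing = class_coverage('../datasets/objects365/labels/val')
print(f'{len(missing)} classes have no val instances')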

KMint1819 pushed a commit to KMint1819/yolov5 that referenced this pull request May 12, 2021
* add object365

* ADD CONVERSION SCRIPT

* fix transcript

* Reformat and simplify

* spelling

* Update get_objects365.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
danny-schwartz7 pushed a commit to danny-schwartz7/yolov5 that referenced this pull request May 22, 2021
* add object365

* ADD CONVERSION SCRIPT

* fix transcript

* Reformat and simplify

* spelling

* Update get_objects365.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
@glenn-jocher glenn-jocher changed the title Objects365 Dataset Objects365 Dataset AutoDownload May 31, 2021
@glenn-jocher glenn-jocher mentioned this pull request May 31, 2021
@tucker666

@ferdinandl007 hmm, interesting. No, I haven't started training because I wasn't sure how to handle the val set. When scanning the labels it looks like some classes are very uncommon, e.g. only 200 instances in the dataset, so we would want the val set to guarantee at least 1 instance per class somehow.

I was thinking a 1%/99% val/train split might be a good idea with this dataset, for about 17k val images.

We really want an official val set, I think, to make our results comparable too; otherwise results will vary by val set and we won't be able to make apples-to-apples comparisons.

The val dataset is now available on open.baai.ac.cn as zhiyuan_objv2_val.json, via:
wget "https://dorc.ks3-cn-beijing.ksyun.com/data-set/2020Objects365%E6%95%B0%E6%8D%AE%E9%9B%86/val/zhiyuan_objv2_val.json"

Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021
* add object365

* ADD CONVERSION SCRIPT

* fix transcript

* Reformat and simplify

* spelling

* Update get_objects365.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
(cherry picked from commit dbce1bc)
@farleylai (Contributor)

As @tucker666 mentioned, the official zhiyuan_objv2_val.json is available.
Any ongoing PR?

@glenn-jocher (Member) commented Oct 14, 2021

@farleylai hi, thank you for your feature suggestion on how to improve YOLOv5 🚀!

The fastest and easiest way to incorporate your ideas into the official codebase is to submit a Pull Request (PR) implementing your idea, and if applicable providing before and after profiling/inference/training results to help us understand the improvement your feature provides. This allows us to directly see the changes in the code and to understand how they affect workflows and performance.

Please see our ✅ Contributing Guide to get started.

@farleylai (Contributor) commented Oct 14, 2021

@glenn-jocher, there are indeed validation images in v1 and v2. The paths are not as straightforward as the training set's. Once registered with WeChat, the direct download paths are revealed. I will submit a PR after the download is complete.

@glenn-jocher (Member)

@farleylai ah, that would be great! Yes, thank you for your help :)

@farleylai (Contributor)

Just submitted as #5194

@glenn-jocher (Member) commented Apr 7, 2022

From #5194 (comment):

I trained a YOLOv5m model on Objects365 following this PR and the other related fixes. Everything works well. mAP@0.5:0.95 was only 18.5 after 30 epochs, but person mAP was similar to COCO, about 55 mAP@0.5:0.95. I'm sure this could also be improved with more epochs and additional tweaks, but at first glance all is good here.

DDP train command:

python train.py --data Objects365.yaml --batch 224 --weights '' --cfg yolov5m.yaml --epochs 30 --img 640 --hyp hyp.scratch-low.yaml --device 0,1,2,3,4,5,6,7

Results

# Columns: Class, Images, Labels, P, R, mAP@.5, mAP@.5:.95
# YOLOv5m v6.0 COCO 300 epochs
                 all       5000      36335      0.726      0.569      0.633      0.439
              person       5000      10777      0.792      0.735      0.807      0.554

# YOLOv5m v6.0 Objects365 30 epochs
                 all      80000    1239576      0.626      0.265      0.273      0.185
              Person      80000      80332      0.599      0.765      0.759       0.57

BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* add object365

* ADD CONVERSION SCRIPT

* fix transcript

* Reformat and simplify

* spelling

* Update get_objects365.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>