Objects365 Dataset AutoDownload #2932
Conversation
👋 Hello @ferdinandl007, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master, an automatic GitHub Actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:

```shell
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/master
git push -u origin -f
```
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
@ferdinandl007 I've reviewed the files and tried to clean them up a bit. I adjusted to a 120-character line length and removed the val set separation, as we have our own autosplit() function for splitting datasets into train/val/test splits (Lines 1044 to 1064 in 33712d6). It seems like Objects365 does not have pre-split train and val sections?
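For readers unfamiliar with it, the autosplit() utility mentioned above can be approximated by the sketch below. This is a simplified stand-in, not the actual YOLOv5 implementation: the function name, the weights tuple, and returning a dict instead of writing autosplit_*.txt files are all assumptions for illustration.

```python
import random


def autosplit_sketch(files, weights=(0.9, 0.1, 0.0), seed=0):
    """Randomly assign image paths to train/val/test splits by weight.

    Simplified stand-in for YOLOv5's autosplit(); returns a dict of lists
    instead of writing autosplit_*.txt files to disk.
    """
    random.seed(seed)
    names = ("train", "val", "test")
    splits = {n: [] for n in names}
    for f in files:
        # random.choices picks one split name with probability proportional to weights
        splits[random.choices(names, weights=weights, k=1)[0]].append(f)
    return splits


# Example: split 1000 synthetic image paths roughly 90/10/0
files = [f"images/img_{i:04d}.jpg" for i in range(1000)]
splits = autosplit_sketch(files, weights=(0.9, 0.1, 0.0))
```

With a zero weight for test, no images land in the test split, matching the default behavior described for autosplit().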
Also, PyCharm picked up a few typos in the class names. Are you sure you copied these class names correctly?
/rebase
@ferdinandl007 I tried to correct the spelling where I was certain of the correct word in 3220e04. One class was 'campel', which I assumed was 'camel'. Can you check my spelling update to verify these class names are correct?
@glenn-jocher Yes, that's correct; there are no discretely separated test and val sets for Objects365 in terms of annotation files. All annotations are stored in zhiyuan_objv2_train.json. Awesome, that splitting function looks a lot better than what I had. Also, the simplification of the code looks great.
The spelling updates look promising. Just double-check them on Grammarly, but I think you fixed them all.
@ferdinandl007 merging PR. Hopefully this should help people get started faster with Objects365 in the future, even if they have to download the dataset themselves. Thank you for your contributions!
@ferdinandl007 looking at the Objects365 download site https://open.baai.ac.cn/data-set-detail/MTI2NDc=/MTA=/true, I'm a little confused. There are 3 groups of images that say test, validate, train in Chinese. They each have 50 patches of images, but only train has a json annotations file. Does the single train json have labels for the val images? Where do you get your 5570 images for val? Are these all from labels in sample_2020.json.tar.gz?
@ferdinandl007 I've updated objects365.yaml; it now pulls 1742289 images (1.7M) from the 50 train patches and creates 1742292 labels from zhiyuan_objv2_train.json. All good more or less, but I'm confused about where you got your 5570 val images and labels. Could you explain that part? Thanks!
@glenn-jocher Yes, all annotations are included in the zhiyuan_objv2_train.json file. For the validation images, I just took some out of the training set and moved them into the validation set. By the way, have you already started training on it, and what accuracy did you get?
@ferdinandl007 hmm, interesting. No, I haven't started training because I wasn't sure how to handle the val set. When scanning the labels, it looks like some classes are very uncommon, i.e. only 200 instances in the dataset, so we would want the val set to guarantee at least 1 instance per class somehow. I was thinking a 1%/99% val/train split might be a good idea with this dataset, for 17k val images. We really want an official val set, I think, to make our results comparable too; otherwise results will vary by val set and we won't be able to make apples-to-apples comparisons.
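One way to build a val set that guarantees at least one instance per class, as discussed above, is a greedy cover pass followed by a random top-up. This is only an illustrative sketch, not the method used in this PR; the function name and the image-to-classes mapping it consumes are assumptions.

```python
import random


def val_split_with_class_coverage(image_classes, val_fraction=0.01, seed=0):
    """Pick a val set containing at least one image per class.

    image_classes: dict mapping image id -> set of class ids present in it.
    First greedily takes images that add uncovered classes, then tops up
    with random remaining images until val_fraction of the dataset is reached.
    """
    rng = random.Random(seed)
    images = list(image_classes)
    rng.shuffle(images)

    val, covered = [], set()
    all_classes = set().union(*image_classes.values())
    # Greedy pass: take any image that contributes an uncovered class
    for img in images:
        new = image_classes[img] - covered
        if new:
            val.append(img)
            covered |= new
        if covered == all_classes:
            break

    # Top up to the requested fraction with random remaining images
    target = max(len(val), int(len(images) * val_fraction))
    chosen = set(val)
    remaining = [i for i in images if i not in chosen]
    val.extend(remaining[: target - len(val)])
    return val


# Toy example: 100 images, class id = image id mod 5
image_classes = {i: {i % 5} for i in range(100)}
val_ids = val_split_with_class_coverage(image_classes, val_fraction=0.1)
```

For rare classes with only ~200 instances, the greedy pass ensures each one appears in val at least once before the random top-up fills out the 1% target.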
* add object365
* ADD CONVERSION SCRIPT
* fix transcript
* Reformat and simplify
* spelling
* Update get_objects365.py

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
The val dataset is now available on open.baai.ac.cn, named zhiyuan_objv2_val.json. Download with: wget "https://dorc.ks3-cn-beijing.ksyun.com/data-set/2020Objects365%E6%95%B0%E6%8D%AE%E9%9B%86/val/zhiyuan_objv2_val.json"
(cherry picked from commit dbce1bc)
As @tucker666 mentioned, the official val annotations are now available.
@farleylai hi, thank you for your feature suggestion on how to improve YOLOv5 🚀! The fastest and easiest way to incorporate your ideas into the official codebase is to submit a Pull Request (PR) implementing your idea, and if applicable providing before and after profiling/inference/training results to help us understand the improvement your feature provides. This allows us to directly see the changes in the code and to understand how they affect workflows and performance. Please see our ✅ Contributing Guide to get started. |
@glenn-jocher, there are indeed validation images in v1 and v2. The paths are not as straightforward as the training set. Once registered with WeChat, the direct download paths will be revealed. I will manage to submit a PR after the download is complete. |
@farleylai ah, that would be great! Yes, thank you for your help :) |
Just submitted as #5194 |
From #5194 (comment): I trained a YOLOv5m model on Objects365 following this PR and the other related fixes. Everything works well. mAP@0.5:0.95 was only 18.5 after 30 epochs, but person mAP was similar to COCO, about 55 mAP@0.5:0.95. I'm sure this could also be improved with more epochs and additional tweaks, but at first glance all is good here. DDP train command:
Results:

```
# YOLOv5m v6.0 COCO 300 epochs
Class   Images   Labels      P      R  mAP@.5  mAP@.5:.95
all       5000    36335  0.726  0.569   0.633       0.439
person    5000    10777  0.792  0.735   0.807       0.554

# YOLOv5m v6.0 Objects365 30 epochs
Class   Images   Labels      P      R  mAP@.5  mAP@.5:.95
all      80000  1239576  0.626  0.265   0.273       0.185
Person   80000    80332  0.599  0.765   0.759       0.57
```
This PR contains a conversion script to convert Objects365 to YOLO format, as well as the configuration file and hyperparameters for fine-tuning.
@glenn-jocher
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced YOLOv5 support for Objects365 dataset with dedicated hyperparameters and label conversion scripts.
📊 Key Changes
- New hyp.finetune_objects365.yaml, which includes tailored hyperparameters for fine-tuning on the Objects365 dataset.
- New objects365.yaml, serving as a configuration file for the Objects365 dataset containing paths, class count, and names.
- Updated get_argoverse_hd.sh script to use cleaner naming for class ids in annotations.
- New get_objects365.py provided to convert Objects365 dataset labels from JSON to YOLO format.

🎯 Purpose & Impact
- hyp.finetune_objects365.yaml allows users to fine-tune YOLOv5 models on the Objects365 dataset with optimized hyperparameters, which could improve model performance on this dataset.
- The objects365.yaml config file simplifies the process of using Objects365 with YOLOv5 by setting dataset-specific parameters.
- With get_objects365.py, users have a tool to easily prepare Objects365 data for YOLOv5 training, facilitating wider experimentation and adaptation of the model for diverse datasets.
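The core of a JSON-to-YOLO label conversion like the one get_objects365.py performs is the box-coordinate transform: COCO-style annotations store pixel-space [x_min, y_min, width, height], while YOLO labels store image-normalized [x_center, y_center, width, height]. The sketch below illustrates just that transform; it is not the PR's actual script, and the function name and label-line format are assumptions.

```python
def coco_box_to_yolo(box, img_w, img_h):
    """Convert a COCO-style [x_min, y_min, w, h] pixel box to
    YOLO format: normalized [x_center, y_center, w, h]."""
    x, y, w, h = box
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


# Example: a 100x50 px box at (10, 20) in a 200x100 image, class id 0
box = coco_box_to_yolo([10, 20, 100, 50], 200, 100)
label_line = "0 " + " ".join(f"{v:.6f}" for v in box)
```

A full converter would additionally walk the annotations in zhiyuan_objv2_train.json, group them by image, and write one such line per object into a per-image .txt file.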