Detection Datasets in Torchvision #3047


Closed
oke-aditya opened this issue Nov 25, 2020 · 3 comments

@oke-aditya
Contributor

oke-aditya commented Nov 25, 2020

🚀 Feature

Can we have the Penn-Fudan dataset in torchvision.datasets?

Motivation

We use this dataset very often in tutorials!
It would be much easier to prototype if we had

torchvision.datasets.PennFudan(root: str, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)

Then we could easily load this dataset in VOC format.
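For concreteness, here is a rough, torch-free sketch of what such a class could look like. The class name, the PNGImages folder layout, and the stubbed download behaviour are all assumptions; a real implementation would subclass torch.utils.data.Dataset (which only requires __len__ and __getitem__) and return a decoded image plus a filled-in target dict:

```python
import os
from typing import Callable, Optional


class PennFudan:
    """Hypothetical sketch of the proposed dataset class.

    A real implementation would subclass torch.utils.data.Dataset;
    the Dataset protocol only requires __len__ and __getitem__.
    """

    def __init__(self, root: str,
                 transform: Optional[Callable] = None,
                 target_transform: Optional[Callable] = None,
                 download: bool = False) -> None:
        self.root = root
        self.transform = transform
        self.target_transform = target_transform
        # download=True would fetch and extract the archive here (omitted).
        image_dir = os.path.join(root, "PNGImages")
        self.images = sorted(os.listdir(image_dir)) if os.path.isdir(image_dir) else []

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: int):
        path = os.path.join(self.root, "PNGImages", self.images[index])
        # A real implementation would decode the image and fill in the
        # target dict with the instance boxes and labels; this sketch
        # just returns the path and an empty target.
        target = {"boxes": [], "labels": []}
        if self.transform is not None:
            path = self.transform(path)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return path, target
```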

Pitch

Penn-Fudan is very commonly used and is the first dataset that comes to mind for detection and segmentation tasks.
It is a simple dataset to prototype with, instead of COCO.
It would be really simple to load the data in VOC format, which is directly compatible with torchvision models.
This would keep quickstart and prototyping very fast, just like CIFAR10 does!

I'm not sure about a few aspects:

  1. Should the targets start from 0 or 1? In torchvision we assume 0 to be background, but that might not always be true.
  2. Should we load the boxes in VOC format, or have a parameter to control that? We could use box_convert and return boxes in the format people need.
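As an illustration of the second point, the common box formats differ only in parametrization. Here are plain-Python helpers mirroring the conversions that torchvision.ops.box_convert performs between its 'xyxy', 'xywh', and 'cxcywh' formats (the helper names below are made up for this sketch):

```python
def xyxy_to_xywh(box):
    """VOC/Pascal corners (x1, y1, x2, y2) -> COCO (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def xywh_to_xyxy(box):
    """COCO (x, y, w, h) -> VOC/Pascal corners (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_cxcywh(box):
    """VOC/Pascal corners -> YOLO-style (center_x, center_y, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


# Round-tripping a box leaves it unchanged:
box = (10, 20, 50, 80)  # x1, y1, x2, y2
assert xywh_to_xyxy(xyxy_to_xywh(box)) == box
```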

Alternatives

Currently, the tutorial for Object Detection does show how to load and use it.

It would be a nice addition, as we don't have a handy detection dataset to prototype with (apart from COCO).

Additional context

I couldn't find the paper and citation count, so I'm not sure whether that is needed to add it to torchvision.

cc @pmeier

@fmassa
Member

fmassa commented Nov 27, 2020

Hi,

Thanks for the suggestion!

I think one of the key requests from users when trying to fine-tune a model is how they should bring their own datasets for finetuning. As such, one of the main ingredients of the object detection finetuning tutorial is how to write a Dataset class that is compatible with the rest of the training abstractions that we provide.
If the tutorial only contained torchvision.datasets.PenFudan(...) to get the data, the users would need to do more work to understand what they need to change to bring their own data.

So if we were to provide PenFudan in torchvision, we would need to find another dataset for the tutorials.

As of now, I think PenFudan is a good dataset to use in tutorials due to its simplicity (it doesn't provide boxes, so we have to compute them ourselves), but given that it only contains people as a class, it's not a very good benchmark compared to Pascal or COCO for finetuning (as both datasets are much larger than PenFudan and also contain the person class).
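For what it's worth, the "compute them ourselves" step is small: for each instance id in the segmentation mask, take the min/max pixel coordinates. A plain-Python sketch of the idea (torchvision itself would operate on tensors; mask_to_boxes here is a hypothetical helper, not a torchvision API):

```python
def mask_to_boxes(mask):
    """Compute one xyxy box per instance id in a 2D integer mask.

    mask: list of rows; 0 is background, each positive id marks one
    instance (the convention used by the PennFudanPed masks).
    Returns {instance_id: (x_min, y_min, x_max, y_max)}.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, obj_id in enumerate(row):
            if obj_id == 0:
                continue
            if obj_id not in boxes:
                boxes[obj_id] = (x, y, x, y)
            else:
                x1, y1, x2, y2 = boxes[obj_id]
                boxes[obj_id] = (min(x1, x), min(y1, y), max(x2, x), max(y2, y))
    return boxes


mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [2, 0, 0, 0],
]
# Instance 1 spans columns 1-2 of rows 0-1; instance 2 is the single
# pixel at (x=0, y=2).
boxes = mask_to_boxes(mask)
```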

Datasets and standards

But this brings up a separate (and very important) question as well: the datasets in torchvision do not have strong standardization w.r.t. output types etc. This was already discussed in #1080, but as I mentioned there, the more structure we add, the less flexible we are, and the more tied we are to a particular training loop.
If all datasets were formatted the same way and had the same expected return types, then we could provide a PenFudan dataset in torchvision, as the specs for how a dataset should be formatted would always be the same and documented in a single place. But from the discussion in #1080, maybe this standardization is not something we should impose on the datasets.

cc @datumbox @pmeier for thoughts.

@oke-aditya
Contributor Author

oke-aditya commented Nov 30, 2020

Hi @fmassa, I have a few thoughts. Let's not be specific about Penn-Fudan, but instead consider detection datasets in general.

I agree that the tutorial should show how to create a dataset in the correct input format for torchvision models. I think this is an integral part of the tutorial, so let's not change that.

Now, we come back to the question of datasets and standards. Let me make this issue a bit more generic. Currently, torchvision supports a lot of classification datasets (MNIST, LSUN, CIFAR, EMNIST, etc.). For object detection, it currently supports the COCO detection dataset.

So the question is:

How do we add new detection datasets to torchvision?

Penn-Fudan might not be the best dataset to add, but there are a few other datasets apart from COCO:

  1. Objectron. A simple PyTorch notebook to load the data is here.
  2. Open Images dataset.

Possibly we could extend this list, taking citation counts and the dataset-contribution guidelines into account. Datasets often come in different formats.

Some thoughts about standardization:

  1. We cannot restrict a dataset to load only in COCO or VOC format; different models need different formats, and torchvision provides datasets for common use cases, not just for loading into torchvision models.
  2. We might not be able to provide train/test splits for these datasets, as that depends on the provider. If we do provide train and test splits, they might not be consistent, or users might prefer different ones.

I also suggest adding a FakeDetectionDataset that can generate datasets in COCO, VOC, and YOLO formats. This would be analogous to the FakeData dataset already in torchvision.
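A FakeDetectionDataset along these lines could be very small. Here is a hypothetical, torch-free sketch that generates random valid boxes in either VOC (xyxy) or COCO (xywh) format; a real version would also return a random image tensor, as FakeData does, and the class name and parameters are assumptions:

```python
import random


class FakeDetectionDataset:
    """Hypothetical sketch: random detection targets in VOC or COCO box format."""

    def __init__(self, size=10, image_size=(224, 224), num_classes=3,
                 max_objects=4, box_format="voc", seed=0):
        assert box_format in ("voc", "coco")
        self.size = size
        self.image_size = image_size
        self.num_classes = num_classes
        self.max_objects = max_objects
        self.box_format = box_format
        self.rng = random.Random(seed)  # seeded for reproducible samples

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        h, w = self.image_size
        targets = []
        for _ in range(self.rng.randint(1, self.max_objects)):
            # Two distinct sorted coordinates guarantee a valid box.
            x1, x2 = sorted(self.rng.sample(range(w), 2))
            y1, y2 = sorted(self.rng.sample(range(h), 2))
            label = self.rng.randrange(self.num_classes)
            if self.box_format == "voc":   # (x1, y1, x2, y2)
                targets.append({"box": (x1, y1, x2, y2), "label": label})
            else:                          # COCO: (x, y, w, h)
                targets.append({"box": (x1, y1, x2 - x1, y2 - y1), "label": label})
        # A real implementation would also return a random image here.
        return targets
```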

@oke-aditya oke-aditya changed the title Pen Fudan Dataset in Torchvision Detection Datasets in Torchvision Nov 30, 2020
@oke-aditya
Contributor Author

Closing this in favour of #3562.
