port datasets from the old to the new API #5336

Open
5 of 31 tasks
pmeier opened this issue Feb 3, 2022 · 12 comments

@pmeier
Collaborator

pmeier commented Feb 3, 2022

The new dataset API is now stable enough to start porting more datasets from the old API. For the 0.13.0 release planned for 2022H2 we want to achieve at least feature parity for the new API. If you want to help out, please comment on the respective issue so we can assign it to you.

The process of adding a dataset to the new API is described here. In addition, we already ported some datasets that you could use as reference. In any case, if you are blocked by something feel free to send a partial PR and ping me there so I can help.

The following datasets need to be ported:

Image classification

Image classification datasets are a good starting point if you are not familiar with the dataset or the new API, since these datasets tend to be the easiest.

Image detection or segmentation

Image detection or segmentation datasets tend to be a little harder since one needs to merge more information into one sample compared to classification. My suggestion is to only pick one of these if you are familiar with either the dataset or the new API so you don't have to manage two things at once.

Image pairs

We are still designing how exactly image pair datasets should be implemented. I list them here for completeness, but I suggest not picking up any of them until the design is finished.

Video classification

We are still designing how exactly video datasets should be implemented. I list them here for completeness, but I suggest not picking up any of them until the design is finished.

Optical flow

We are still designing how exactly optical flow datasets should be implemented. I list them here for completeness, but I suggest not picking up any of them until the design is finished.

cc @pmeier @bjuncek

Footnotes

  1. These datasets do not provide public download links for the data so they might be harder to work on.

  2. Maybe we should have lfw/people, kitti/object, and kitti/flow datasets to cleanly separate the different variants. This also applies to coco as discussed in https://github.com/pytorch/vision/pull/5326#discussion_r796813705

  3. These datasets are implemented as classification datasets in the old API, but provide extra annotations for detection or segmentation.

@Dbhasin1

This comment was marked as resolved.

@pmeier
Collaborator Author

pmeier commented Feb 3, 2022

Hey @Dbhasin1, thanks a lot for the interest. I can only assign you to an issue if you comment on it. Could you please do so on the three you picked?

@abhi-glitchhg

This comment was marked as resolved.

@pmeier
Collaborator Author

pmeier commented Feb 15, 2022

Hey @Dbhasin1 @abhi-glitchhg @vballoli @Amapocho @vfdev-5. We recently merged #5407, which included some changes to the prototype datasets. I believe the only change that affects you is the removal of the decoder. This means `_make_datapipe` no longer gets passed a decoder. Instead you can `from torchvision.prototype.features import EncodedImage` and just use `image = EncodedImage.from_file(buffer)` rather than the old idiom `image = decoder(buffer) if decoder else buffer`.
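
For illustration, the change from the old idiom to the new one might look roughly like the sketch below. To keep it runnable without torchvision, `EncodedImage` here is a hypothetical stand-in that only stores the raw bytes, not the real class from `torchvision.prototype.features`. The helper names `prepare_sample_old` and `prepare_sample_new` are also made up for this example and are not part of the dataset API.

```python
import io
from typing import Any, BinaryIO, Callable, Dict, Optional


class EncodedImage:
    """Hypothetical stand-in for torchvision.prototype.features.EncodedImage.

    It only keeps the raw, still-encoded bytes. The real class is not
    reproduced here.
    """

    def __init__(self, data: bytes) -> None:
        self.data = data

    @classmethod
    def from_file(cls, file: BinaryIO) -> "EncodedImage":
        # Read the whole buffer; decoding is left to a later step.
        return cls(file.read())


def prepare_sample_old(
    buffer: BinaryIO, decoder: Optional[Callable[[BinaryIO], Any]] = None
) -> Dict[str, Any]:
    # Old idiom: an optional decoder was passed in and applied conditionally.
    image = decoder(buffer) if decoder else buffer
    return {"image": image}


def prepare_sample_new(buffer: BinaryIO) -> Dict[str, Any]:
    # New idiom: no decoder argument; the buffer is always wrapped.
    image = EncodedImage.from_file(buffer)
    return {"image": image}


raw = b"not a real image, just placeholder bytes"
old_sample = prepare_sample_old(io.BytesIO(raw))
new_sample = prepare_sample_new(io.BytesIO(raw))
```

With the old idiom the type of `image` depended on whether a decoder was supplied; with the new one the sample always carries the same wrapper type.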

@yassineAlouini
Contributor

@pmeier Since food101 is almost done (waiting for the CLA from my employer), I can work on another dataset. Is Kitti available?

@pmeier
Collaborator Author

pmeier commented Mar 17, 2022

@yassineAlouini Yes, Kitti is still open. Please comment on #5355 so I can assign you. That being said, the CLA is a hard requirement and we can't accept contributions without it. So if this is just a matter of time, you can go ahead. But if there is a chance the CLA is not approved by your employer, you might want to wait until the process is finished. Otherwise you might end up with a lot of wasted work.

@yassineAlouini
Contributor

@pmeier Should be a formality by now, I will just check the old API while waiting for the document. 👌

@pmeier
Collaborator Author

pmeier commented Mar 28, 2022

Hey everyone. We decided to remove the "help wanted" label for now. Reviewing the PRs takes more time out of my schedule than I anticipated. This is not a comment on the quality of the PRs, but rather a late acknowledgement that, due to their diverse nature, datasets are hard to review. We very much appreciate every contribution towards closing this issue.

@yassineAlouini @puhuk @abhi-glitchhg @Dbhasin1 @zhiqwang @Amapocho you all have issues assigned to you for which there is no PR yet. If you haven't started yet, I suggest not starting until we give another signal here. If you already have an implementation, you might also send a PR and I will try to review them in a timely manner. In case you don't want to work on the dataset anymore, please comment on the issue so I can un-assign you.

@lezwon @yassineAlouini @vballoli Our decision has no effect on your already open PRs. I'll review them normally.

@yassineAlouini
Contributor

@pmeier I have started working on the Kitti one but I can pause the work for now and work on something else in the meantime (any suggestion is welcome). And thanks again for your time and guidance, it takes a lot of effort but is very appreciated. 🥰

@puhuk
Contributor

puhuk commented Mar 28, 2022

@pmeier I was planning to send a PR this week, but I can also pause the work :)
If you want me to hold off on the PR, I can stop; otherwise, let me send it by this weekend.

@pmeier
Collaborator Author

pmeier commented Mar 28, 2022

@puhuk if you already have some code, you can go ahead and send a PR to not let your work go to waste. Otherwise, pausing development for some time would also be appreciated.

@puhuk
Contributor

puhuk commented Apr 2, 2022

@pmeier Sorry for the late reply. Let me pause the development and stop working on this issue.
