Migrate PCAM prototype dataset #5745

NicolasHug · 2022-04-05T14:56:53Z

No description provided.

pmeier · 2022-04-05T17:19:38Z

torchvision/prototype/datasets/_builtin/pcam.py

+def _info() -> Dict[str, Any]:
+    return dict(
+        categories=[0, 1],
+        dependencies=["h5py"],


Do we need this in the static info? Or better, can someone actually do something with this information at runtime? In #5473 I added a dependencies parameter to Dataset2.__init__ similar to what the current DatasetInfo is doing:

vision/torchvision/prototype/datasets/utils/_dataset.py

Line 29 in 63576c9

dependencies: Collection[str] = (),

Maybe we can wait for the other PR to be merged and adopt this change here?

pmeier · 2022-04-05T17:19:55Z

torchvision/prototype/datasets/_builtin/pcam.py

+@register_info(NAME)
+def _info() -> Dict[str, Any]:
+    return dict(
+        categories=[0, 1],


Categories should be strings

Suggested change

-        categories=[0, 1],
+        categories=["0", "1"],

NicolasHug · 2022-04-06

Why? That sounds like overkill for non-labeled classes, which are usually treated as ints in other popular frameworks.

pmeier · 2022-04-06

In the naming scheme, a "label" is the integer representation while a "category" is the human-readable string representation. That is the same as what categories=2 did before:

vision/torchvision/prototype/datasets/utils/_dataset.py

Lines 44 to 45 in 08cc9a7

if isinstance(categories, int):
    categories = [str(label) for label in range(categories)]

So we should either stick to this scheme and make the categories strings, or drop the categories. I agree, the latter might be the better option for datasets that don't have human-readable categories.
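The label/category scheme described above can be sketched as follows. The helper `normalize_categories` is a hypothetical name introduced only for illustration; its integer branch mirrors the two quoted lines from `_dataset.py`, and the rest is an assumption about how a caller might use the result.

```python
from typing import List, Sequence, Union


def normalize_categories(categories: Union[int, Sequence[str]]) -> List[str]:
    """Hypothetical helper: always return categories as a list of strings."""
    if isinstance(categories, int):
        # Same fallback as the quoted snippet: name each class after its label.
        return [str(label) for label in range(categories)]
    return list(categories)


# A dataset with two unnamed classes, as in this PR.
categories = normalize_categories(2)
label = 1                      # integer representation
category = categories[label]   # human-readable string representation
num_classes = len(categories)  # still available even without real names
```

Under this scheme `categories=["0", "1"]` and `categories=2` are interchangeable, which is why the list of ints stands out as inconsistent.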

NicolasHug · 2022-04-06

Let's just mark this as a TODO for later. For now we can assume that all datasets have a categories info. I guess it's still useful for knowing the number of classes.

torchvision/prototype/datasets/_builtin/pcam.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

pmeier

Thanks @NicolasHug, LGTM! We need to wait for #5473 before this can pass CI due to the missing dependencies parameter in the base class.

* refactor prototype datasets to inherit from IterDataPipe (#5448)
* refactor prototype datasets to inherit from IterDataPipe
* depend on new architecture
* fix missing file detection
* remove unrelated file
* reinstante decorator for mock registering
* options -> config
* remove passing of info to mock data functions
* refactor categories file generation
* fix imagenet
* fix prototype datasets data loading tests (#5711)
* reenable serialization test
* cleanup
* fix dill test
* trigger CI
* patch DILL_AVAILABLE for pickle serialization
* revert CI changes
* remove dill test and traversable test
* add data loader test
* parametrize over only_datapipe
* draw one sample rather than exhaust data loader
* cleanup
* trigger CI
* migrate VOC prototype dataset (#5743)
* migrate VOC prototype dataset
* cleanup
* revert unrelated mock data changes
* remove categories annotations
* move properties to constructor
* readd homepage
* migrate CIFAR prototype datasets (#5751)
* migrate country211 prototype dataset (#5753)
* migrate CLEVR prototype datsaet (#5752)
* migrate coco prototype (#5473)
* migrate coco prototype
* revert unrelated change
* add kwargs to super constructor call
* remove unneeded changes
* fix docstring position
* make kwargs explicit
* add dependencies to docstring
* fix missing dependency message
* Migrate PCAM prototype dataset (#5745)
* Port PCAM
* skip_integrity_check
* Update torchvision/prototype/datasets/_builtin/pcam.py
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* Address comments
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* Migrate DTD prototype dataset (#5757)
* Migrate DTD prototype dataset
* Docstring
* Apply suggestions from code review
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* Migrate GTSRB prototype dataset (#5746)
* Migrate GTSRB prototype dataset
* ufmt
* Address comments
* Apparently mypy doesn't know that __len__ returns ints. How cute.
* why is the CI not triggered??
* Update torchvision/prototype/datasets/_builtin/gtsrb.py
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* migrate CelebA prototype dataset (#5750)
* migrate CelebA prototype dataset
* inline split_id
* Migrate Food101 prototype dataset (#5758)
* Migrate Food101 dataset
* Added length
* Update torchvision/prototype/datasets/_builtin/food101.py
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* Migrate Fer2013 prototype dataset (#5759)
* Migrate Fer2013 prototype dataset
* Update torchvision/prototype/datasets/_builtin/fer2013.py
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* Migrate EuroSAT prototype dataset (#5760)
* Migrate Semeion prototype dataset (#5761)
* migrate caltech prototype datasets (#5749)
* migrate caltech prototype datasets
* resolve third party dependencies
* Migrate Oxford Pets prototype dataset (#5764)
* Migrate Oxford Pets prototype dataset
* Update torchvision/prototype/datasets/_builtin/oxford_iiit_pet.py
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* migrate mnist prototype datasets (#5480)
* migrate MNIST prototype datasets
* Update torchvision/prototype/datasets/_builtin/mnist.py
  Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
* Migrate Stanford Cars prototype dataset (#5767)
* Migrate Stanford Cars prototype dataset
* Address comments
* fix category file generation (#5770)
* fix category file generation
* revert unrelated change
* revert unrelated change
* migrate cub200 prototype dataset (#5765)
* migrate cub200 prototype dataset
* address comments
* fix category-file-generation
* Migrate USPS prototype dataset (#5771)
* migrate SBD prototype dataset (#5772)
* migrate SBD prototype dataset
* reuse categories
* Migrate SVHN prototype dataset (#5769)
* add test to enforce __len__ is working on prototype datasets (#5742)
* reactivate special dataset tests
* add missing annotation
* Cleanup prototype dataset implementation (#5774)
* Remove Dataset2 class
* Move read_categories_file out of DatasetInfo
* Remove FrozenBunch and FrozenMapping
* Remove test_prototype_datasets_api.py and move missing dep test somewhere else
* ufmt
* Let read_categories_file accept names instead of paths
* Mypy
* flake8
* fix category file reading
  Co-authored-by: Philip Meier <github.pmeier@posteo.de>
* update prototype dataset README (#5777)
* update prototype dataset README
* fix header level
* Apply suggestions from code review
  Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Summary: the same change list as in the commit message above.

(Note: this ignores all push blocking failures!)

Reviewed By: jdsgomes, NicolasHug

Differential Revision: D36095693

fbshipit-source-id: d57f2b4a89ef1c45f5e2783ea57bce08df5c561d

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Port PCAM

cac4131

NicolasHug requested a review from pmeier April 5, 2022 14:56

facebook-github-bot added the cla signed label Apr 5, 2022

skip_integrity_check

72a9e09

pmeier reviewed Apr 5, 2022

pmeier mentioned this pull request Apr 6, 2022

Migrate GTSRB prototype dataset #5746

Merged

pmeier reviewed Apr 6, 2022

NicolasHug and others added 2 commits April 6, 2022 11:20

Update torchvision/prototype/datasets/_builtin/pcam.py

3b4af86

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

Address comments

7ffb694

pmeier approved these changes Apr 6, 2022

Merge branch 'prototype-datasets-inheritance' of github.com:pytorch/vision into pcam_mig

435619d

NicolasHug merged commit 27104fe into pytorch:prototype-datasets-inheritance Apr 6, 2022

NicolasHug added the module: datasets and prototype labels Apr 6, 2022
