
fix prototype datasets data loading tests #5711

Merged
merged 14 commits into from
Apr 5, 2022

Conversation

pmeier
Collaborator

@pmeier pmeier commented Mar 31, 2022

This re-enables the serialization test after pytorch/pytorch#74958. The test will fail for the next ~3 hours until the other PR is part of the PyTorch core nightlies.

Edit: The PR evolved into something more. Instead of only serialization, we now also test traversability as well as loading through a DataLoader. Together with #5723 we should now have sufficient coverage for our datasets to not accidentally break data loading workflows.
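In spirit, the serialization part of these checks is just a pickle round-trip. A minimal stdlib-only sketch of the idea (ToyPipe is an illustrative stand-in, not a real torchvision dataset or the actual test code):

```python
import pickle

class ToyPipe:
    """Illustrative stand-in for an IterDataPipe: holds state, yields samples."""
    def __init__(self, samples):
        self.samples = list(samples)

    def __iter__(self):
        return iter(self.samples)

pipe = ToyPipe([{"label": 0}, {"label": 1}])

# The serialization test amounts to: dumping and loading must not raise,
# and the restored pipe must yield the same samples as the original.
restored = pickle.loads(pickle.dumps(pipe))
assert list(restored) == list(pipe)
```

The real tests additionally traverse the datapipe graph and draw one sample through a DataLoader, but the pass/fail criterion is the same "round-trip without error" shape.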

@facebook-github-bot

facebook-github-bot commented Mar 31, 2022

💊 CI failures summary and remediations

As of commit f9b682c (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See CircleCI build unittest_windows_cpu_py3.9 (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

test/test_utils.py::test_flow_to_image_errors[i...Flow should be of dtype torch.float] PASSED [ 98%]
test/test_utils.py::test_draw_keypoints_colored[red] PASSED              [ 98%]
test/test_utils.py::test_draw_keypoints_colored[#FF00FF] PASSED          [ 98%]
test/test_utils.py::test_draw_keypoints_colored[colors2] PASSED          [ 98%]
test/test_utils.py::test_draw_keypoints_errors PASSED                    [ 98%]
test/test_utils.py::test_flow_to_image[True] PASSED                      [ 98%]
test/test_utils.py::test_flow_to_image[False] PASSED                     [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow0-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow1-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow2-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow3-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow4-Flow should be of dtype torch.float] PASSED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[RATRACE_wave_f_nm_np1_fr_goo_37.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[TrumanShow_wave_f_nm_np1_fr_med_26.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[v_SoccerJuggling_g23_c01.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[v_SoccerJuggling_g24_c01.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[R6llTwEh07w.mp4] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[SOX5yA1l24A.mp4] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[WUzgd7C1pWA.mp4] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_seek_reading[C:\\Users\\circleci\\project\\test\\assets\\videos\\v_SoccerJuggling_g23_c01.avi-8.0-True] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_seek_reading[C:\\Users\\circleci\\project\\test\\assets\\videos\\v_SoccerJuggling_g23_c01.avi-8.0-False] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_seek_reading[C:\\Users\\circleci\\project\\test\\assets\\videos\\v_SoccerJuggling_g24_c01.avi-8.0-True] SKIPPED [ 98%]

See CircleCI build unittest_windows_cpu_py3.8 (2/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

test/test_utils.py::test_flow_to_image_errors[i...Flow should be of dtype torch.float] PASSED [ 98%]
test/test_utils.py::test_draw_keypoints_colored[red] PASSED              [ 98%]
test/test_utils.py::test_draw_keypoints_colored[#FF00FF] PASSED          [ 98%]
test/test_utils.py::test_draw_keypoints_colored[colors2] PASSED          [ 98%]
test/test_utils.py::test_draw_keypoints_errors PASSED                    [ 98%]
test/test_utils.py::test_flow_to_image[True] PASSED                      [ 98%]
test/test_utils.py::test_flow_to_image[False] PASSED                     [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow0-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow1-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow2-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow3-Input flow should have shape] PASSED [ 98%]
test/test_utils.py::test_flow_to_image_errors[input_flow4-Flow should be of dtype torch.float] PASSED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[RATRACE_wave_f_nm_np1_fr_goo_37.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[TrumanShow_wave_f_nm_np1_fr_med_26.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[v_SoccerJuggling_g23_c01.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[v_SoccerJuggling_g24_c01.avi] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[R6llTwEh07w.mp4] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[SOX5yA1l24A.mp4] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_frame_reading[WUzgd7C1pWA.mp4] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_seek_reading[C:\\Users\\circleci\\project\\test\\assets\\videos\\v_SoccerJuggling_g23_c01.avi-8.0-True] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_seek_reading[C:\\Users\\circleci\\project\\test\\assets\\videos\\v_SoccerJuggling_g23_c01.avi-8.0-False] SKIPPED [ 98%]
test/test_video_gpu_decoder.py::TestVideoGPUDecoder::test_seek_reading[C:\\Users\\circleci\\project\\test\\assets\\videos\\v_SoccerJuggling_g24_c01.avi-8.0-True] SKIPPED [ 98%]

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job Step Action
CircleCI unittest_prototype Run tests 🔁 rerun

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


.circleci/config.yml.in (review comment outdated, resolved)
Comment on lines 115 to 116:

    if DILL_AVAILABLE:
        mocker.patch("torch.utils.data.datapipes.datapipe.DILL_AVAILABLE", new=False)
Collaborator Author (pmeier)

@ejguan @NivekT since the serialization backend is automatically selected based on the presence of dill, it is impossible to test pickle serialization without patching this. For now we only need to patch a single module, but as pytorch/pytorch#74958 (comment) implies, we will need to do this in multiple places in the future.

Would it be possible to give users the option to set the serialization backend? The default could still be dill if available and pickle otherwise.

Contributor

I see. I can confirm that we will rely on this DILL_AVAILABLE flag everywhere in the TorchData project to determine whether dill is available.

> Would it be possible to give users the option to set the serialization backend? The default could still be dill if available and pickle otherwise.

It's doable, but I am not sure we want to, because the goal of automatically using dill is to reduce the work users need to do to figure out whether a DataPipe containing lambda functions is serializable.

Collaborator Author (pmeier)

> I see. I can confirm that we will rely on this DILL_AVAILABLE flag everywhere in the TorchData project to determine whether dill is available.

The problem is that we would need to patch every single module into which this flag is imported. You cannot patch the place where it is defined, only the places where it is used. If you look above, we are not patching ._utils.serialization but rather .datapipes.datapipe, because that is where the flag is used. If the flag is used in multiple modules in the future, we need to patch all of them. This is very brittle.
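The "patch where it is used, not where it is defined" point can be demonstrated with a self-contained sketch. The `defs` and `user` modules below are made up for illustration; they mimic a module defining DILL_AVAILABLE and a module that imports it by value:

```python
import sys
import types
from unittest import mock

# "defs" plays the role of the module that defines the flag.
defs = types.ModuleType("defs")
defs.DILL_AVAILABLE = True
sys.modules["defs"] = defs

# "user" imports the flag by value at import time, like .datapipes.datapipe does.
user = types.ModuleType("user")
exec(
    "from defs import DILL_AVAILABLE\n"
    "def backend():\n"
    "    return 'dill' if DILL_AVAILABLE else 'pickle'\n",
    user.__dict__,
)
sys.modules["user"] = user

# Patching the defining module does NOT affect the copy bound in "user" ...
with mock.patch("defs.DILL_AVAILABLE", new=False):
    assert user.backend() == "dill"  # still sees the old value

# ... the name must be patched in every module that uses it.
with mock.patch("user.DILL_AVAILABLE", new=False):
    assert user.backend() == "pickle"
```

This is exactly why every module that does `from ... import DILL_AVAILABLE` needs its own patch.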

> It's doable, but I am not sure we want to, because the goal of automatically using dill is to reduce the work users need to do to figure out whether a DataPipe containing lambda functions is serializable.

Not sure I understand. If we keep the same detection we have now, users who don't care will not see any difference: if dill is available it will be picked up, and otherwise pickle will be used. But it would give users the option to enforce a particular backend if they need to. Without this option, the environment you use affects the functionality and there is no way to change that. I don't think this is good design.

Even if you don't do it for the users, think about how you want to test pickle vs. dill yourselves. Right now the only option is to have two separate workflows, one with dill installed and one without.

Contributor

> The problem is that we would need to patch every single module into which this flag is imported. You cannot patch the place where it is defined, only the places where it is used.

You can still add the following code to override the method rather than using patch:

    def state_fn(self):
        return self.__dict__

    IterDataPipe.set_getstate_hook(state_fn)

> But it would give users the option to enforce a particular backend if they need to. Without this option, the environment you use affects the functionality and there is no way to change that. I don't think this is good design.

This is actually a good argument for users who fully understand what they want to achieve. We may be able to expose an API to switch backends if needed, similar to set_getstate_hook but with some syntactic sugar.
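One possible shape for such a backend-switch API, purely hypothetical: the names `set_serialization_backend` and `get_serializer` below are made up for illustration and are not part of torchdata:

```python
import pickle

# Hypothetical module-level setting: "auto" keeps today's detection behavior,
# but a user may pin "pickle" or "dill" explicitly.
_SERIALIZATION_BACKEND = "auto"

def set_serialization_backend(name):
    """Pin the backend explicitly; 'auto' restores automatic detection."""
    global _SERIALIZATION_BACKEND
    if name not in ("auto", "pickle", "dill"):
        raise ValueError(f"unknown serialization backend {name!r}")
    _SERIALIZATION_BACKEND = name

def get_serializer():
    """Return the module used for (de)serialization under the current setting."""
    if _SERIALIZATION_BACKEND == "dill":
        import dill  # only required when explicitly requested
        return dill
    if _SERIALIZATION_BACKEND == "pickle":
        return pickle
    # "auto": prefer dill when installed, fall back to pickle otherwise
    try:
        import dill
        return dill
    except ImportError:
        return pickle

set_serialization_backend("pickle")
assert get_serializer() is pickle
```

With something like this, the tests above could force the pickle path without patching DILL_AVAILABLE in every consuming module.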

Contributor

Opened an issue for this: pytorch/data#341

Contributor

cc: @NivekT Overriding set_getstate_hook with the above function won't actually work for all DataPipes, e.g. Forker: https://github.com/pytorch/pytorch/blob/835cc66e5dd26db558931b4fe47b45e08a3a09f7/torch/utils/data/datapipes/iter/combining.py#L158-L167

Then we should definitely support the backend switch that Philip suggested.

@pmeier
Collaborator Author

pmeier commented Mar 31, 2022

Even with the patch in #5711 (comment), the split="val" configuration fails due to pytorch/pytorch#74958 (comment).

@pmeier pmeier (Collaborator Author) left a comment

There are two distinct failures that will be eliminated by pytorch/pytorch#75034:

  • traverse is not guarded against infinite recursion
  • the split="val" configuration of ImageNet contains a Demultiplexer, which is also not guarded against infinite recursion

Instead of xfail'ing the tests now, let's just wait for the other PR to get merged.
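Why an unguarded traversal loops forever on a Demultiplexer-style graph, and how a visited-set guard fixes it, can be seen in a toy sketch (Node and traverse here are illustrative, not the actual implementation from pytorch/pytorch#75034):

```python
class Node:
    """Toy stand-in for a datapipe; `sources` are its upstream pipes."""
    def __init__(self, name, *sources):
        self.name = name
        self.sources = list(sources)

def traverse(node, seen=None):
    """Collect all reachable node names, guarding against reference cycles."""
    if seen is None:
        seen = set()
    if id(node) in seen:  # without this check a cyclic graph recurses forever
        return set()
    seen.add(id(node))
    names = {node.name}
    for src in node.sources:
        names |= traverse(src, seen)
    return names

# A Demultiplexer-style cycle: the child keeps a reference back to its parent.
parent = Node("demux")
child = Node("branch", parent)
parent.sources.append(child)  # cycle: parent -> child -> parent

assert traverse(parent) == {"demux", "branch"}
```

Dropping the `id(node) in seen` check on this graph would recurse until Python raises RecursionError, which is the failure mode described above.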

test/test_prototype_builtin_datasets.py (review comment resolved)
@pmeier
Copy link
Collaborator Author

pmeier commented Apr 4, 2022

This PR is a sister to #5723, which adds a data loader test for DDP.

@NicolasHug NicolasHug (Member) left a comment

Thanks @pmeier, minor questions but LGTM.

test/test_prototype_builtin_datasets.py (review comment outdated, resolved)
test/test_prototype_builtin_datasets.py (review comment outdated, resolved)
@pmeier pmeier changed the title re-enable serialization test fix prototype datasets data loading tests Apr 4, 2022
@pmeier
Copy link
Collaborator Author

pmeier commented Apr 5, 2022

CI is happy with today's torch / torchdata nightlies.

@pmeier pmeier merged commit 9f12ef4 into pytorch:prototype-datasets-inheritance Apr 5, 2022
@pmeier pmeier deleted the serialization branch April 5, 2022 13:11
NicolasHug added a commit that referenced this pull request Apr 7, 2022
* refactor prototype datasets to inherit from IterDataPipe (#5448)

* refactor prototype datasets to inherit from IterDataPipe

* depend on new architecture

* fix missing file detection

* remove unrelated file

* reinstate decorator for mock registering

* options -> config

* remove passing of info to mock data functions

* refactor categories file generation

* fix imagenet

* fix prototype datasets data loading tests (#5711)

* reenable serialization test

* cleanup

* fix dill test

* trigger CI

* patch DILL_AVAILABLE for pickle serialization

* revert CI changes

* remove dill test and traversable test

* add data loader test

* parametrize over only_datapipe

* draw one sample rather than exhaust data loader

* cleanup

* trigger CI

* migrate VOC prototype dataset (#5743)

* migrate VOC prototype dataset

* cleanup

* revert unrelated mock data changes

* remove categories annotations

* move properties to constructor

* readd homepage

* migrate CIFAR prototype datasets (#5751)

* migrate country211 prototype dataset (#5753)

* migrate CLEVR prototype dataset (#5752)

* migrate coco prototype (#5473)

* migrate coco prototype

* revert unrelated change

* add kwargs to super constructor call

* remove unneeded changes

* fix docstring position

* make kwargs explicit

* add dependencies to docstring

* fix missing dependency message

* Migrate PCAM prototype dataset (#5745)

* Port PCAM

* skip_integrity_check

* Update torchvision/prototype/datasets/_builtin/pcam.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Address comments

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Migrate DTD prototype dataset (#5757)

* Migrate DTD prototype dataset

* Docstring

* Apply suggestions from code review

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Migrate GTSRB prototype dataset (#5746)

* Migrate GTSRB prototype dataset

* ufmt

* Address comments

* Apparently mypy doesn't know that __len__ returns ints. How cute.

* why is the CI not triggered??

* Update torchvision/prototype/datasets/_builtin/gtsrb.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* migrate CelebA prototype dataset (#5750)

* migrate CelebA prototype dataset

* inline split_id

* Migrate Food101 prototype dataset (#5758)

* Migrate Food101 dataset

* Added length

* Update torchvision/prototype/datasets/_builtin/food101.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Migrate Fer2013 prototype dataset (#5759)

* Migrate Fer2013 prototype dataset

* Update torchvision/prototype/datasets/_builtin/fer2013.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* Migrate EuroSAT prototype dataset (#5760)

* Migrate Semeion prototype dataset (#5761)

* migrate caltech prototype datasets (#5749)

* migrate caltech prototype datasets

* resolve third party dependencies

* Migrate Oxford Pets prototype dataset (#5764)

* Migrate Oxford Pets prototype dataset

* Update torchvision/prototype/datasets/_builtin/oxford_iiit_pet.py

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* migrate mnist prototype datasets (#5480)

* migrate MNIST prototype datasets

* Update torchvision/prototype/datasets/_builtin/mnist.py

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

* Migrate Stanford Cars prototype dataset (#5767)

* Migrate Stanford Cars prototype dataset

* Address comments

* fix category file generation (#5770)

* fix category file generation

* revert unrelated change

* revert unrelated change

* migrate cub200 prototype dataset (#5765)

* migrate cub200 prototype dataset

* address comments

* fix category-file-generation

* Migrate USPS prototype dataset (#5771)

* migrate SBD prototype dataset (#5772)

* migrate SBD prototype dataset

* reuse categories

* Migrate SVHN prototype dataset (#5769)

* add test to enforce __len__ is working on prototype datasets (#5742)

* reactivate special dataset tests

* add missing annotation

* Cleanup prototype dataset implementation (#5774)

* Remove Dataset2 class

* Move read_categories_file out of DatasetInfo

* Remove FrozenBunch and FrozenMapping

* Remove test_prototype_datasets_api.py and move missing dep test somewhere else

* ufmt

* Let read_categories_file accept names instead of paths

* Mypy

* flake8

* fix category file reading

Co-authored-by: Philip Meier <github.pmeier@posteo.de>

* update prototype dataset README (#5777)

* update prototype dataset README

* fix header level

* Apply suggestions from code review

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
facebook-github-bot pushed a commit that referenced this pull request May 5, 2022
Summary: (same squashed commit message as in the commit referenced above)

(Note: this ignores all push blocking failures!)

Reviewed By: jdsgomes, NicolasHug

Differential Revision: D36095693

fbshipit-source-id: d57f2b4a89ef1c45f5e2783ea57bce08df5c561d

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

4 participants