
Make UCF101 dataset *loading* more efficient #2475

Closed
Guillem96 wants to merge 5 commits

Conversation

Guillem96 (Contributor)

Currently, when creating a UCF101 dataset, you specify both the fold ({1, 2, 3}) and the split (train or test). With the current implementation, video clip generation is agnostic to the selected fold and split: the VideoClips helper class first processes all the videos, and only afterwards are the clips filtered according to the selected split and fold.

This is very inefficient. For example, if you only want to load the test set, you have to process around 900 videos even though the test set only contains around 350 of them.

To solve this I:

  1. Generate the dataset samples with make_dataset, independently of split and fold.
  2. Obtain the list of video paths from the annotation file corresponding to the selected split and fold.
  3. Filter the dataset samples obtained in step 1, keeping only the samples that belong to the desired split and fold.
  4. Process the remaining videos with the VideoClips object (see the sketch below).
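
A minimal sketch of this order of operations (`build_clips` and its arguments are illustrative, not the actual PR code, and it assumes UCF101's standard `trainlist0N.txt`/`testlist0N.txt` annotation files):

```
import os

from torchvision.datasets.folder import make_dataset
from torchvision.datasets.video_utils import VideoClips


def build_clips(root, annotation_path, fold, train, frames_per_clip):
    # 1. Generate all samples, independent of split and fold.
    classes = sorted(os.listdir(root))
    class_to_idx = {c: i for i, c in enumerate(classes)}
    samples = make_dataset(root, class_to_idx, extensions=('.avi',))

    # 2. Read the video paths for the selected split and fold.
    name = '{}list{:02d}.txt'.format('train' if train else 'test', fold)
    with open(os.path.join(annotation_path, name)) as f:
        # trainlist lines look like "Class/video.avi 1"; testlist lines
        # have no label, so take only the first token.
        selected = {os.path.join(root, line.split()[0])
                    for line in f if line.strip()}

    # 3. Keep only the samples that belong to the desired split and fold.
    video_list = [path for path, _ in samples if path in selected]

    # 4. Only now run the expensive clip indexing.
    return VideoClips(video_list, frames_per_clip)
```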

Currently the dataset does not work properly because of this line of code: `indices = [i for i in range(len(video_list)) if video_list[i][len(self.root) + 1:] in selected_files]`.
Slicing with `len(self.root) + 1` only makes sense if there is no trailing `/` on the root:

```
>>> root = 'data/ucf-101/videos'
>>> video_path = 'data/ucf-101/videos/activity/video.avi'
>>> video_path[len(root):]
'/activity/video.avi'
>>> video_path[len(root) + 1:]
'activity/video.avi'
```

Appending the root path to the selected files as well is a simple solution and makes the dataset work both with and without a trailing slash.
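
A sketch of that fix, assuming the `selected_files` set read from the annotation file and the `video_list` from the snippet above (the exact PR code may differ):

```
# Prefix each annotation entry with root so the membership test compares
# full paths; os.path.join handles root with or without a trailing slash.
selected_files = {os.path.join(self.root, f) for f in selected_files}
indices = [i for i, path in enumerate(video_list) if path in selected_files]
```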
@fmassa fmassa mentioned this pull request Jul 30, 2020
@fmassa fmassa (Member) left a comment


Hi,

Thanks for the PR and sorry for taking so much time to get back to you.

The PR looks good, there are just a few linter errors that should be fixed.

But before merging this, I would prefer that we add some tests to UCF101 (that @andfoy will be working on), so that we double-check that we are not missing anything.

@codecov codecov bot commented Jul 31, 2020

Codecov Report

Merging #2475 into master will increase coverage by 2.74%.
The diff coverage is 5.88%.


```
@@            Coverage Diff             @@
##           master    #2475      +/-   ##
==========================================
+ Coverage   68.83%   71.58%   +2.74%
==========================================
  Files          94       94
  Lines        7920     8861     +941
  Branches     1249     1629     +380
==========================================
+ Hits         5452     6343     +891
+ Misses       2075     2052      -23
- Partials      393      466      +73
```
| Impacted Files | Coverage Δ |
|----------------|------------|
| torchvision/datasets/ucf101.py | 29.26% <5.88%> (+2.18%) ⬆️ |
| torchvision/models/detection/backbone_utils.py | 100.00% <0.00%> (ø) |
| torchvision/io/video.py | 69.82% <0.00%> (+1.18%) ⬆️ |
| torchvision/datasets/caltech.py | 23.35% <0.00%> (+2.42%) ⬆️ |
| torchvision/datasets/celeba.py | 21.18% <0.00%> (+2.43%) ⬆️ |
| torchvision/models/detection/rpn.py | 93.13% <0.00%> (+2.57%) ⬆️ |
| torchvision/transforms/functional_pil.py | 64.63% <0.00%> (+3.09%) ⬆️ |
| torchvision/datasets/utils.py | 59.29% <0.00%> (+3.11%) ⬆️ |
| torchvision/transforms/functional.py | 84.05% <0.00%> (+3.12%) ⬆️ |
| torchvision/transforms/transforms.py | 80.06% <0.00%> (+4.02%) ⬆️ |
| ... and 10 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pevogam pevogam left a comment


Very good idea to speed this up!

@fmassa fmassa (Member) left a comment


Thanks a lot for this PR!

Unfortunately we can't merge this as is because it will break backwards-compatibility with downstream projects that depend on torchvision.

While this PR is definitely an improvement compared to what we had before, we would need to check this change with the ClassyVision team (cc @stephenyan1231 @vreis), as it would require a change on the ClassyVision side as well.

```
self.indices = self._select_fold(video_list, annotation_path, fold, train)
self.video_clips = video_clips.subset(self.indices)
self.transform = transform
self.video_clips_metadata = self.video_clips.metadata
```
Member left a comment

This is unfortunately a BC-breaking change as other downstream projects rely on this behavior, see ClassyVision for example.

The original thinking behind this approach was that one could cache the metadata and re-use it over different dataset invocations, so that the creation time would be amortized.
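
For illustration, the caching pattern described here maps onto the dataset's `_precomputed_metadata` argument and `metadata` property (a sketch, assuming those behave as in the torchvision of that time; the paths are placeholders):

```
import torch
from torchvision.datasets import UCF101

root = 'data/ucf-101/videos'
annotations = 'data/ucf-101/annotations'

# Index the videos once and cache the resulting metadata to disk.
ds_train = UCF101(root, annotations, frames_per_clip=16, fold=1, train=True)
torch.save(ds_train.metadata, 'ucf101_metadata.pt')

# Later invocations reuse the cache and skip re-scanning all the videos.
metadata = torch.load('ucf101_metadata.pt')
ds_test = UCF101(root, annotations, frames_per_clip=16, fold=1, train=False,
                 _precomputed_metadata=metadata)
```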

It would indeed have been possible to create separate metadata for each fold from the beginning, but now that it's been done this way we unfortunately need to keep it for backwards-compatibility reasons.

@vreis vreis commented Aug 4, 2020

@stephenyan1231: I'll defer to you on this one. Seems like a good change, and in general I don't mind breaking changes. But I don't know if we rely on this behavior in subtle ways.

@Guillem96 Guillem96 (Contributor, Author)

Any news on this?

@yassineAlouini yassineAlouini (Contributor)

Thanks @Guillem96 for this contribution and sorry for taking so long to get back to you with an update.

As you might have read, the dataset API is being reworked and datasets are being moved to the new API (as detailed here).

I'm not yet sure what the best course of action is, but here are two possible paths:

  • add some tests to the current implementation and update it while making sure it stays backward compatible (maybe this will be hard/impossible to do?), then migrate it to the new dataset API. The migration can be done by someone else of course, and due credit will be given to you @Guillem96. 👌
  • first migrate the dataset to the new dataset API (again, this can be done by someone else) and then integrate the suggested code while making sure to keep it backward compatible (if that's possible).

For both paths, I would suggest having two separate PRs so that the work is easier to review.

What works best for you @Guillem96? Feel free to reach out if you have questions and/or other suggestions.

@pmeier, any other suggestions on this?

@pmeier pmeier (Collaborator) commented May 2, 2022

We are currently trying to figure out the best way to do this in #5422 cc @bjuncek. The new design will not re-use the VideoClips object as we will not be able to index the whole dataset upfront. Thus, I don't think we need to keep this open. @Guillem96 are you ok with closing this and following the development in the new PR?

@Guillem96 Guillem96 (Contributor, Author)

@pmeier fine with me! Thanks for your comments👌🏼

@Guillem96 Guillem96 closed this May 2, 2022