
Allow test ratio to zero in WildFireSplitter #63

Merged · 6 commits merged into pyronear:master on May 19, 2020

Conversation

MateoLostanlen (Member) commented:

This PR aims to address issue #61: [DataSet] Allow test ratio to zero in WildFireSplitter.

As explained in the issue:

At the moment, the input ratios of WildFireSplitter cannot be equal to zero. I would like to add this option. Indeed, we are going to run a lot of tests with different extraction methods (number of frames, spacing between frames, ...). Since frame extraction takes a long time, I think we will save a lot of time if we don't have to redo it every time for the test set.
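
(A minimal sketch of the usage this PR enables. The import path is taken from the file touched by this PR; the ratio keys and the fit call are assumptions based on this discussion, not a confirmed API.)

from pyronear.datasets.wildfire import WildFireSplitter

# Hypothetical usage: a test ratio of 0.0 should now be accepted
ratios = {'train': 0.8, 'val': 0.2, 'test': 0.0}
splitter = WildFireSplitter(ratios)
splitter.fit(wildfire)  # 'wildfire' is an existing WildFireDataset (assumed)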

PS: I have also commented out the warning:

"Labeling. Maybe try to label the fire again
Distinct values of ids(340 ≠ 420)
f" ≠ {wildfire.metadata['fire_id'].max() + 1})", Warning)"

We haven't annotated the entire dataset, so we're going to get this warning every time.

MateoLostanlen linked an issue on May 1, 2020 that may be closed by this pull request
codecov bot commented May 2, 2020:

Codecov Report

Merging #63 into master will increase coverage by 0.13%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   83.85%   83.98%   +0.13%     
==========================================
  Files          19       19              
  Lines         706      712       +6     
==========================================
+ Hits          592      598       +6     
  Misses        114      114              
Impacted Files | Coverage Δ
pyronear/datasets/wildfire/split_strategy.py | 97.29% <100.00%> (+0.52%) ⬆️

frgfm added the labels type: enhancement (New feature or request) and module: datasets (Related to datasets) on May 12, 2020
frgfm added this to the 0.1.1 milestone on May 12, 2020
@frgfm (Member) commented May 12, 2020:

Hi @MateoLostanlen,

Thanks for the PR! Is this ready for review?
If so, please add some reviewers to it, otherwise we don't get notified :)

@MateoLostanlen (Member, Author) replied:

Yes it's ready :)

MateoLostanlen requested review from x0s and frgfm on May 12, 2020 at 09:04
@frgfm (Member) left a comment:

Thanks a lot! Would you mind providing us, in #61, with the exact commands you used to produce the error this PR is addressing, please?

The commented section might be removed, and I left a small question about the test precision. Let me know what you think!

test/test_datasets_wildfire_split.py (review thread resolved)
Comment on lines 161 to 164
# if wildfire.metadata['fire_id'].nunique() != wildfire.metadata['fire_id'].max() + 1:
# warnings.warn(f"Inconsistent Fire Labeling. Maybe try to label the fire again\n"
# f"Distinct values of ids({wildfire.metadata['fire_id'].nunique()}"
# f" ≠ {wildfire.metadata['fire_id'].max() + 1})", Warning)
Member:

If commented, do we need to keep this section at all?

Contributor:

This warning is here for a reason: splitting this dataset assumes the fire labeling was done correctly. If it wasn't, a warning is raised to alert the user, since the inconsistency could be hard to spot from outside the function.

Besides, if too much displayed information bothers you, you can silence warnings in your own script/notebook like this (no need to silence warnings for everyone):

import warnings
warnings.filterwarnings("ignore")
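
(If silencing every warning is too broad, a narrower filter can target just this one. A sketch; the message text is taken from the warning quoted above.)

import warnings

# Ignore only the fire-labeling consistency warning; other warnings stay visible
warnings.filterwarnings("ignore", message="Inconsistent Fire Labeling")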

Member Author:

For me, a warning that is raised under all circumstances doesn't really make sense. We could put it back once we annotate the entire dataset, or modify it to take into account what has actually been annotated.

@x0s (Contributor) commented May 18, 2020:

Indeed, it does raise a warning, but only under circumstances that correspond to the same misuse of the metadata.

If you instantiate a WildFireDataset object with a metadata file, the metadata should cover exactly the values of this dataset, no more, no less. Here, a warning is raised because such an inconsistency was detected. This is important, because it is the splitter's responsibility not to mess up the fire ids.

Also, the count of frames in the directory used to initialize WildFireDataset does not override the metadata as the source of information about the dataset (describing the data is precisely the role of the metadata). This is even more dangerous because metadata that does not reflect the data gives a wrong length for WildFireDataset, so we can expect some splitting strategies to process the dataset in unexpected ways! Following this conversation, I think clarifying this in the documentation and testing, within WildFireDataset, that the metadata is consistent with the data it covers may help us (see the sketch after this comment).

So, if you work on a subset of WildFire today, it is better to also work with a matching subset of metadata.csv; that should work well. At some point you need to make this distinction.
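
(A minimal sketch of the consistency test suggested above, checking that the metadata covers exactly the frames on disk. The 'imgFile' column name, the frame file format, and the helper itself are hypothetical, not the library's API.)

from pathlib import Path
import pandas as pd

def check_metadata_consistency(metadata: pd.DataFrame, frames_dir: Path) -> None:
    # Frames listed in the metadata vs frames actually extracted on disk
    listed = set(metadata['imgFile'])                     # hypothetical column name
    on_disk = {p.name for p in frames_dir.glob('*.png')}  # assumed frame format
    missing = listed - on_disk
    extra = on_disk - listed
    if missing:
        raise ValueError(f"{len(missing)} frames listed in metadata are missing on disk")
    if extra:
        print(f"Note: {len(extra)} frames on disk are not covered by the metadata")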

Member Author:

Okay, I'm sorry, I think I'm not understanding something. Let me explain my reasoning to see if I'm missing anything. I added the fire labels to my dataset; the fires are labeled from the names of the 1020 videos, which gives 420 different fires, which is what we find in the warning:

wildfire.metadata['fire_id'].max()

But since we didn't annotate the whole dataset, the number of fire ids actually present in the annotations, wildfire.metadata['fire_id'].nunique(), is 340, which is what raised the warning:

Distinct values of ids(340 ≠ 420)

Am I using the fireLabeler in the wrong way? Should I have deleted the unlabeled videos from my video list?
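
(As a side note, a toy illustration of why nunique() and max() + 1 diverge when only part of the dataset is annotated; the small numbers here simply mirror the 340 vs 420 case.)

import pandas as pd

# Fire ids were assigned over all videos, but only some videos were annotated
metadata = pd.DataFrame({'fire_id': [0, 1, 3, 5]})  # ids 2 and 4 never appear

print(metadata['fire_id'].nunique())  # 4 -> distinct ids actually present (like 340)
print(metadata['fire_id'].max() + 1)  # 6 -> ids implied by the labeling (like 420)
# The mismatch is exactly what triggers the warning discussed above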

@x0s (Contributor) commented May 18, 2020:

Thank you for detailing your workflow; I now understand more clearly what you're trying to achieve.

I recommend working only with the data that is available. The checks should pass, and it will be more convenient for you. Here is a suggestion.

Let's look at the flow from the videos to the WildFireDataset.

At the beginning, we can have:

  • all videos in a directory
  • checkedstates covering only annotated data
  • metadata covering all videos <-- this needs to be changed to cover only annotated data

So if you create a new metadata.csv from the original (exhaustive) one, you tell the other objects (WildFireDataset, FireLabeler, WildFireSplitter, ...) which videos you are going to use, for instance by merging the checked states with the original metadata (see the sketch after this comment).

The role of the metadata is to provide information about the data (videos/frames) you need to process. The other objects process the data based on this information, so it is important that the metadata contains only what is needed/available. Concerning the videos/frames themselves, you shouldn't need to sub-select or delete the unannotated videos, because the objects (splitter, ...) will be able to filter based on metadata.csv.

Of course, we can improve this process. Your feedback is very valuable; tell me if this helps.
Thanks,
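
(A possible sketch of the suggested merge, restricting the exhaustive metadata to the annotated videos. The file names and the 'fname' join key are assumptions, not the project's actual schema.)

import pandas as pd

metadata = pd.read_csv('metadata.csv')       # exhaustive metadata covering all videos
checked = pd.read_csv('checked_states.csv')  # covers only the annotated videos

# Keep only the rows of the exhaustive metadata that match an annotated video
annotated = metadata.merge(checked[['fname']], on='fname', how='inner')
annotated.to_csv('metadata_annotated.csv', index=False)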

Member Author:

Okay, I'll work on my workflow. I'll put the warning back on. Thanks

pyronear/datasets/wildfire/split_strategy.py (review thread resolved)
@frgfm (Member) left a comment:

Thanks for the PR! Looks good to me!

MateoLostanlen merged commit 065286d into pyronear:master on May 19, 2020
MateoLostanlen added a commit to MateoLostanlen/pyro-vision that referenced this pull request May 22, 2020
MateoLostanlen added a commit to MateoLostanlen/pyro-vision that referenced this pull request May 22, 2020
MateoLostanlen added a commit to MateoLostanlen/pyro-vision that referenced this pull request May 22, 2020
MateoLostanlen added a commit to MateoLostanlen/pyro-vision that referenced this pull request May 22, 2020
MateoLostanlen added a commit that referenced this pull request May 24, 2020
* Allow test ratio to zero in WildFireSplitter

* fix flake8

* add training script for Wildfire

* add a test to improve code coverage

* put back the warning on FireIds

* Fix size in RandomCrop and delete CenterCrop

* add centercrop to get square images

* remove change already made in PR #63
Akilditu pushed a commit to Akilditu/seb_pyro-vision that referenced this pull request Jun 3, 2020
* Allow test ratio to zero in WildFireSplitter

* fix flake8

* add a test to improve code coverage

* fix test

* put back the warning on FireIds

* fix style
Akilditu pushed a commit to Akilditu/seb_pyro-vision that referenced this pull request Jun 3, 2020
* Allow test ratio to zero in WildFireSplitter

* fix flake8

* add training script for Wildfire

* add a test to improve code coverage

* put back the warning on FireIds

* Fix size in RandomCrop and delete CenterCrop

* add centercrop to get square images

* remove change already made in PR pyronear#63
MateoLostanlen added a commit to MateoLostanlen/pyro-vision that referenced this pull request Jun 28, 2020
MateoLostanlen added a commit to MateoLostanlen/pyro-vision that referenced this pull request Jun 28, 2020
MateoLostanlen added a commit to MateoLostanlen/pyro-vision that referenced this pull request Jun 28, 2020
MateoLostanlen deleted the allowTestRatioToZero branch on September 18, 2020 at 17:07
MateoLostanlen added a commit that referenced this pull request Sep 18, 2020
* Allow test ratio to zero in WildFireSplitter

* fix flake8

* add a test to improve code coverage

* fix test

* Add SubSampler DataSet

* add tests for SubSampler dataset

* add SSresnet18 model

* clean code in subsampler dataset

* Add test for SSResNet18

* Update SSresnet18 for a better combinaition of frames

* use of .to instead of .cuda

* remove useless if

* remove change already made in PR #63

* remove change already made in PR #63 2

* remove test with test ratio equal zero, impossible without #63

* Create an other PR to add the ssresnet18

* bring subsampler in wildfire

* update branch

* add license header

* replace random per SystemRandom

* fix license header

* put back import unitest

* put back import random for shuffle

* define subSampler as a function

* allow to get metadata from path

* update tests

* remove useless import

* reproduce random error

* put back SystemRandom

* add seed for reproducibility

* fix typo and class name

* add seed to docstring

* update main.yml
Labels: module: datasets (Related to datasets), type: enhancement (New feature or request)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[DataSet] Allow test ratio to zero in WildFireSplitter
3 participants