Allow test ratio to zero in WildFireSplitter #63
Conversation
Codecov Report
@@ Coverage Diff @@
## master #63 +/- ##
==========================================
+ Coverage 83.85% 83.98% +0.13%
==========================================
Files 19 19
Lines 706 712 +6
==========================================
+ Hits 592 598 +6
Misses 114 114
Hi @MateoLostanlen, Thanks for the PR! Is this ready for review?
Yes it's ready :)
Thanks a lot! Would you mind providing us in #61 with the exact commands you used to produce the error this PR is addressing please?
The commented section might be removed, and I added a small question about the test precision. Let me know what you think!
# if wildfire.metadata['fire_id'].nunique() != wildfire.metadata['fire_id'].max() + 1:
#     warnings.warn(f"Inconsistent Fire Labeling. Maybe try to label the fire again\n"
#                   f"Distinct values of ids({wildfire.metadata['fire_id'].nunique()}"
#                   f" ≠ {wildfire.metadata['fire_id'].max() + 1})", Warning)
If commented, do we need to keep this section at all?
This warning is here for a reason: splitting this dataset assumes the fire labeling was done correctly.
If it wasn't, a warning is raised to alert the user, since the inconsistency can be hard to spot from outside the function.
Besides, if too much output bothers you, you can silence warnings locally in your own script/notebook this way (no need to silence warnings for everyone):
import warnings
warnings.filterwarnings("ignore")
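A more targeted variant, if you want other warnings to stay visible: the filter can match on the message itself. This is only a sketch, assuming the wording of the commented warning above; it ignores the fire-id warning while letting unrelated ones through.

```python
import warnings

# Record warnings so we can see which ones survive the filter.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only warnings whose message starts with this pattern.
    warnings.filterwarnings("ignore", message="Inconsistent Fire Labeling")

    warnings.warn("Inconsistent Fire Labeling. Maybe try to label the fire again", Warning)
    warnings.warn("An unrelated warning", Warning)

# Only the unrelated warning was recorded.
print(len(caught))
```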
For me, a warning that is raised under all circumstances doesn't really make sense. We could eventually put it back once the entire dataset is annotated, or modify it to take into account only what was actually annotated.
Indeed, it does raise a warning under all these circumstances, because they correspond to the same misuse of the metadata.
If you instantiate a WildFireDataset object with a metadata file, that file should cover exactly the values of the dataset, no more, no less. Here, a warning is raised because such an inconsistency was detected. This is important, because it is the splitter's responsibility not to mess up the fire ids.
Also, the count of frames in the directory used to init WildFireDataset does not prevail over the metadata as a description of the dataset (describing the dataset is precisely the role of the metadata).
This is even more dangerous because metadata that does not reflect the data gives a wrong length for WildFireDataset, so we can expect that some splitting strategies may process the dataset in unexpected ways!
Following this conversation, I think spelling this out in the documentation and testing metadata consistency against the data it covers in WildFireDataset may help us.
So if you work on a subset of Wildfire today, it is better to work with a matching subset of metadata.csv, and it should work well. At some point you have to make this distinction.
Okay, I'm sorry, I think there is something I don't understand. I'm going to explain my thinking to you to see if I'm missing something. I added the FireLabels to my dataset; the fires are labeled from the names of the 1020 videos, which gives 420 different fires, which is what we find in the warning:
wildfire.metadata['fire_id'].max()
But since we didn't annotate the entire dataset, the number of fire ids actually present in the annotations is 340: wildfire.metadata['fire_id'].nunique()
This is what raised the warning:
Distinct values of ids(340 ≠ 420)
Am I using the FireLabeler in the wrong way? Should I have deleted the unlabeled videos from my video list?
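For the record, the mismatch can be reproduced with a toy example (plain Python standing in for the pandas nunique()/max() calls; the ids below are made up for illustration):

```python
# Toy version of the consistency check: fire ids were assigned over all
# videos, but only part of them was annotated, so some ids are missing
# from the sequence 0..max.
fire_ids = [0, 1, 3, 5]           # annotated subset; ids 2 and 4 are missing

distinct = len(set(fire_ids))     # plays the role of nunique() -> 4
expected = max(fire_ids) + 1      # plays the role of max() + 1 -> 6

if distinct != expected:
    # Same shape of message as the warning in the PR.
    print(f"Distinct values of ids({distinct} != {expected})")
```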
Thank you for detailing your workflow; I now understand more clearly what you're trying to achieve.
I recommend working only with the available data: the checks should pass and it will be more convenient for you. Here is a suggestion.
Let's look at the flow from the videos to the WildFireDataset. At the beginning, we can have:
- all videos in a directory
- checkedstates covering only annotated data
- metadata covering all videos <-- this needs to be changed to cover only annotated data
So if you create a new metadata.csv from the original (exhaustive) one, you tell the other objects (WildFireDataset, FireLabeler, WildFireSplitter...) which videos you are going to use.
For instance, use a merge between the checked states and the original metadata.
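As a sketch of that merge (the column names video_id and fire_id are assumptions for illustration, not the project's actual schema):

```python
import pandas as pd

# Original (exhaustive) metadata covering all videos.
metadata = pd.DataFrame({
    "video_id": [0, 1, 2, 3],
    "fire_id":  [0, 0, 1, 2],
})
# Checked states: only the annotated videos.
checked = pd.DataFrame({"video_id": [0, 1, 3]})

# Inner merge keeps only the metadata rows that were actually annotated;
# the result could be saved as the new metadata.csv.
metadata_annotated = metadata.merge(checked, on="video_id", how="inner")
print(metadata_annotated)
```

With such a reduced metadata, the consistency check above would compare ids over the annotated subset only.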
The role of the metadata is to provide information about the data (videos/frames) you need to process. The other objects will process the data based on this information, so it's important that the metadata contains only what is needed/available.
Concerning the videos/frames, you shouldn't need to sub-select what's needed or delete unannotated videos, because the objects (splitter...) will be able to filter based on metadata.csv.
Of course, we can improve this process. It is very valuable to have your feedback, tell me if this helps.
Thanks,
Okay, I'll work on my workflow. I'll put the warning back on. Thanks
Thanks for the PR! Looks good to me!
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add training script for Wildfire
* add a test to improve code coverage
* put back the warning on FireIds
* Fix size in RandomCrop and delete CenterCrop
* add centercrop to get square images
* remove change already made in PR #63
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add a test to improve code coverage
* fix test
* put back the warning on FireIds
* fix style
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add training script for Wildfire
* add a test to improve code coverage
* put back the warning on FireIds
* Fix size in RandomCrop and delete CenterCrop
* add centercrop to get square images
* remove change already made in PR pyronear#63
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add a test to improve code coverage
* fix test
* Add SubSampler DataSet
* add tests for SubSampler dataset
* add SSresnet18 model
* clean code in subsampler dataset
* Add test for SSResNet18
* Update SSresnet18 for a better combinaition of frames
* use of .to instead of .cuda
* remove useless if
* remove change already made in PR #63
* remove change already made in PR #63 2
* remove test with test ratio equal zero, impossible without #63
* Create an other PR to add the ssresnet18
* bring subsampler in wildfire
* update branch
* add license header
* replace random per SystemRandom
* fix license header
* put back import unitest
* put back import random for shuffle
* define subSampler as a function
* allow to get metadata from path
* update tests
* remove useless import
* reproduce random error
* put back SystemRandom
* add seed for reproducibility
* fix typo and class name
* add seed to docstring
* update main.yml
This PR aims to address the issue: [DataSet] Allow test ratio to zero in WildFireSplitter #61
As explained in the issue:
At the moment, the input ratios of WildFireSplitter cannot be equal to zero. I would like to add this option. Indeed, we are going to run a lot of tests with different extraction methods (number of frames, spacing between frames, ...). As frame extraction is long, I think we will save a lot of time if we don't have to redo it every time for the test set.
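To illustrate the requested behaviour (a sketch only, not WildFireSplitter's actual implementation or API): with ratio-based splitting, allowing a zero test ratio should simply yield an empty test set instead of raising.

```python
def split_by_ratio(ids, ratio_train, ratio_val, ratio_test):
    """Split a list of ids by ratios; any ratio, including test, may be 0."""
    assert abs(ratio_train + ratio_val + ratio_test - 1.0) < 1e-9
    n_train = round(len(ids) * ratio_train)
    n_val = round(len(ids) * ratio_val)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

# A zero test ratio gives an empty test split.
train, val, test = split_by_ratio(list(range(10)), 0.8, 0.2, 0.0)
print(len(train), len(val), len(test))  # 8 2 0
```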
PS: I have also commented out the warning. We haven't annotated the entire dataset, so we would get this warning every time.