Allow test ratio to zero in WildFireSplitter #63
Conversation
Codecov Report
@@ Coverage Diff @@
## master #63 +/- ##
==========================================
+ Coverage 83.85% 83.98% +0.13%
==========================================
Files 19 19
Lines 706 712 +6
==========================================
+ Hits 592 598 +6
Misses 114 114
Hi @MateoLostanlen, Thanks for the PR! Is this ready for review?
Yes it's ready :)
Thanks a lot! Would you mind providing us in #61 with the exact commands you used to produce the error this PR is addressing please?
The commented section might be removed, and I added a small question about the test precision. Let me know what you think!
# if wildfire.metadata['fire_id'].nunique() != wildfire.metadata['fire_id'].max() + 1:
#     warnings.warn(f"Inconsistent Fire Labeling. Maybe try to label the fire again\n"
#                   f"Distinct values of ids({wildfire.metadata['fire_id'].nunique()}"
#                   f" ≠ {wildfire.metadata['fire_id'].max() + 1})", Warning)
If commented, do we need to keep this section at all?
This warning is here for a reason: splitting this dataset assumes the fire labeling was done correctly.
If it wasn't, a warning is raised to alert the user, since the inconsistency can be hard to spot from outside the function.
Besides, if too much output bothers you, you can silence warnings locally in your own script/notebook this way (no need to silence warnings for everyone):
import warnings
warnings.filterwarnings("ignore")
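A more targeted variant, if you want other warnings to stay visible: the filter can match on the message itself. This is only a sketch, assuming the wording of the commented warning above; it ignores the fire-id warning while letting unrelated ones through.

```python
import warnings

# Record warnings so we can see which ones survive the filter.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only warnings whose message starts with this pattern.
    warnings.filterwarnings("ignore", message="Inconsistent Fire Labeling")

    warnings.warn("Inconsistent Fire Labeling. Maybe try to label the fire again", Warning)
    warnings.warn("An unrelated warning", Warning)

# Only the unrelated warning was recorded.
print(len(caught))
```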
For me, a warning that is raised under all circumstances doesn't really make sense. We could eventually put it back once the entire dataset is annotated, or modify it to take into account only what was actually annotated.
Indeed, it does raise a warning under all these circumstances, because they correspond to the same misuse of the metadata.
If you instantiate a WildFireDataset object with a metadata file, that file should cover exactly the values of the dataset, no more, no less. Here, a warning is raised because such an inconsistency was detected. This is important, because it is the splitter's responsibility not to mess up the fire ids.
Also, the count of frames in the directory used to init WildFireDataset does not prevail over the metadata as a description of the dataset (describing the dataset is precisely the role of the metadata).
This is even more dangerous because metadata that does not reflect the data gives a wrong length for WildFireDataset, so we can expect that some splitting strategies may process the dataset in unexpected ways!
Following this conversation, I think spelling this out in the documentation and testing metadata consistency against the data it covers in WildFireDataset may help us.
So if you work on a subset of Wildfire today, it is better to work with a matching subset of metadata.csv, and it should work well. At some point you have to make this distinction.
Okay, I'm sorry, I think there is something I don't understand. I'm going to explain my thinking to you to see if I'm missing something. I added the FireLabels to my dataset; the fires are labeled from the names of the 1020 videos, which gives 420 different fires, which is what we find in the warning:
wildfire.metadata['fire_id'].max()
But since we didn't annotate the entire dataset, the number of fire ids actually present in the annotations is 340: wildfire.metadata['fire_id'].nunique()
This is what raised the warning:
Distinct values of ids(340 ≠ 420)
Am I using the FireLabeler in the wrong way? Should I have deleted the unlabeled videos from my video list?
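For the record, the mismatch can be reproduced with a toy example (plain Python standing in for the pandas nunique()/max() calls; the ids below are made up for illustration):

```python
# Toy version of the consistency check: fire ids were assigned over all
# videos, but only part of them was annotated, so some ids are missing
# from the sequence 0..max.
fire_ids = [0, 1, 3, 5]           # annotated subset; ids 2 and 4 are missing

distinct = len(set(fire_ids))     # plays the role of nunique() -> 4
expected = max(fire_ids) + 1      # plays the role of max() + 1 -> 6

if distinct != expected:
    # Same shape of message as the warning in the PR.
    print(f"Distinct values of ids({distinct} != {expected})")
```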
Thank you for detailing your workflow; I now understand more clearly what you're trying to achieve.
I recommend working only with the available data: the checks should pass and it will be more convenient for you. Here is a suggestion.
Let's look at the flow from the videos to the WildFireDataset. At the beginning, we can have:
- all videos in a directory
- checkedstates covering only annotated data
- metadata covering all videos <-- this needs to be changed to cover only annotated data
So if you create a new metadata.csv from the original (exhaustive) one, you tell the other objects (WildFireDataset, FireLabeler, WildFireSplitter...) which videos you are going to use.
For instance, use a merge between the checked states and the original metadata.
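As a sketch of that merge (the column names video_id and fire_id are assumptions for illustration, not the project's actual schema):

```python
import pandas as pd

# Original (exhaustive) metadata covering all videos.
metadata = pd.DataFrame({
    "video_id": [0, 1, 2, 3],
    "fire_id":  [0, 0, 1, 2],
})
# Checked states: only the annotated videos.
checked = pd.DataFrame({"video_id": [0, 1, 3]})

# Inner merge keeps only the metadata rows that were actually annotated;
# the result could be saved as the new metadata.csv.
metadata_annotated = metadata.merge(checked, on="video_id", how="inner")
print(metadata_annotated)
```

With such a reduced metadata, the consistency check above would compare ids over the annotated subset only.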
The role of the metadata is to provide information about the data (videos/frames) you need to process. The other objects will process the data based on this information, so it's important that the metadata contains only what is needed/available.
Concerning the videos/frames, you shouldn't need to sub-select what's needed or delete unannotated videos, because the objects (splitter...) will be able to filter based on metadata.csv.
Of course, we can improve this process. It is very valuable to have your feedback, tell me if this helps.
Thanks,
Okay, I'll work on my workflow. I'll put the warning back on. Thanks
Thanks for the PR! Looks good to me!
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add training script for Wildfire
* add a test to improve code coverage
* put back the warning on FireIds
* Fix size in RandomCrop and delete CenterCrop
* add centercrop to get square images
* remove change already made in PR #63
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add a test to improve code coverage
* fix test
* put back the warning on FireIds
* fix style
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add training script for Wildfire
* add a test to improve code coverage
* put back the warning on FireIds
* Fix size in RandomCrop and delete CenterCrop
* add centercrop to get square images
* remove change already made in PR pyronear#63
* Allow test ratio to zero in WildFireSplitter
* fix flake8
* add a test to improve code coverage
* fix test
* Add SubSampler DataSet
* add tests for SubSampler dataset
* add SSresnet18 model
* clean code in subsampler dataset
* Add test for SSResNet18
* Update SSresnet18 for a better combinaition of frames
* use of .to instead of .cuda
* remove useless if
* remove change already made in PR #63
* remove change already made in PR #63 2
* remove test with test ratio equal zero, impossible without #63
* Create an other PR to add the ssresnet18
* bring subsampler in wildfire
* update branch
* add license header
* replace random per SystemRandom
* fix license header
* put back import unitest
* put back import random for shuffle
* define subSampler as a function
* allow to get metadata from path
* update tests
* remove useless import
* reproduce random error
* put back SystemRandom
* add seed for reproducibility
* fix typo and class name
* add seed to docstring
* update main.yml
This PR aims to address the issue: [DataSet] Allow test ratio to zero in WildFireSplitter #61
As explained in the issue:
At the moment, the input ratios of WildFireSplitter cannot be equal to zero. I would like to add this option. Indeed, we are going to run a lot of tests with different extraction methods (number of frames, spacing between frames, ...). As frame extraction is long, I think we will save a lot of time if we don't have to redo it every time for the test set.
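To illustrate the requested behaviour (a sketch only, not WildFireSplitter's actual implementation or API): with ratio-based splitting, allowing a zero test ratio should simply yield an empty test set instead of raising.

```python
def split_by_ratio(ids, ratio_train, ratio_val, ratio_test):
    """Split a list of ids by ratios; any ratio, including test, may be 0."""
    assert abs(ratio_train + ratio_val + ratio_test - 1.0) < 1e-9
    n_train = round(len(ids) * ratio_train)
    n_val = round(len(ids) * ratio_val)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

# A zero test ratio gives an empty test split.
train, val, test = split_by_ratio(list(range(10)), 0.8, 0.2, 0.0)
print(len(train), len(val), len(test))  # 8 2 0
```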
PS: I have also commented out the warning. We haven't annotated the entire dataset, so we would get this warning every time.