-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding fvgc_aircraft dataset #5178
Conversation
💊 CI failures summary and remediationsAs of commit caf762b (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @sallysyw, thanks a lot for the PR. I've one question before I actually dive into the PR: why do we want restructure the data into an image folder hierarchy? IIUC, we could also work with the given format, which I would prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the PR Yiwen!
This looks great, I made a few minor comments below. As Philip noted above, we tend not to change the dataset underlying structure, partly so that users can also just manually download the files, should they want to. Was there a specific reason to re-organise the dataset structure?
Thanks!
Good point! I did that for no specific reason - just follow what VISSL did. But I agree that we'd better not copy these images to another folder after downloading to avoid additional overhead. Let me send a patch to address this issue. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @sallysyw, I have some minor comments inline.
Apart from that I'm wondering if it makes sense to add a parameter to select the different "levels" of the dataset.
- Model, e.g. Boeing 737-76J. Since certain models are nearly visually indistinguishable, this level is not used in the evaluation.
- Variant, e.g. Boeing 737-700. A variant collapses all the models that are visually indistinguishable into one class. The dataset comprises 102 different variants.
- Family, e.g. Boeing 737. The dataset comprises 70 different families.
- Manufacturer, e.g. Boeing. The dataset comprises 41 different manufacturers.
We could have a level="variant"
that also allows "family"
and "manufacturer"
. From what I see, we would only need to change the files we read from based on this parameter.
That's a great point. Ultimately our goal with this dataset is to support our colleagues which are implementing FLAVA, and the "reference" in this case is https://github.com/facebookresearch/vissl/blob/main/extra_scripts/datasets/create_fgvc_aircraft_data_files.py. From what I understand, they would only need to use the "variant" categories. Perhaps it would be OK to just implement the "variant" category in this PR to keep it simple, and to implement the rest if users request more categories? |
Thanks both of you for the review. For the categories, I think for completeness I will provide a parameter called "level" or "annotation_level" and set the default to be "variant", this way it covers the existing functionalities in FLAVA and still giving users flexibility to use other levels' annotations of this dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nits and one question inline. Otherwise LGTM. Thanks a lot @sallysyw!
Summary: * add fvgc_aircraft dataset * add docstring & remove useless import * resolve lint issue * address comments * adding more annotation level * nit * address comments * Apply suggestions from code review * unify format * remove useless line Reviewed By: NicolasHug Differential Revision: D33618172 fbshipit-source-id: ce6471096d8527b08373061e8ec2059a25f96f1d Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Per #5108
cc @pmeier