Update FAIR1M dataset and datamodule #1275
Conversation
@adamjstewart can comment on the versionadded stuff, else LGTM
Yes, it does need versionadded.
    don't match

.. versionchanged:: 0.5
   Added *split* and *download* parameters.
The *split* and *download* parameters.
This seems like more of a nitpick. "Added" is more clear. "The" is just a statement.
It should also be versionadded, not versionchanged. The versionadded template already says "New in version 0.5:"
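For reference, a minimal sketch of how the directive might read after switching to versionadded (the signature and surrounding docstring text here are assumptions, not copied from the PR diff):

# Illustrative only: the signature and docstring wording are assumed.
def __init__(self, root: str = "data", split: str = "train", download: bool = False) -> None:
    """Initialize a new FAIR1M dataset instance.

    .. versionadded:: 0.5
       The *split* and *download* parameters.
    """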
from .utils import dataset_split


def collate_fn(batch: list[dict[str, Tensor]]) -> dict[str, Any]:
How is this different from unbind_samples?
Because of the torch.stack on line 27
    .. versionadded:: 0.5
    """
    output: dict[str, Any] = {}
    output["image"] = torch.stack([sample["image"] for sample in batch])
Does this line do anything?
This was mostly copied from nasa_marine_debris.py. I'm assuming it's there for mypy reasons.
But doesn't this just unstack and restack so that the output is identical?
No, this takes the batch (a list of sample dicts) and grabs each image and stacks it into a single tensor along a new batch dimension.
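A minimal, runnable sketch of that stacking behavior (only the "image" key is handled here; how the real collate_fn in this PR treats other keys such as boxes is not shown and is an assumption):

from typing import Any

import torch
from torch import Tensor


def collate_fn(batch: list[dict[str, Tensor]]) -> dict[str, Any]:
    """Stack per-sample images into one batched tensor.

    Each sample holds an 'image' of shape (C, H, W); the output 'image'
    has shape (B, C, H, W) with B == len(batch). Variable-length fields
    (e.g., bounding boxes) would stay as lists rather than being stacked.
    """
    output: dict[str, Any] = {}
    output["image"] = torch.stack([sample["image"] for sample in batch])
    return output


# Two fake 3x4x4 samples collate into a single (2, 3, 4, 4) tensor --
# unlike unbind_samples, which goes the other way (batch -> list of samples).
batch = [{"image": torch.zeros(3, 4, 4)}, {"image": torch.ones(3, 4, 4)}]
print(collate_fn(batch)["image"].shape)  # torch.Size([2, 3, 4, 4])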
os.path.join("validation", "images"), | ||
os.path.join("validation", "labelXml"), | ||
), | ||
"test": (os.path.join("test", "images")), |
Working on a PR now to add type hints to a bunch of stuff. Just noticed this bug. Here "test" is of type str, not tuple[str]. Will fix in my other PR.
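A quick illustration of the bug: without a trailing comma the parentheses are just grouping, so the value is a plain str. The trailing-comma form below is one possible fix, not necessarily the one used in the follow-up PR.

import os

test_value = (os.path.join("test", "images"))
print(type(test_value))  # <class 'str'> -- the parentheses do nothing here

test_tuple = (os.path.join("test", "images"),)
print(type(test_tuple))  # <class 'tuple'> -- trailing comma makes tuple[str]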
Originally we created the FAIR1M dataset when only train/part1 images and labels were available.
This PR updates the FAIR1M dataset and datamodule to work with the latest train/val/test sets.