Add transforms and presets for optical flow models #5026
Conversation
CI summary (Dr. CI, as of commit f6fe16d): 1 new failure recognized by patterns, not due to upstream breakages: cmake_windows_gpu (1/1).
Review context:

    return img1, img2, flow, valid_flow_mask
    ...

    class RandomErase(torch.nn.Module):
This is similar to our existing RandomErase, but replaces the region with the mean of the pixels instead of a given (known-in-advance) value.
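The mean-fill idea described above can be sketched as follows. This is a hypothetical illustration of the technique, not the PR's actual code; the class name `RandomEraseWithMean` and the `max_frac` parameter are my own inventions for the sketch.

```python
import torch


class RandomEraseWithMean(torch.nn.Module):
    """Sketch: erase a random rectangle, filling it with the per-channel
    mean of the image rather than a fixed, known-in-advance value."""

    def __init__(self, p=0.5, max_frac=0.5):
        super().__init__()
        self.p = p
        self.max_frac = max_frac  # max box side length as a fraction of the image

    def forward(self, img):  # img: (C, H, W) tensor
        if torch.rand(1).item() >= self.p:
            return img
        c, h, w = img.shape
        # sample box size and top-left corner
        eh = int(torch.randint(1, max(2, int(h * self.max_frac)), (1,)))
        ew = int(torch.randint(1, max(2, int(w * self.max_frac)), (1,)))
        y = int(torch.randint(0, h - eh + 1, (1,)))
        x = int(torch.randint(0, w - ew + 1, (1,)))
        # per-channel mean, broadcast into the erased region
        mean = img.float().mean(dim=(1, 2), keepdim=True)
        out = img.clone()
        out[:, y:y + eh, x:x + ew] = mean.to(img.dtype)
        return out
```

Because the fill value is computed from the image itself, this works on uint8 inputs before any normalization, which is the ordering difference discussed below.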
---
I understand that this class is supposed to be used BEFORE converting the image from int to float. This is unlike TorchVision's RandomErasing class, which is applied AFTER converting the image to float, scaling it to 0-1, and normalizing it. I would definitely prefer having TorchVision follow this approach rather than the current one, and that's worth changing in the future, but I think offering another transform right now that operates slightly differently could be confusing. Thoughts?

Concerning mean vs a fixed value: note that TorchVision's RandomErasing implementation uses a fixed zero value because the images are expected to be de-meaned. I wonder if it would be possible to reuse TorchVision's transform for now if you passed a normalized image. Perhaps that's worth doing even if the outcome is not 100% the same, just to maintain parity between the transforms defined in the references and those in the legacy transforms (this will simplify porting to the new API).
---
Thanks for the PR @NicolasHug, I've added a few comments for discussion. It might be worth discussing this offline and then summarizing our decision, to speed this up.
Review context:

        img1 = super().forward(img1)
        img2 = super().forward(img2)
    else:
        # symmetric: same transform for img1 and img2
@NicolasHug: So does p determine the probability of doing symmetric vs asymmetric? If yes, I would add a comment to clarify.

@pmeier: Could you please check this strange transform to confirm it's supported by the new Transforms API?
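The symmetric/asymmetric pattern being discussed can be sketched generically. This is an illustrative reconstruction, not the PR's code: `sample_params` and `apply_fn` are hypothetical stand-ins for a jitter transform's parameter sampling and application, and `p` is the probability of the asymmetric branch.

```python
import torch


class AsymmetricApply(torch.nn.Module):
    """Sketch: with probability p, transform img1 and img2 with
    independently sampled parameters (asymmetric); otherwise sample
    parameters once and apply the identical transform to both
    (symmetric)."""

    def __init__(self, sample_params, apply_fn, p=0.2):
        super().__init__()
        self.sample_params = sample_params  # () -> params
        self.apply_fn = apply_fn            # (img, params) -> img
        self.p = p

    def forward(self, img1, img2):
        if torch.rand(1).item() < self.p:
            # asymmetric: independent parameters per image
            img1 = self.apply_fn(img1, self.sample_params())
            img2 = self.apply_fn(img2, self.sample_params())
        else:
            # symmetric: one set of parameters shared by both images
            params = self.sample_params()
            img1 = self.apply_fn(img1, params)
            img2 = self.apply_fn(img2, params)
        return img1, img2
```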
---
As it stands, this would not be supported. A transform always treats a sample as atomic unit and so multiple images in the same sample would be transformed with the same parameters.
---
OK, I'll clarify.

Ultimately this is a special case of RandomApply(t1, t2, [p, 1 - p]), so there's nothing too fancy here. t2 can be a Sequential(take_care_of_img1_only, take_care_of_img2_only).
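Reading [p, 1 - p] as the probabilities of the two branches, the special case can be sketched as a tiny helper. This is a hypothetical illustration (`random_apply` is my own name, not a torchvision API):

```python
import torch


def random_apply(t1, t2, p):
    """Sketch: apply t1 with probability p, otherwise t2."""
    def _transform(x):
        if torch.rand(1).item() < p:
            return t1(x)
        return t2(x)
    return _transform
```

With p = 1.0 the returned transform always applies t1; with p = 0.0 it always applies t2.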
---
@NicolasHug Sounds good, just add comments. No need to use RandomApply here.
---
@pmeier No worries, this is why we give the option for someone to write custom transforms without the magic of the new API, for weird cases like this. Could you now confirm that this is indeed a workaround we can apply?
---
> Sequential(take_care_of_img1_only, take_care_of_img2_only)

I'm guessing take_care_of_img1_only and take_care_of_img2_only are transforms here, correct? If yes, how would you tell the transform to only handle one or the other image if both receive the full sample?

I think this is one of the cases @datumbox mentioned where we need to circumvent the automatic dispatch a little. In case we want to transform both images separately, we could split the sample and perform the transformation once for the sample minus image2 and once for image2. The problem I see with this is that it can't be automated without assumptions about how the sample is structured. So we either need to use the same structure for every dataset (for example, a flat dictionary with image1 and image2 keys) or provide a way to parametrize the transform.
---
> I'm guessing take_care_of_img1_only and take_care_of_img2_only are transforms here, correct? If yes, how would you tell the transform to only handle one or the other image if both receive the full sample?
Each transform would receive the entire input (which IIRC is a dict) and operate on a subset of that dict.
Are you suggesting that img1 and img2 would be concatenated?
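The "operate on a subset of the dict" idea can be sketched in plain Python. This is a hypothetical illustration of the pattern, not an API from either codebase; `on_key` and the key names are invented for the sketch.

```python
def on_key(key, fn):
    """Sketch: wrap a per-value transform so it only touches
    sample[key] and passes every other entry through unchanged."""
    def _transform(sample):
        out = dict(sample)
        out[key] = fn(out[key])
        return out
    return _transform


# each wrapped transform receives the entire sample dict but
# operates on a single entry of it
pipeline = [
    on_key("image1", lambda x: x + 1),
    on_key("image2", lambda x: x * 2),
]

sample = {"image1": 1, "image2": 3, "flow": "f"}
for t in pipeline:
    sample = t(sample)
```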
---
LGTM thanks @NicolasHug.
---
Thanks for the PR!
I've only a couple of minor comments, otherwise LGTM!
Reviewed By: NicolasHug
Differential Revision: D32950925
fbshipit-source-id: 77cce3f89a1110fa607b219ffa1cdf639a424a33
Towards #4644
This PR adds transforms and train/eval presets needed for the optical flow training reference. Ideally #5004 should be merged first.
A lot of these transforms are just wrappers around existing transforms, modified so that they accept all four parameters: img1, img2, flow, valid_flow_mask. For the rest of the transforms I added comments in the code.

CC @fmassa @datumbox @haooooooqi
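The wrapping pattern described above can be sketched generically. This is an illustrative reconstruction under my own naming (`PairwiseWrapper` is hypothetical), showing one way a single-image transform could be lifted to the four-parameter signature; the actual PR wraps each transform individually.

```python
import torch


class PairwiseWrapper(torch.nn.Module):
    """Sketch: lift a single-image transform to the
    (img1, img2, flow, valid_flow_mask) signature by applying it to
    both images and passing flow and mask through unchanged."""

    def __init__(self, transform):
        super().__init__()
        self.transform = transform

    def forward(self, img1, img2, flow, valid_flow_mask):
        return (
            self.transform(img1),
            self.transform(img2),
            flow,
            valid_flow_mask,
        )
```

Note this only fits transforms that need no shared random state between the two images; the symmetric/asymmetric jitter discussed earlier needs its own custom forward.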