-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve mean/std swap for VITDet backbone #2087
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ianstenbit for the quick fix! Quoting from #2086 (comment):
Here's the demo I usually run to test that the model is close to the original model at facebookresearch/segment_anything.
Just one non-blocking suggestion: What do you think about keeping the normalization step an opt-in rather than always having it on? This way the users can choose the preprocessing they want on their images.
@tirthasheshpatel what is the purpose of making this optional? You mean the mean/stddev bit I assume? |
Yes, I was referring to the mean/stddev part. The reason for doing that is to give users more control over input preprocessing (which might be useful for training). But I am not against changing the 0-255 input precedent (having that is great!). Just something like an |
I'm not sure what the value of this would be -- is there a reason that a user would care how the model internally does rescaling, as long as the API is clear about what is expected from the caller? |
/gcbrun |
I just thought there would be cases where the ImageNet rescaling would mess a bit with padded inputs (making borders non-zero). Although there is a way to avoid that (using mean as the padding value), just that some users might find it unintuitive. Thinking in the wild here :) Please feel free to make a decision based on your best judgment! |
Ahh I see your point here -- I think in this case it's okay. Zero padding will still end up being represented as the same pixel values as all-black pixels, which is logically the same as if no rescaling were done, it's just that the values aren't literally 0. |
Ah OK, makes sense. Thanks for clarifying! Feel free to merge! |
@tirthasheshpatel does your SAM demo perform as-expected with this change?