Evaluate models accuracy when using inference transforms on Tensors (instead of PIL images) #6506
Comments
@pmeier that's an important point that we need to include in the #6433 PR
Some partial results below using #6508. All of these are on the […]
Note: PIL was used for jpeg decoding in all cases - we're only interested in the variance w.r.t. the transforms here, not in the decoding (although that might still be worth considering).
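For reference, a minimal sketch of this kind of PIL-vs-Tensor comparison, assuming a torchvision version with the multi-weight API; the model choice, weights enum and file path are illustrative, not the exact setup behind the numbers above:

```python
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.transforms import functional as F

# Pre-trained model and its canonical (PIL-trained) inference preset.
weights = ResNet50_Weights.IMAGENET1K_V1
model = resnet50(weights=weights).eval()
preset = weights.transforms()  # resize + center-crop + rescale + normalize

# Decode with PIL in both paths, so only the transforms differ.
pil_img = Image.open("val/ILSVRC2012_val_00000001.JPEG").convert("RGB")  # hypothetical path
tensor_img = F.pil_to_tensor(pil_img)  # uint8 CHW tensor

with torch.no_grad():
    logits_pil = model(preset(pil_img).unsqueeze(0))        # PIL path
    logits_tensor = model(preset(tensor_img).unsqueeze(0))  # Tensor path

print("top-1 (PIL):   ", logits_pil.argmax(1).item())
print("top-1 (tensor):", logits_tensor.argmax(1).item())
print("max |logit diff|:", (logits_pil - logits_tensor).abs().max().item())
```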
@NicolasHug thanks for publishing the results, this looks very interesting. It's obvious that antialiasing will have a massive effect on users once they start using the Tensor backend more. @pmeier @vfdev-5 These are the numbers I quoted in our offline chat today. It's worth noting that if we wanted to ensure that the user has control over the antialiasing option on Transforms, we would have to expose this option on every Transform that uses resize behind the scenes.
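To illustrate what exposing that option could look like, here is a hypothetical preset-style transform (loosely modeled on the classes in `_presets.py`, but not the actual torchvision code) where `antialias` is a constructor argument threaded down to the resize that runs behind the scenes; the class name and default sizes are assumptions:

```python
import torch
from torch import nn
from torchvision.transforms import functional as F, InterpolationMode


class ClassificationPreset(nn.Module):  # hypothetical sketch, not the real preset
    def __init__(self, crop_size=224, resize_size=256,
                 interpolation=InterpolationMode.BILINEAR, antialias=True):
        super().__init__()
        self.crop_size = [crop_size]
        self.resize_size = [resize_size]
        self.interpolation = interpolation
        self.antialias = antialias  # the option that would need to be exposed everywhere resize is used

    def forward(self, img):
        # antialias is forwarded to the resize call that runs behind the scenes
        img = F.resize(img, self.resize_size, interpolation=self.interpolation,
                       antialias=self.antialias)
        img = F.center_crop(img, self.crop_size)
        if not isinstance(img, torch.Tensor):
            img = F.pil_to_tensor(img)
        img = F.convert_image_dtype(img, torch.float)
        return F.normalize(img, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```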
Re PIL images vs tensors, I was experiencing the issue with antialias set to "warn" in the older torchvision versions. After looking at the forward method of the ImageClassification class in transforms/_presets.py, I decided to test each step. Using the […] I wonder if the real difference is due to using different data types (uint8 vs float32) and/or dynamic ranges. I think it's probably the different data types, but a more thorough investigation would be useful.
Hi @CA4GitHub, it's not about dtype, it's really about antialiasing. In fact even if you passed a […] (that's not the case anymore as of... yesterday, with […]
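A small check along those lines, as a sketch (the input file and resize size are hypothetical): resize the same image as a uint8 tensor and as a float32 tensor, with and without antialias, and compare each result against the PIL output, which always antialiases. The expectation, per the comments above, is that the antialias flag, not the dtype, drives the difference.

```python
import torch
from PIL import Image
from torchvision.transforms import functional as F, InterpolationMode

pil_img = Image.open("example.jpg").convert("RGB")   # hypothetical file
u8 = F.pil_to_tensor(pil_img)                        # uint8 CHW
f32 = F.convert_image_dtype(u8, torch.float)         # float32 in [0, 1]

size = [256]
ref = F.pil_to_tensor(F.resize(pil_img, size))       # PIL path: always antialiased

for name, t in [("uint8", u8), ("float32", f32)]:
    for aa in (False, True):
        out = F.resize(t, size, interpolation=InterpolationMode.BILINEAR, antialias=aa)
        if out.dtype != torch.uint8:
            out = F.convert_image_dtype(out, torch.uint8)
        diff = (out.int() - ref.int()).abs().float().mean().item()
        print(f"{name:7s} antialias={aa}: mean abs diff vs PIL = {diff:.2f}")
```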
The accuracy of our trained models that we currently report comes from evaluations run on PIL images. However, our inference-time transforms also support Tensors.
In the wild, our users might be passing Tensors to the pre-trained models (instead of PIL images), so it's worth figuring out whether the accuracy is consistent between Tensors and PIL.
Note: we do check that all the transforms are consistent between PIL and Tensors, so hopefully differences should be minimal. But models are known to learn interpolation tweaks, and in particular the use of anti-aliasing. PIL uses anti-aliasing by default and this is what our models were trained on, but we don't pass antialias=True to the Resize transform, so it might be a source of discrepancy. As discussed internally with @datumbox, figuring that out is part of the transforms rework plan (although it's relevant outside of the rework as well).
cc @vfdev-5 @datumbox
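For context, a minimal sketch of what a Tensor-input inference pipeline looks like when anti-aliasing is requested explicitly, assuming a torchvision version where Resize accepts an antialias argument; the sizes are the usual ImageNet ones and the input tensor is just a stand-in for a decoded image:

```python
import torch
from torchvision import transforms

# Inference transforms for Tensor inputs, with antialiasing requested explicitly
# so that resize matches PIL's default behaviour as closely as possible.
inference_transforms = transforms.Compose([
    transforms.Resize(256, antialias=True),
    transforms.CenterCrop(224),
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = torch.randint(0, 256, (3, 500, 400), dtype=torch.uint8)  # stand-in for a decoded image
batch = inference_transforms(img).unsqueeze(0)
```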