Refactor Portilla-Simoncelli model #225

Merged: 119 commits merged into main from ps_refactor on Feb 29, 2024
Conversation

@billbrod (Collaborator) commented on Nov 15, 2023:

This pull request completely refactors the Portilla-Simoncelli texture model. As part of this, the following changes were made to po.simul.PortillaSimoncelli:

  • The model can now accept images with an arbitrary number of batches and channels (see the usage sketch after this list).
  • The model is faster (about 2x faster on both GPU and CPU).
  • You no longer need to call model.to(torch.float64) for the model to accept double-precision inputs -- it will do so automatically.
  • Adds type hinting.
  • Checks the image shape and raises an informative error message if we can't handle it (see Make Portilla-Simoncelli texture model work on arbitrarily-sized images #221).
  • Really tried to make the code much more modular and legible -- I hope it's much clearer what's happening and why in forward.
  • Building off of Add PortillaSimoncelliMinimal to tutorial get original paper's statistics #216, the model now only returns the necessary statistics, throwing away all redundant ones.
  • Removes support for use_true_correlations=False. We now only support using the true correlations, because the only reason anyone would ever set it to False was to check against MATLAB. I still test against the MATLAB implementation, but this now requires a bit more work, which all lives in tests.
  • Related to the above, correctly normalizes the cross-correlations so that they're actual correlations (the previous version wasn't quite right).
  • PS now uses helper functions when relevant instead of implementing its own versions.
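
A minimal usage sketch illustrating the batch/channel and dtype points above. This is a hypothetical illustration: the constructor call shown here is an assumption for the sake of the example, not taken from this PR.

    import torch
    import plenoptic as po

    # hypothetical instantiation; the exact constructor signature is assumed
    model = po.simul.PortillaSimoncelli((256, 256))
    # batched, multi-channel, double-precision input: (batch, channel, H, W)
    img = torch.rand(4, 3, 256, 256, dtype=torch.float64)
    rep = model(img)   # no manual model.to(torch.float64) needed
    print(rep.shape)   # (B, C, S): S statistics per batch/channel element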

Other changes:

  • Several helper functions have been moved from the PS code into tools/ or newly added (see the sketch after this list): center_crop (no longer requires torchvision), expand (upsample an image using the Fourier transform), shrink (downsample an image using the Fourier transform), modulate_phase (modulate the phase of a complex signal, e.g., double the phase of a steerable pyramid coefficient in order to correlate it with another scale), and autocorrelation (replaces and slightly generalizes the existing autocorr function).
  • Adds many more tests.
  • Adds a section to the tips page about making sure the statistics are all in the same range, as that's helpful.
  • Relevant parts of the notebooks have been updated and rerun, and the PS notebook has been completely rerun to ensure its output is qualitatively similar (it is).
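
A hedged sketch of what Fourier-domain expand/shrink and modulate_phase can look like. This is a minimal illustration assuming even image sizes, integer factors, and a pixel-value-preserving normalization; it is not necessarily plenoptic's exact implementation:

    import torch
    import torch.fft as fft

    def fourier_expand(img: torch.Tensor, factor: int) -> torch.Tensor:
        # Upsample by zero-padding the centered spectrum.
        h, w = img.shape[-2:]
        spec = fft.fftshift(fft.fft2(img), dim=(-2, -1))
        big = torch.zeros(*img.shape[:-2], factor * h, factor * w,
                          dtype=spec.dtype, device=spec.device)
        top, left = (factor * h - h) // 2, (factor * w - w) // 2
        big[..., top:top + h, left:left + w] = spec
        # ifft2 normalizes by the new, larger size, so rescale to keep
        # pixel values (rather than power) fixed.
        return fft.ifft2(fft.ifftshift(big, dim=(-2, -1))).real * factor**2

    def fourier_shrink(img: torch.Tensor, factor: int) -> torch.Tensor:
        # Downsample by cropping the centered spectrum (an ideal low-pass).
        h, w = img.shape[-2:]
        spec = fft.fftshift(fft.fft2(img), dim=(-2, -1))
        ch, cw = h // factor, w // factor
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = spec[..., top:top + ch, left:left + cw]
        # Divide so that pixel values, not power, are preserved.
        return fft.ifft2(fft.ifftshift(crop, dim=(-2, -1))).real / factor**2

    def modulate_phase(z: torch.Tensor, phase_factor: float = 2.0) -> torch.Tensor:
        # Scale the phase of a complex signal while keeping its amplitude;
        # phase_factor=2 doubles the phase of a pyramid coefficient so it
        # can be correlated with the next-coarser scale.
        return z.abs() * torch.exp(1j * phase_factor * z.angle())

With this convention a constant image keeps its mean value under both operations, while its total power changes by factor**2; that trade-off is exactly what the review discussion of the 4** factor below is about.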

Notes:

  • At one point, I considered switching away from using the downsampled pyramid, so that the coefficients at all scales would have the same shape. This would make the code cleaner (currently, multiscale representations are lists of tensors rather than a single tensor), but it ends up making things much less efficient and changes the output such that we can no longer guarantee reproduction of the MATLAB values. So I think this is not worth doing.
  • I think the efficiency of the model can be further improved, but I'm not quite sure how. I feel like pytrees will help, but my initial attempts at using them did not. See Make Portilla-Simoncelli code more efficient #222 for notes.
  • While the code now works on multi-channel images, you cannot generate color metamers out-of-the-box. See Add color/channel support #46 for discussion.
  • This will make it much easier to support the pooled texture model used in Freeman's Metamers of the ventral stream, among other places.

Closes: #199, #142

Commit message excerpts:

  • ...and corrects some docstrings
  • move that out of the bowels of PortillaSimoncelli, because it might be helpful
  • ...because it's getting automatically added by something
  • This refactors PS to:
    - remove all unnecessary attributes (representation, etc.)
    - make forward() much more straightforward, calling transparently-named methods that return the thing they say they do
    - remove the old autocorr, nothing uses it
    - add the new autocorrelation, the way needed for Portilla-Simoncelli (see the sketch after this list)
    - put expand and shrink in signal.py
  • this makes it possible to vmap it
  • refactor to use the non-downsampled pyramid. Probably won't stick with this, because it's ~2x slower on the CPU, and gives fairly different values (rtol=1e-1, atol=1e-4 ish)
  • it's actually more efficient, as long as we can't use vmap (which we can't)
  • third version of this, but I think this is the way: still use the downsampling pyramid, but now make lists of tensors (one per scale) to use. Gets us an intermediate version between the two, while still passing all the tests
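
A hedged sketch of an FFT-based autocorrelation in the spirit of the new helper, via the Wiener-Khinchin theorem. The normalization and output conventions here are assumptions, not necessarily what plenoptic's tools implement:

    import torch
    import torch.fft as fft

    def autocorrelation(x: torch.Tensor) -> torch.Tensor:
        # Wiener-Khinchin: the autocorrelation is the inverse FFT of the
        # power spectrum. Operating on the last two (spatial) dimensions
        # means batched/multi-channel inputs are handled for free.
        spec = fft.fft2(x)
        acorr = fft.ifft2(spec * spec.conj()).real
        # Shift zero lag to the center and normalize by the pixel count.
        return fft.fftshift(acorr, dim=(-2, -1)) / (x.shape[-2] * x.shape[-1])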
@BalzaniEdoardo (Contributor) left a comment:

I like the changes. Looks good to me

    Returns
    -------
    representation_vector:
        3d tensor of shape (B,C,S) containing the measured texture
Contributor comment: I like it more now

Comment on lines 18 to 20:

    SCALES_TYPE = Union[
        int, Literal["pixel_statistics", "residual_lowpass", "residual_highpass"]
    ]

Contributor comment: I like it now, it is consistent!

Comment on the following lines:

    reconstructed_images.append(recon + reconstructed_images[-1])
    # now downsample as necessary, so that these end up the same size as
    # their corresponding coefficients.
    reconstructed_images[:-1] = [signal.shrink(r, 2**(self.n_scales-i)) * 4**(self.n_scales-i)

Contributor comment: I think either choice works, as long as you explain what is happening. If you choose to keep the pixel values the same, I would add a comment before this line (where you multiply by 4**(n_scales - i)) and a note in the shrink function saying: "this function keeps the pixel values fixed; the power, however, will change. If you need to preserve the power, multiply by...".

If you choose to have the power as an invariant, I would also make it clear in the docstrings of shrink/expand: something like, "this function down-samples the image and scales it to preserve the power...".
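
To make the two conventions concrete, a hedged illustration (using the hypothetical fourier_shrink sketch from earlier on this page, which keeps pixel values fixed, and assuming the image has little energy outside the retained frequency band):

    import torch

    # fourier_shrink is the hypothetical helper sketched earlier on this page
    x = torch.rand(1, 1, 64, 64)
    y = fourier_shrink(x, 2)  # pixel values ~preserved; summed power drops ~4x
    # y has 1/4 as many pixels, so its total power is roughly x's divided by
    # factor**2; to make power the invariant instead, rescale by the factor:
    y_power = y * 2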


Comment on the following line:

    def plot_representation(

Contributor comment: yep, I agree with the separate PR

Comment on the following lines:

    0, (2 * self.n_orientations, max(2 * self.n_orientations, 5), self.n_scales)
    )
    n_filled += nn

    def convert_to_dict(self, representation_vector: Tensor) -> OrderedDict:

Contributor comment: I see, that makes sense to me; if there is no way to make it general, I agree it's not worth the effort

Comment on the following lines:

    @@ -1,55 +1,58 @@
    import torch

Contributor comment: that works

@billbrod (Collaborator, Author) commented:

This is ready to go now, after I push the merge with main. I changed the description of the magnitude means at the very end.

@billbrod merged commit 136527d into main on Feb 29, 2024. 14 checks passed.
@billbrod deleted the ps_refactor branch on February 29, 2024 at 15:51.
Successfully merging this pull request may close these issues:

  • Add multi-batch and channel support for PortillaSimoncelli