
Updates save/load behavior for breaking change in torch 2.6 #317

Open
billbrod wants to merge 37 commits into main

Conversation

billbrod
Member

@billbrod billbrod commented Feb 27, 2025

As discussed in the related issue (#313), Torch 2.6 introduced a breaking change in how torch.load works: weights_only now defaults to True, which makes loading much more conservative about what it will unpickle. This PR makes plenoptic compatible with that change.

Our previous behavior was incompatible with the changes because we were saving two types of objects:

  • python functions (metamer's loss function, mad competition's metrics)
  • pytorch optimization objects (optimizers and schedulers)

We also save a bunch of python primitives (floats, ints, strings) and tensors, but these are fine.

Now, we have modified save() so that object attributes are placed into three categories (a rough sketch illustrating this follows the list):

  • save_attrs: primitives and tensors that can be saved directly. These are not explicitly set, but are all the attributes that aren't included in the next two categories.
  • save_io_attrs: functions or callable objects that accept and return tensors (e.g., loss functions, metrics, models). These are stateless, in that nothing about them changes over the course of synthesis. For these objects, we save a tuple with their name (using _get_name in synthesis.py), the names of one or more other attributes of the object that can be passed as inputs (e.g., _image, _metamer), and the output when called on those attributes.
  • save_state_dict_attrs: pytorch objects with a state_dict that may change over the course of synthesis (e.g., optimizers whose learning rate may change). We save a tuple with their name and their state_dict.
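
To make the categorization concrete, here is a rough sketch of how save() might assemble these three groups. This is an illustration only: the attribute names (_model, _loss_function, _optimizer, _scheduler, _image), the helper sketch_save, and the dict layout are assumptions, not the exact plenoptic implementation.

import torch

def _get_name(obj):
    # simplified stand-in for the _get_name helper mentioned above
    return f"{type(obj).__module__}.{type(obj).__name__}"

def sketch_save(obj, file_path, io_attrs=("_model", "_loss_function"),
                state_dict_attrs=("_optimizer", "_scheduler")):
    # io attributes: record the callable's name, the names of the attributes
    # used as inputs, and the output when called on those attributes
    save_io_attrs = {}
    for attr in io_attrs:
        fn = getattr(obj, attr)
        input_names = ("_image",)
        inputs = [getattr(obj, n) for n in input_names]
        save_io_attrs[attr] = (_get_name(fn), input_names, fn(*inputs))
    # state_dict attributes: record the object's name and its current state_dict
    save_state_dict_attrs = {
        attr: (_get_name(getattr(obj, attr)), getattr(obj, attr).state_dict())
        for attr in state_dict_attrs
        if getattr(obj, attr, None) is not None
    }
    # everything else (primitives, tensors) is saved directly
    skip = set(io_attrs) | set(state_dict_attrs)
    save_attrs = {k: v for k, v in vars(obj).items() if k not in skip}
    torch.save(
        {"save_attrs": save_attrs,
         "save_io_attrs": save_io_attrs,
         "save_state_dict_attrs": save_state_dict_attrs},
        file_path,
    )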

On load, we do the following (a matching sketch follows the list):

  • (as before) check all attributes set at initialization match those we are trying to load (e.g., range_penalty_lambda, image).
  • check all "io attributes" from save, ensuring that their names and input/output behavior are the same.
  • check that all the "state_dict attributes" have the same name and then load their state_dict (overwriting existing state). Currently, these attributes only include schedulers and optimizers, which are not set at initialization; we cache their name and state_dict, and check them when they are initialized (on the first call to synthesize).
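
A matching sketch of the load-side checks, continuing the assumptions (and the _get_name helper) from the save sketch above; again, this is illustrative, not the actual plenoptic code:

def sketch_load(obj, file_path):
    # weights_only=True is fine here: only primitives, strings, and tensors are stored
    loaded = torch.load(file_path, weights_only=True)
    # io attributes: name and input/output behavior must be unchanged
    for attr, (name, input_names, saved_out) in loaded["save_io_attrs"].items():
        fn = getattr(obj, attr)
        if _get_name(fn) != name:
            raise ValueError(f"{attr} has a different name than when saved!")
        new_out = fn(*[getattr(obj, n) for n in input_names])
        if not torch.allclose(new_out, saved_out):
            raise ValueError(f"{attr} behaves differently than when saved!")
    # plain attributes: in the real code, init-time values are checked against
    # the loaded ones first; here we simply copy everything over
    for k, v in loaded["save_attrs"].items():
        setattr(obj, k, v)
    # state_dict attributes (optimizer, scheduler): cache name and state_dict,
    # to be checked and loaded when they are first created in synthesize()
    obj._cached_state_dict_attrs = loaded["save_state_dict_attrs"]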

For all synthesis objects other than Eigendistortion (for which it's not relevant), we also provide the save_objects and weights_only boolean flags for save and load, respectively. If save_objects=True, we save the callables (loss functions, metrics, schedulers, optimizers, but NOT models, since torch.nn.Module can be very large). In this case, they must then pass weights_only=False to load, to override the new default torch behavior. EDIT: removed these flags, see later comment.

This makes our save/load behavior not backwards compatible because of how we handle the model. Previously, I was implicitly checking it (by checking for e.g., Metamer.target_representation), but now I treat it the same as the loss function. However, this current behavior works with (at least) torch 2.5 and 2.6, so that's good.

Additionally:

  • we add some checks for scheduler to match the checks made by optimizer: the first time synthesize is called, scheduler can be non-None; on every subsequent call, it must be None (a sketch of this check follows the list).
  • geodesic update: in our docs, we say that plenoptic works with models that output 3d or 4d tensors, but the old implementation meant that geodesics only worked with 4d outputs. This adds a small fix for that (and tests).
  • I also removed the ruff actions from ci.yml, because they're being handled by the pre-commit action (and the versions were out of sync, which meant the ci.yml version was failing, preventing me from merging this PR)
  • Had to pin sphinx<8.2 due to an nbsphinx bug ("Cannot run with Sphinx 8.2.0", spatialaudio/nbsphinx#825). I will be dropping nbsphinx when I switch over to myst/mystnb, so this is temporary.
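
A minimal sketch of the scheduler check described in the first bullet above; the flag and attribute names (_synthesize_called, _scheduler) are assumptions for illustration, not the actual plenoptic code:

def _initialize_scheduler(self, scheduler):
    if not getattr(self, "_synthesize_called", False):
        # first call to synthesize(): scheduler may be None or user-supplied
        self._scheduler = scheduler
        self._synthesize_called = True
    elif scheduler is not None:
        # every subsequent call (e.g., when resuming): scheduler must be None
        raise ValueError(
            "synthesize() was already called; scheduler must be None on "
            "subsequent calls."
        )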

I still need to:

  • Add tests for new behavior.
  • Add a user-facing docs page with some details about save/load. -- after removing the weights_only/save_objects flags, I don't think this is necessary.

Questions:

  • Right now, the error message when two tensors are different is not very helpful. If I create a metamer object with a different target image, I raise an error and print out the tensors and their difference, which is ... pretty hard to parse. But I feel that makes more sense than not printing anything at all? I provide more informative messages when possible (e.g., when the tensors' shapes differ).
  • The check that names are the same may fail because someone has restructured their code or updated something (because I include the module in the name). But I think that's fine? In general, it's hard to guarantee things can be saved/loaded across versions (see e.g., sklearn's advice).
    • I was originally going to only raise a warning if the name was different and then, e.g., have the optimizer load the state dict and try to continue. But pytorch lets you load another optimizer's state dict: I can do SGD.load_state_dict(Adam.state_dict()) without a problem, but then when I try to step the optimizer, I get a confusing error message (because it can't find the relevant information). To avoid this problem, I just decided that names have to be stable (a minimal reproduction of this is sketched below).
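
A minimal reproduction of the optimizer experiment mentioned above (the load succeeding is standard pytorch behavior; the confusing failure on step is as described in the bullet and may depend on the torch version):

import torch

param = torch.nn.Parameter(torch.zeros(3))
adam = torch.optim.Adam([param], lr=1e-3)
param.sum().backward()
adam.step()                              # populate Adam's per-parameter state

sgd = torch.optim.SGD([param], lr=1e-2)
sgd.load_state_dict(adam.state_dict())   # succeeds: pytorch does not check the optimizer class
# sgd.step()  # per the comment above, stepping afterwards raises a confusing
#             # error, since SGD now carries Adam's hyperparameters and state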

(relevant torch docs: here, and here)

closes #313

@billbrod
Member Author

I also removed the ruff actions from ci.yml, because they're being handled by the pre-commit action (and the versions were out of sync, which meant the ci.yml version was failing, preventing me from merging this PR)


"""
super().save(file_path, attrs=None)
if not save_objects:


why is this if not save_objects? shouldn't including the loss function in save_io_attrs occur when save_objects is True? or am I misunderstanding something? (this is the same for the other synthesis objects)

Member Author


If save_objects is True, we save the loss function along with all the other attributes (that is, we pickle the whole function). If it's False, then we don't save the function but its behavior, which is what inclusion in save_io_attrs means (by behavior I mean: given this input, it returns this output).

regardless, we never save the whole model object because it can get really big


got it, so I was misunderstanding save_io_attrs, which is the "new" way of saving things about the objects, thus introducing the break in backwards compatibility?

Member Author


Yes, save_io_attrs is one of two new ways of saving info about the objects (the other being save_state_dict_attrs). The breaking of backwards compatibility is because I now explicitly include model in save_io_attrs to check its behavior, whereas previously I would only implicitly check its behavior, by checking whether Metamer._target_representation was identical, which is the cached output of the model on the target image.

What I'm doing now for save_io_attrs (and check_io_attributes in load) is similar to what I was doing before for check_loss_functions in load; the reasoning is the same, but the specifics are different.

@@ -433,13 +433,26 @@ def save(self, file_path: str):
----------
file_path : str
The path to save the Geodesic object to
save_objects :
If True, we use pickle to save all non-model objects (optimizer). To load


what saves when save_objects=False? would it be useful to have this information in the docstring?

Member Author


My inclination is no -- the user shouldn't worry about what's saved. In either case, calling load appropriately will give you back the synthesis object at the same state, so it doesn't matter which way you got there. The difference is that save_objects is potentially unsafe. This is what I'm going to try and explain in the doc page I need to write.

This docstring should probably be updated then. Let me write the doc page first, and then see if that helps me find a clearer wording here.

@billbrod
Member Author

After some conversations and thinking, I've removed the weights_only and save_objects flags. They only saved a minimal amount of code in an advanced use case (not using the default optimizer), and so are not worth the extra effort.

without flags:

import torch
import plenoptic as po

img = po.data.einstein()
model = po.simul.Gaussian(30)
po.tools.remove_grad(model)
met = po.synth.Metamer(img, model)
optimizer = torch.optim.SGD([met.metamer])
met.synthesize(5, optimizer=optimizer)
met.save('test.pt')
met_copy = po.synth.Metamer(img, model)
met_copy.load('test.pt')
optimizer = torch.optim.SGD([met_copy.metamer])
met_copy.synthesize(5, optimizer=optimizer)

with flags:

img = po.data.einstein()
model = po.simul.Gaussian(30)
po.tools.remove_grad(model)
met = po.synth.Metamer(img, model)
optimizer = torch.optim.SGD([met.metamer])
met.synthesize(5, optimizer=optimizer)
met.save('test.pt', save_objects=True)
met_copy = po.synth.Metamer(img, model)
met_copy.load('test.pt', weights_only=False)
met_copy.synthesize(5)

The normal use case (using the default Adam optimizer) looks like:

img = po.data.einstein()
model = po.simul.Gaussian(30)
po.tools.remove_grad(model)
met = po.synth.Metamer(img, model)
met.synthesize(5)
met.save('test.pt')
met_copy = po.synth.Metamer(img, model)
met_copy.load('test.pt')
met_copy.synthesize(5)

so in the standard case, the flags don't save the user any effort

@billbrod
Member Author

Had to pin sphinx<8.2 due to a bug with nbsphinx. I will be dropping nbsphinx when I switch over to myst/mystnb, so this is temporary.

@billbrod
Member Author

Documentation built by flatiron-jenkins at http://docs.plenoptic.org/docs//pulls/317


codecov bot commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 93.78238% with 12 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Missing lines
src/plenoptic/synthesize/metamer.py | 73.33% | 4 ⚠️
src/plenoptic/synthesize/synthesis.py | 96.19% | 4 ⚠️
src/plenoptic/tools/data.py | 85.71% | 2 ⚠️
src/plenoptic/synthesize/geodesic.py | 91.66% | 1 ⚠️
src/plenoptic/synthesize/mad_competition.py | 90.00% | 1 ⚠️

Files with missing lines | Coverage | Δ
src/plenoptic/synthesize/eigendistortion.py | 98.90% <100.00%> | (+<0.01%) ⬆️
src/plenoptic/tools/__init__.py | 100.00% <100.00%> | (ø)
src/plenoptic/tools/io.py | 100.00% <100.00%> | (ø)
src/plenoptic/synthesize/geodesic.py | 95.26% <91.66%> | (-2.92%) ⬇️
src/plenoptic/synthesize/mad_competition.py | 93.46% <90.00%> | (+0.56%) ⬆️
src/plenoptic/tools/data.py | 78.91% <85.71%> | (+0.71%) ⬆️
src/plenoptic/synthesize/metamer.py | 92.09% <73.33%> | (-0.35%) ⬇️
src/plenoptic/synthesize/synthesis.py | 93.45% <96.19%> | (+3.24%) ⬆️

billbrod added 2 commits March 6, 2025 11:16
also makes sure geodesic supports models with 3d outputs
@billbrod
Member Author

billbrod commented Mar 6, 2025

Okay, I've finally added all the needed tests. My remaining question is: do I need to update my load docstrings? If so, what should I say? I don't want to expose the details of what's happening to users; instead, I tried to make the error messages informative so they point to what to do. I currently don't say that loading should be done with the same pytorch and plenoptic versions as saving, but that's generally good practice -- does it belong there?

I am going to use sphinx's versionchanged directive to note that the behavior has changed

billbrod added 4 commits March 6, 2025 11:39
we had some old bits of code lying around, which were supporting old
behavior:

- removed code that would allow for the loading of code objects

- removed code that would allow for state_dict attributes set at
initialization. that's possible, but currently not done.
tests were improperly formatted, was apparently using curie for einstein
image
Contributor

@BalzaniEdoardo BalzaniEdoardo left a comment


Hey Billy,
I have reviewed the machinery of the save/load, but I did not focus too much on the documentation or the tests. I think we can first do a pass on these comments, and then I'll take a look at the tests as well.

@BalzaniEdoardo BalzaniEdoardo self-requested a review March 7, 2025 21:47
Contributor

@BalzaniEdoardo BalzaniEdoardo left a comment


Once the changes discussed are implemented, this is approved for me!


@sjvenditto sjvenditto left a comment


I took a closer look at the tests, and in general they all look good. I just had a couple of comments/questions:

  • there are a few instances where some parameters aren't being explicitly tested (usually _allowed_range) for a mismatch. however, this parametrize was not added in this PR, so maybe there's a reason they're not being tested?
  • why is the geodesic optimizer being initialized with a private variable? I've also noticed the warning about Geodesic not being robust enough, so it might not be used very often anyway, so maybe this isn't a general issue?

I'd appreciate the clarification as I continue to learn the package! Depending on your responses, they might not lead to any changes -- in which case I'd approve the PR

@pytest.mark.parametrize("model", ["frontend.OnOff.nograd"], indirect=True)
@pytest.mark.parametrize(
"model", ["frontend.LinearNonlinear.nograd"], indirect=True
)
@pytest.mark.parametrize(
"fail",
[False, "img_a", "img_b", "model", "n_steps", "init", "range_penalty"],


There are additional attributes in Geodesic load's check_attributes that aren't being tested here, specifically _allowed_range and pixelfade. Should these be tested as well?

This is also the case for test_save_load for MAD and Metamer, both have _allowed_range in check_attributes, but aren't explicitly being tested for a mismatch

optimizer = None
if optim_opts is not None:
if optim_opts == "Adam":
optimizer = torch.optim.Adam([geod._geodesic])


this is more a question than something that needs to be addressed for this PR: are Geodesic optimizers supposed to be initialized from a private variable? how would a user that isn't aware of the private variable initialize the optimizer from the loaded synthesis?

@billbrod
Member Author

billbrod commented Mar 10, 2025

> I took a closer look at the tests, and in general they all look good. I just had a couple of comments/questions:

> there are a few instances where some parameters aren't being explicitly tested (usually _allowed_range) for a mismatch. however, this parametrize was not added in this PR, so maybe there's a reason they're not being tested?

No, this is just a mistake. I'll add tests for those.

> why is the geodesic optimizer being initialized with a private variable? I've also noticed the warning about Geodesic not being robust enough, so it might not be used very often anyway, so maybe this isn't a general issue?

Geodesic is deprecated and about to be removed (it will live over in https://github.com/plenoptic-org/geodesics for the time being). But this is an annoyance -- for most objects, the public version of the to-be-optimized tensor is the same as the private (metamer / _metamer, _mad_image / mad_image), I just use the property to avoid users accidentally overwriting it. However, geodesic is the concatenation of image_a, _geodesic, image_b. The geodesics consist of N different images. Two of those are the endpoints (image_a, image_b), which are set by the user at initialization and never change. The remaining N-2 are stored as _geodesic, are the transition between those two endpoints and are what we change during optimization. Thus, when you initialize the optimizer, it needs to be _geodesic, because if you used geodesic, you'd be changing the endpoints. Does that make sense? geodesic is a bit of a convenience variable here, because any time you visualize or compute a diagnostic, you want to make sure you're including the endpoints.

I'm not sure the best way to handle that, because I don't want users to have to interact with a private variable. I suppose I could have a geodesic_with_endpoints or something. If you have a suggestion, I'm all ears, otherwise I might just punt on this for now and open an issue in the geodesic repo so I remember to figure it out at some point.
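
For context, a rough sketch of the relationship described above; the class and attribute handling here are illustrative assumptions, not the actual Geodesic implementation:

import torch

class GeodesicSketch:
    def __init__(self, image_a, image_b, n_interior):
        self.image_a = image_a    # fixed endpoint, set at initialization
        self.image_b = image_b    # fixed endpoint, set at initialization
        # the N-2 interior frames: the only tensor optimization should modify
        self._geodesic = torch.nn.Parameter(
            torch.stack([image_a.clone()] * n_interior)
        )

    @property
    def geodesic(self):
        # convenience view for visualization/diagnostics: includes the endpoints
        return torch.cat([self.image_a[None], self._geodesic, self.image_b[None]])

# the optimizer therefore has to be built from the private tensor:
# optimizer = torch.optim.Adam([geod._geodesic])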

@sjvenditto

> Geodesic is deprecated and about to be removed (it will live over in https://github.com/plenoptic-org/geodesics for the time being). [...] geodesic is a bit of a convenience variable here, because any time you visualize or compute a diagnostic, you want to make sure you're including the endpoints.

Yeah I noticed that the geodesic property was not the same as _geodesic, which is why I asked!

> I'm not sure the best way to handle that, because I don't want users to have to interact with a private variable. [...] otherwise I might just punt on this for now and open an issue in the geodesic repo so I remember to figure it out at some point.

My suggestion is also to punt on it, especially since Geodesic is deprecated. My thought is that since it's a riskier synthesis method, users who take that risk should be familiar enough with it to know how to initialize the optimizer, whether that means taking the correct slice of geodesic or using the private variable. This can be sorted out once Geodesic becomes more stable.


@sjvenditto sjvenditto left a comment


I've added some more minor comments on the save and load docstrings

@@ -82,6 +82,11 @@ Furthermore:
of the reference metric in a list, ``_reference_metric_loss``, but the
``reference_metric_loss`` attribute converts this list to a tensor before
returning it, as that's how the user will most often want to interact with it.
* All attributes should be initialized at object initialization, though they can
be "False-y" (e.g., an empty list, ``None``). At least one attribute should be
``None`` or an empty list at initialization. which we use when loading to


Suggested change
``None`` or an empty list at initialization. which we use when loading to
``None`` or an empty list at initialization, which we use when loading to

@@ -494,13 +503,14 @@ def load(
):
r"""Load all relevant stuff from a .pt file.
This should be called by an initialized ``Geodesic`` object -- we will


Why was this removed from the docstring? A similar message was left in place for Eigendistortion and MAD

@@ -426,17 +436,19 @@ def load(
):
r"""Load all relevant stuff from a .pt file.

This should be called by an initialized ``Metamer`` object -- we will


Similar to Geodesic, why was this removed?

them as tuples of (name, input_names, outputs). On load, we check that the
initialized object's name hasn't changed, and that when called on the same
inputs, we get the same outputs. Intended for models, metrics, loss
functions. Used to avoid saving callable, which is brittle and unsafe.


Suggested change
functions. Used to avoid saving callable, which is brittle and unsafe.
functions. Used to avoid saving callables, which is brittle and unsafe.

Development

Successfully merging this pull request may close these issues.

Torch 2.6 change in load behavior
3 participants