Fix pickling of states and distributions #305

Merged 12 commits on Feb 27, 2024

Conversation

daniel-klein
Contributor

Not pretty, but seems to work. Would appreciate a review. Should close #257 and now we can run MultiSim!

        return dct

    def __setstate__(self, state):
        self.__init__(state['_gen'], state['rng'])
Contributor Author

I understand that it's not great form to call __init__ from __setstate__, in case __init__ changes in some unexpected way. Better practice could be to copy the relevant guts out of __init__ into a separate function that's called by both __init__ and __setstate__, but meh?
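
For comparison, a minimal sketch of the shared-helper alternative mentioned above. The class `SeededSampler`, its `_build` helper, and its attributes are hypothetical stand-ins, not starsim's distribution class, and the sketch assumes the generator can be rebuilt from a stored seed.

```python
import copy
import random

class SeededSampler:
    """Hypothetical stand-in for a distribution wrapper (not starsim's class)."""

    def __init__(self, low, high, seed):
        self.low = low
        self.high = high
        self.seed = seed
        self._build()

    def _build(self):
        # Derived setup shared by __init__ and __setstate__.
        self.rng = random.Random(self.seed)

    def __getstate__(self):
        # Save only the plain parameters; the generator is rebuilt on restore.
        return {'low': self.low, 'high': self.high, 'seed': self.seed}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._build()

    def sample(self):
        return self.rng.uniform(self.low, self.high)

original = SeededSampler(0.0, 1.0, seed=42)
restored = copy.deepcopy(original)  # copy and pickle use the same state hooks
```

One trade-off to keep in mind: in this sketch a restored copy restarts its generator from the seed rather than resuming mid-stream, which may or may not be what a parallel run wants.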


    def __getstate__(self):
        slots_dict = {s: getattr(self, s) for s in self.__slots__ if hasattr(self, s) and getattr(self, s) is not None}
        return (self.__dict__, slots_dict)
Contributor Author

This is basically reimplementing __getstate__... because I don't know how to call the default state getter. Ideas?
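
One possible answer, assuming Python 3.11 or later: that release added `object.__getstate__()`, which returns the default state, including the `(dict, slots)` tuple form for classes that mix `__dict__` and `__slots__`. The sketch below uses hypothetical classes `Base` and `Demo`; the fallback branch for older interpreters simply mirrors the hand-written approach in this PR.

```python
import copy

class Base:
    """No __slots__ here, so instances also get a __dict__."""

class Demo(Base):
    __slots__ = ('values', 'label')

    def __init__(self, values, label):
        self.values = values               # stored in a slot
        self.label = label                 # stored in a slot
        self.note = 'stored in __dict__'   # stored in the instance dict

    def __getstate__(self):
        if hasattr(object, '__getstate__'):
            # Python 3.11+: ask the default implementation for the state.
            state = object.__getstate__(self)
        else:
            # Older Pythons: build the (dict, slots) pair by hand.
            slots = {s: getattr(self, s) for s in self.__slots__ if hasattr(self, s)}
            state = (self.__dict__, slots)
        # Any custom adjustments to `state` would go here.
        return state

d = Demo([1, 2, 3], 'demo')
d2 = copy.deepcopy(d)
```

Because the state keeps the default shape, no custom `__setstate__` is needed here: `copy` and `pickle` already know how to restore a `(dict, slots)` pair.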

        for st in state:
            for k, v in st.items():
                setattr(self, k, v)
        return
Contributor Author

Again, basically just the default state setter because I'm not sure what else to do.

sims = ss.MultiSim([ss.Sim(pars, label='Sim1'), ss.Sim(pars, label='Sim2')])
sims.run()
s1, s2 = sims.sims
assert np.allclose(s1.summary[:], s2.summary[:], rtol=0, atol=0, equal_nan=True)
Contributor Author

Switched from parallel to MultiSim, now that we have that functionality.

Contributor

@cliffckerr cliffckerr left a comment

Looks good to me, but let's see if @RomeshA or @kaa2102 have any thoughts on extra Python wizardry to do here :) Otherwise, I think we can refactor further if other things break in future. Working is a big step up from not working!

@RomeshA
Contributor

RomeshA commented Feb 26, 2024

I was a bit unsure what UIDArray.__getstate__ and UIDArray.__setstate__ are needed for. Is there a test case that fails if they are omitted? Is it perhaps related to the new addition of UIDArray.__getattr__()? I think something isn't quite right with that: if I run

import starsim as ss
import sciris as sc
s = ss.Sim()
s.initialize()
s2 = sc.dcp(s)

then states like s2.people.female are turned into just plain arrays. @cliffckerr do we really need to override that function entirely? It feels like it might open us up to a wide range of weird side effects further down the track.

@daniel-klein
Contributor Author

@RomeshA - these changes are intended to enable parallel processing, e.g. MultiSim. Initially, there was a problem with forking and collecting the ScipyDistributions that prevented us from even testing parallel processing. But with those issues addressed, I encountered all sorts of issues with states, for example when results are pickled in returning from a completed sim. Try running test_parallel in test_simple.py.

@cliffckerr
Contributor

@RomeshA To fix that particular issue, we could change it to

    def __getattr__(self, attr):
        """ Make it behave like a regular array mostly -- enables things like sum(), mean(), etc. """
        if attr in ['__deepcopy__', '__setstate__']:
            return self.__getattribute__(attr)
        else:
            return getattr(self.values, attr)

Buuuuut ... this is definitely ugly and probably not super performant, though not sure if this path is encountered enough for it to matter.
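
To make the failure mode concrete, here is a stdlib-only sketch. `NaiveWrapper` and `GuardedWrapper` are hypothetical stand-ins for `UIDArray`, and `array.array` stands in for the NumPy array held in `self.values`. The guard shown rejects every dunder lookup, which is a broader variant of the two-name check above; it is an assumption for illustration, not the change made in this PR.

```python
import array
import copy

class NaiveWrapper:
    def __init__(self, values):
        self.values = values

    def __getattr__(self, attr):
        # Forwards everything, including __deepcopy__, to the inner array.
        return getattr(self.values, attr)

class GuardedWrapper(NaiveWrapper):
    def __getattr__(self, attr):
        # Report special methods as missing so copy/pickle use their defaults.
        if attr.startswith('__') and attr.endswith('__'):
            raise AttributeError(attr)
        return getattr(self.values, attr)

naive_copy = copy.deepcopy(NaiveWrapper(array.array('d', [1.0, 2.0])))
guarded_copy = copy.deepcopy(GuardedWrapper(array.array('d', [1.0, 2.0])))

print(type(naive_copy).__name__)    # array: the wrapper was lost in the copy
print(type(guarded_copy).__name__)  # GuardedWrapper
```

In the naive case, `copy.deepcopy` finds the inner array's `__deepcopy__` through the forwarding and returns a bare array, which matches the "turned into just plain arrays" symptom reported above.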

@cliffckerr
Contributor

NB: on main, the copied sim can't be run. If you comment out getattr or replace it with the code above, the sim runs, but produces bizarre results (where are the recovered people going?!):

import starsim as ss
import sciris as sc

s = ss.Sim(pars=dict(diseases='sir', networks='random'))
s.initialize()

s2 = sc.dcp(s)

s.run()
s2.run()

s.plot()
s2.plot()

image
So in any case, something needs to be fixed.

@cliffckerr
Contributor

@daniel-klein I pushed this change to getattr so you don't need the explicit getstate/setstate. I am baffled as to why it works, though, because I thought __getattr__() is only called if __getattribute__() fails, but this calls it for those special cases and it works?!?
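
A possible explanation, sketched with a hypothetical `Probe` class: `__getattr__` really is only reached after normal lookup fails, and calling `self.__getattribute__(attr)` inside it just repeats that failed lookup and re-raises `AttributeError`. The caller (`copy` or `pickle`) then treats the hook as absent and falls back to its default path, instead of being handed the inner object's method. This assumes the wrapper class itself defines neither `__deepcopy__` nor `__setstate__`.

```python
import copy

calls = []  # module-level log, so logging never touches instance attributes

class Probe:
    def __init__(self, values):
        self.values = values

    def __getattr__(self, attr):
        calls.append(attr)  # only reached when normal lookup has already failed
        if attr in ['__deepcopy__', '__setstate__']:
            # Repeats the failed lookup, so AttributeError propagates to the caller.
            return self.__getattribute__(attr)
        return getattr(self.values, attr)

p = Probe([1, 2, 3])
hook = getattr(p, '__deepcopy__', None)  # same kind of lookup copy.deepcopy performs
p_copy = copy.deepcopy(p)                # falls back to the default copy path
```

Here `hook` ends up as `None` even though `__getattr__` was invoked, and the deep copy keeps its `Probe` type; ordinary names such as `p.count` are still forwarded to the inner list.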

@RomeshA
Contributor

RomeshA commented Feb 27, 2024

Those changes above should fix @cliffckerr's example. The issue was that we were inadvertently deep-copying arrays that were supposed to be references to other existing arrays. The discrepancy occurred because the first 10 time steps updated the inadvertently copied arrays; then, when dead agents were removed at t=10, the views were re-connected, but because the proper arrays hadn't been used up to that point, they still contained the original values (e.g., everyone susceptible). So the fix is to make sure that the arrays are re-linked when copied/unpickled.
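
The re-linking idea can be sketched without NumPy. Below, a `memoryview` slice plays the role of the array view and `array.array` plays the role of the underlying storage; `LinkedState`, `raw`, `n_active`, and `_link` are hypothetical names, not starsim's. The saved state omits the view, and `__setstate__` rebuilds it against the restored storage.

```python
import array
import copy

class LinkedState:
    def __init__(self, raw, n_active):
        self.raw = raw            # underlying storage
        self.n_active = n_active  # number of entries the view exposes
        self._link()

    def _link(self):
        # The view shares memory with self.raw, so writes go to the storage.
        self.values = memoryview(self.raw)[:self.n_active]

    def __getstate__(self):
        # Leave the view out; a copied view would no longer point at raw.
        state = self.__dict__.copy()
        del state['values']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._link()  # re-link the view to the restored storage

s = LinkedState(array.array('d', [0.0, 0.0, 0.0, 0.0]), n_active=3)
c = copy.deepcopy(s)
c.values[0] = 1.0  # writes through to c.raw, leaving s.raw untouched
```

The same pattern applies to pickling, since `pickle` calls the same `__getstate__`/`__setstate__` hooks.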

image

Contributor

@RomeshA RomeshA left a comment

Seems to all work now with the latest set of changes!

@cliffckerr cliffckerr merged commit 02c20ab into main Feb 27, 2024
2 checks passed
@cliffckerr cliffckerr deleted the fix_distribution_pickle branch February 27, 2024 03:40
Successfully merging this pull request may close these issues.

ScipyDistribution can't be pickled
3 participants