Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix for deepcopy / pickling discrete spaces #2378

Merged
merged 4 commits into from
Oct 17, 2024

Conversation

quaquel
Copy link
Member

@quaquel quaquel commented Oct 16, 2024

As highlighted in #2373, deepcopy (and by extension) pickle breaks on discrete spaces. As a fix, in Cell.__getstate__, I set "connections" to an empty dict. In DiscreteSpace.__setstate__, I rebuild the connections via DiscreteSpace._connect_cells(). This latter method already existed in Grid and VoronoiGrid, but not yet in Network. Thus, I moved the method stub to DiscreteSpace and implemented it for Network.

This also adds unit tests for everything. I first confirmed that the unit tests failed and reproduce the original error.

@quaquel quaquel requested review from Corvince and EwoutH October 16, 2024 17:58
@quaquel quaquel added the bug Release notes label label Oct 16, 2024
Copy link

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 +0.2% [-0.2%, +0.7%] 🔵 -0.3% [-0.5%, -0.1%]
BoltzmannWealth large 🔵 -0.0% [-0.8%, +0.7%] 🔵 -0.2% [-1.0%, +0.6%]
Schelling small 🔵 -0.2% [-0.5%, +0.2%] 🔵 +0.2% [-0.1%, +0.5%]
Schelling large 🔵 +0.7% [+0.1%, +1.3%] 🔵 -0.3% [-1.0%, +0.5%]
WolfSheep small 🔵 +0.1% [-0.3%, +0.5%] 🔵 -0.3% [-0.6%, -0.1%]
WolfSheep large 🔵 +0.9% [-0.7%, +2.2%] 🔵 +0.5% [-1.5%, +2.3%]
BoidFlockers small 🔵 +0.1% [-0.5%, +0.7%] 🔵 +1.7% [+1.1%, +2.4%]
BoidFlockers large 🔵 +0.4% [-0.9%, +1.4%] 🔵 +2.1% [+1.3%, +2.9%]

@quaquel
Copy link
Member Author

quaquel commented Oct 16, 2024

tests fail on 3.10 because super.__getstate__() was only added in 3.11.

@EwoutH
Copy link
Member

EwoutH commented Oct 16, 2024

tests fail on 3.10 because super.__getstate__() was only added in 3.11.

Finally a case in which having all these different CI runs pay off!

Easiest way would be to add a warning that the cell space only works with 3.11 and above.

SPEC 0 says we can drop 3.10 now. I say we do that directly after 3.0 is branched off, and require Python 3.11 for Mesa 3.1 and above.

@quaquel
Copy link
Member Author

quaquel commented Oct 16, 2024

Easiest way would be to add a warning that the cell space only works with 3.11 and above.

Using super here was me being lazy. I was surprised it worked (But not for __setstate__ strangely enough). Its only a few lines of code more to have it work also for 3.10.

@Corvince
Copy link
Contributor

Just to clarify something here, since I havent worked with getstate or setstate yet. This code changes that when I pickle a cell, its connections become an empty dictionary. And when I unpickle a CellSpace the connections the connections are restored. Is this correct? Wouldn't this mean if I pickle a cell by itself and unpickle it I lose the connections? Not sure how I feel about this. Its probably still reasonable for the time being, but it could also lead to very unexpected behaviors.

@quaquel
Copy link
Member Author

quaquel commented Oct 16, 2024

Yes, that is a major drawback of this fix. Cells can now only exist in DiscreteSpaces, and this bugfix results in an implicit coupling between the two. It might be good to document this explicitly. However, I don't see another option for resolving this recursion error.

We can't store the connections with the refs to the other cells in the return from Cell.__getstate__ because it causes the recursion error. Since the cell does not know the space to which it belongs, it cannot query this space to rebuild the connections.

@EwoutH
Copy link
Member

EwoutH commented Oct 17, 2024

I first confirmed that the unit tests failed and reproduce the original error.

Awesome, everything has drawbacks, of this helps now we can revisit later. Experimental 🚀🚀.

@EwoutH EwoutH added the experimental Release notes label label Oct 17, 2024
@quaquel quaquel merged commit 3054bac into projectmesa:main Oct 17, 2024
12 of 13 checks passed
@Corvince
Copy link
Contributor

Yes, that is a major drawback of this fix. Cells can now only exist in DiscreteSpaces, and this bugfix results in an implicit coupling between the two. It might be good to document this explicitly. However, I don't see another option for resolving this recursion error.

I still don't fully understand why the recursion error occurs in the first place, and only for large grids. The number of connections per cell stays the same, no?
I think we need to find the root cause first before we can find a good solution.

But I don't think it's very important right now. While conceptually and practically it's awesome that cells are independent of the space, I would estimate in 99% of use cases they are used with a space. And the remaining use cases would still require a workflow that involves pickling. And working in large grids. This is pretty much the definition of a rare edge case. YAGNI work.

@quaquel
Copy link
Member Author

quaquel commented Oct 17, 2024

I also don't fully understand it. It has something to do with how deepcopy/pickle tracks the number of calls. I have found various discussions online on large datastructures with many identical objects giving rise to this problem.

In our case, on my machine with a max recursion depth of 3000, the error triggers for a 19 * 19 grid (361 cells) but not for an 18 * 18 grid (324 cells). 19 * 18 (so 342 cells) also fails. The critical limit seems to be 332 cells, which I found through testing with a (1 * $X$) grid. I first thought it might also depend on the grid type (because the connections between cells cause the recursion in the first place). However, there is no difference between von Neumann (4 connections) and Moore (8 connections) regarding this 332 cell limit. Likewise, whether the grid is toroidal or not does not seem to matter for this threshold.

At this point, I gave up trying to dig deeper because, as you say, people will unlikely need cells outside of discrete spaces.

@EwoutH
Copy link
Member

EwoutH commented Oct 17, 2024

However, there is no difference between von Neumann (4 connections) and Moore (8 connections) regarding this 332 cell limit.

332 * (1 + 8) = 2988, which with a little overhead seems remarkably close. Almost strange this isn’t the case.

@quaquel
Copy link
Member Author

quaquel commented Oct 17, 2024

332 * (1 + 8) = 2988, which with a little overhead seems remarkably close. Almost strange this isn’t the case.

Exactly. I am really surprised that moore vs. neuman or toroidal does not matter as far as I can tell.

@quaquel quaquel deleted the setstate branch October 25, 2024 06:04
Comment on lines +215 to +217
def __getstate__(self):
"""Return state of the Cell with connections set to empty."""
# fixme, once we shift to 3.11, replace this with super. __getstate__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quaquel We're now requiring Python 3.11+, so this can be updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Release notes label experimental Release notes label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants