-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Implement LoadContext to handle multiple instances #209
ENH Implement LoadContext to handle multiple instances #209
Conversation
I think this looks pretty good, like the simplest way to solve the problem.
This would mean that we need to add something like this everywhere, right? def get_instance_foo(state, load_state):
saved_id = state.get("__id__")
if saved_id and saved_id in load_state.memo:
# an instance has already been loaded, just return the loaded instance
return load_state.get_instance(saved_id)
...
if saved_id:
load_state.memoize(loaded_obj, saved_id)
return loaded_obj So is it something we want to move into a decorator to avoid the boilerplate?
Let's leave it like this for now and bother about the problem later.
I only took a glance but I hope we can manage to solve it without such a solution. |
Thankfully no, because of the way I structured it, the memo check is at the top level We could switch to a decorator pattern, but I felt it would be easier to just pass
I hope there's a nice solution I've missed too, but it feels like once you start getting into edge cases with nested dependencies and such, you do need some kind of DAG implementation... |
Ah yes, nice, I missed that.
Okay, this might be something we would want to move out of the
Hmm, not sure, let's see. Maybe we can learn how pickle implements it and copy that approach. |
I've added in a decorator that can be used to wrap get_state functions that we want to be able to persist as single instances. I initially tried having it in the high level |
Interesting, do you still have the code for this? It's not clear to me why moving to a decorator solves this. Assuming that code looks similar to what you have now: result = func(obj, save_state)
result["__id__"] = id(obj) did you try if this works instead? __id__ = save_state.memoize(obj)
result = func(obj, save_state)
result["__id__"] = __id__ |
What worked was less just moving to a decorator, more not persisting certain types with an id. When we start to persist
Some tests start failing in a flaky way (sometimes they would work fine on reruns). I jumped into the debugger and saw that objects that shouldn't have the same id did, and it seemed to be objects that were being created during My intuition was that CPython might be garbage collecting these, then reusing the memory address, giving the same id. I'll play with what you suggested tonight, and might try using a more unique ID for objects as well :) |
I think this is the most likely explanation, which is why I suggested to memoize the object first. This should prevent gc and make the |
I suppose either method would work, but memoizing every object would mean basically never garbage collecting,
That should work, although it would also mean holding a lot more things in memory at all times, which might give a minor performance hit. If we don't need to actually hold the object, just get a unique |
The memory is cleared once dumping is over, so gc should work after that. But it's true that during the dumping process, we could potentially see a memory spike, though I assume that most objects are held in memory anyway. If I'm not missing something, It should only really matter for attributes that are generated on the fly and require a lot of memory. |
No that's true, I'm overthinking it and these on-the-fly objects being held in memory should be a negligable memory hit. I'll rework it this eve and check if it helps solve the flaky tests |
…skops into FIX-multiple-instance-persistance
Memoizing seems to have solved the problem without needing any UUID rework, good call @BenjaminBossan! I've added a test that checks it works for a few different object types, let me know if there's anything else you think would be worth testing! |
RE: CodeCov, as far as I can tell from the report, this PR is at 100% coverage, and just failing on an "indirect coverage change", so I think the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good to me, I just have some minor comments/questions.
Regarding the naming, it's a bit unfortunate that we now have state
and LoadState
. When adding SaveState
, I didn't think about this issue. Maybe we can find a better name (for both for consistency)?
On the names, how do you feel about changing them to |
Yes, I prefer that. Pinging @adrinjalali wdyt? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, fantastic work.
@adrinjalali would you please also give this a look?
aeb5b12
to
df3848a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the minor comments, looks brilliant.
…skops into FIX-multiple-instance-persistance
Thanks for the PR reviews guys ❤️ I think it should be ready to merge in! |
Fixes #206.
PR to add ability to persist a single instance that is referenced in multiple places.
Notes:
Currently, this does save the state of the object multiple times, but only initialises once. This is done due to the issue discussed in Add Method type in dispatch calls #195 around not knowing which instance will be loaded first, particularly in complex cases with interwoven dependencies.
At the moment, I've only addedThis is now global, done at the high level__id__
in the places needed to solve the xfail test, but if others are happy with how this looks I will expand this to otherget_state_*
methods :)get_state
andget_instance
functions so works for all types.Regarding 1: I looked into any ways to store DAGs in JSON and bumped into a relatively new addition to JSON syntax, JSON-LD, which also happens to have a Python implementation. I feel like if we want to only save these objects once, we would need some kind of DAG implementation to hold dependencies. This, however, would need a major rework and feels like it might be overkill for the time being, but it's worth holding in mind.