Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve correspondence between ObjectiveC objects and Python wrappers #256

Open
freakboy3742 opened this issue Jan 23, 2023 · 20 comments · May be fixed by #543
Open

Improve correspondence between ObjectiveC objects and Python wrappers #256

freakboy3742 opened this issue Jan 23, 2023 · 20 comments · May be fixed by #543
Labels
enhancement New features, or improvements to existing features.

Comments

@freakboy3742
Copy link
Member

What is the problem or limitation you are having?

In the process of building the Toga GUI testbed, we discovered #249. A fix for this problem was included in #246; however, that fix was really a safety catch for a specific set of circumstances. The bigger question: how did that set of circumstances happen in the first place?

At present, it is possible for an object to be disposed of by Objective C, but due to delays in propagating garbage collection, the ObjCInstance wrapping that object may not be immediately disposed of. This means the weak reference dictionary can contain stale objects.

#246 protected against this by protecting against the specific case where a memory address was reused - but if a Python object points at memory that has been disposed of and not reused, you'll generate a segfault. This is obviously undesirable.

Describe the solution you'd like

Functionally - it should be impossible to generate a segfault from Python code.

An idea posited by @mhsmith in the discussion about #249 was that we could maintain stronger reference to the ObjC object on the Python side. This would mean we add add a retain calls whenever a Python-side wrapper is created; release-ing only when Python has finished with the object. The idea here would be that it's better to have larger memory footprint (or even a small leak), than to have a segfault because a Python object has a stale memory reference.

Describe alternatives you've considered

Do nothing. The status quo is stable(ish?), and while it, may be theoretically possible to get a stale ObjCInstance object, in practice we're not seeing that problem (after #246 landed, anyway).

Additional context

Other than the general hesitation of fiddling with a complex area of code that appears to work at present - the complication to watch out for here is cyclic references between Python objects and Objective C objects that prevents Objective C from ever releasing memory. It's easy to avoid segfaults by never releasing... but eventually the memory will run out.

@freakboy3742 freakboy3742 added the enhancement New features, or improvements to existing features. label Jan 23, 2023
@mhsmith
Copy link
Member

mhsmith commented Oct 19, 2023

A similar segfault occurred due to some missing retain calls which were added by this commit.

@samschott
Copy link
Member

samschott commented Apr 1, 2024

I was revisiting this again due to beeware/toga#2468, and wanted to jot down some options before I forget them again.

Background

The changes from #201 were introduced as convenience to rubicon-objc users. We cannot use Automatic Reference Counting from Python, which means that objects created by alloc, new, copy or mutableCopy calls are never automatically released when no longer needed. Ensuring that memory is freed properly, we would need to explicitly release items when garbage collected, i.e., strictly adhere to https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmRules.html. In practice, this was never done in the toga codebase, resulting in widespread memory leaks.

The affordance introduced in #201 was to always release objects on __del__ calls if we took ownership of them earlier, by either one of the described methods or by an explicit retain call. This solution still requires the user to:

  1. Ensure that there are no reference cycles crossing this Objc-C to Python boundary that cannot be detected by garbage collection.
  2. Manually retain and release objects which we need that are returned by any method other than the ones listed above. This was also not always done, leading to segfaults and other issues.

Possible options

  1. As described in Attribute intermittently returns the wrong object #249 (comment), always retain objects referenced by Python and release them on garbage collection. For objects returned by alloc, etc., we would then have a refcount of 2 and would need two release calls or more explicit releases by users of rubicon-objc. There may also be other corner cases to consider.
  2. Completely give up on magical conveniences and require strict adherence to memory management rules in toga. This goes back to the state before Autorelease objects which we own #201. The required ref-count aware programming does not come naturally to Python programmers so would likely lead to a re-emergence of memory leaks.
  3. Keep the status quo but systematically vet the toga-cocoa codebase to add retain calls where required.

@freakboy3742
Copy link
Member Author

FWIW - I think the option described by (1) (i.e., the #249) is worth exploring. (3) is working well enough, but is still prone to leaks (and occasional crashes); as you've noted (2) requires a set of skills that most Python programmers (and, to be brutally honest, most programmers period) either don't have, or can't exercise with 100% reliability. (1) has the benefit of being easy to explain conceptually, and should (?) be relatively straightforward to implement.

I have 2 core sources of concern around (1) as an approach:

  1. The lifecycle of objects where ownership is transferred from something in the ObjC runtime to the Python runtime (or vice versa). This is really the broader class of the alloc() case you've described.
  2. Cross-engine reference cycles. If the Python side retains a reference because it thinks the ObjC side is still using the object, and the ObjC side retains a reference because it thinks the Python side is retaining a reference, then we're going to end up with leaks. This isn't really any different to reference cycles in Python itself; we just need to be aware of how to detect and resolve them.

@samschott
Copy link
Member

Agreed with your broad assessment that option (2) is very hard to do without enforcing particular design patterns and potential tooling support. There is a reason why Apple strongly recommends ARC.

Regarding your concerns around #249:

  1. This does need some fleshing out. My initial idea is to just add an additional release call whenever such objects are deleted on the Python side, but this relies on always knowing when we are handed ownership. If this goes beyond a list of hard-coded method names / prefixes, this might not possible.
  2. Cross-engine reference cycles: Making Pythons Garbage Collection aware of ObjC objects will be a tall order, if feasible at all. I would tend toward leaving this as open problem, and put the burden on the Rubicon users to avoid such cycles by using weak properties.

@mhsmith
Copy link
Member

mhsmith commented Apr 2, 2024

My initial idea is to just add an additional release call whenever such objects are deleted on the Python side

Alternatively, how about this?

  • Call retain when the Python wrapper is created, unless it was created via one of the special method names.
  • Call release exactly once when the Python wrapper is destroyed in __del__.

That covers all the use cases I can think of, except for the pattern from beeware/toga#2468, where a failed init releases the object and returns NULL. Could we make Rubicon aware of this by supporting a more Pythonic syntax as described in #26: NSImage(contentsOfFile=path)? This would call alloc, then initWithContentsOfFile , and if the latter returns NULL, then NSImage(...) would call retain to cancel out the release in __del__, and then throw an exception.

Then, as long as we can assume that the plain init method always takes zero arguments, that could be used by writing NSImage().

This may clash with the way the Python constructor is currently being used to wrap an existing Objective-C object, but it should be possible to distinguish that mode by checking argument types, or by passing a keyword argument like _rubicon_something.

Cross-engine reference cycles: Making Pythons Garbage Collection aware of ObjC objects will be a tall order, if feasible at all. I would tend toward leaving this as open problem, and put the burden on the Rubicon users to avoid such cycles by using weak properties.

Yes, I think that's the only reasonable option.

@mhsmith
Copy link
Member

mhsmith commented Apr 3, 2024

There are also some type methods that don't follow the naming convention, like NSURL.URLWithString. As I understand it, the Rubicon user must manually release the object returned by such methods. Under this proposal, they would still need to do that, because the object would be returned by the method with a reference count of 1, and Rubicon would then increment that to 2.

How does Objective-C's automatic reference counting handle this? Does it depend on metadata in header files, and is that information available at runtime? Unless we can fix this completely, I think the documentation is a bit overoptimistic to say "You won’t have to manage reference counts in Python, Rubicon Objective-C will do that work for you."

@freakboy3742
Copy link
Member Author

freakboy3742 commented Apr 4, 2024

There are also some type methods that don't follow the naming convention, like NSURL.URLWithString. As I understand it, the Rubicon user must manually release the object returned by such methods. Under this proposal, they would still need to do that, because the object would be returned by the method with a reference count of 1, and Rubicon would then increment that to 2.

The baseline memory management rules described the conditions for API memory retention; as I understand it, NSURL.URLWithString falls under the "owning an object you create" clause - i.e., the "alloc, new, copy..." list is indicative, but isn't exhaustive; class-level "create and init" methods also fall under this umbrella.

As far as I can make out, we are currently leaking in the NSURL.URLWithString-style case unless you call .release(), .autorelease(), or you manually tweak _needs_release=True on an instance created with this method. A process that runs:

while True:
    gc.collect()
    NSURL.URLWithString("https://example.com")

seems to slowly leak memory (as measured by RSS allocated size); whereas:

while True:
    gc.collect()
    NSURL.alloc().initWithString("https://example.com")

seems much more stable over time. There's still a lot of growth in process size, but this seems to be recovered over time, whereas there's no (or, at least, much less) observable memory recovery in the former version.

In both cases, the number of cached objects (NSURL._cached_objects), and the number of objects held by the garbage collector is constant, which suggests the memory is being lost on the ObjC side.

So... it would appear that we definitely have work to do here.

The approach you've described of adding a retain() to every ObjCInstance doesn't quite work, because there's an implied retain already on any object you create - so, any alloc, new, etc method will either require an additional release, or avoid invoking retain on every instance. Of course, that will still fall into the NSURL.URLWithString hole, as there's no way to determine from the runtime whether a reference you've been given is owned.

The other issue I can see is with uses that currently require the use of autorelease. Idiomatically, autorelease is used when you have a method that acts as a factory for new objects, but doesn't retain ownership of that object; the receiving object is expected to then take ownership. We use this pattern in APIs like the macOS implementation of Table and Tree to populate tree nodes, and in DetailedList for popup menus. If create an ObjC object on the Python side, but release as part of the ObjCInstance wrapper, then the object will cease to have an ObjCInstance when it is returned, which should cause it to be released (or, at least, put it on a candidate list for release). I'm not sure if the best option here would be to autorelease rather than release on __del__; or if we need to retain an autorelease mechanism in the API for this case.

@mhsmith
Copy link
Member

mhsmith commented Apr 4, 2024

there's an implied retain already on any object you create - so, any alloc, new, etc method will either require an additional release, or avoid invoking retain on every instance.

Yes, that's why I said "Call retain when the Python wrapper is created, unless it was created via one of the special method names."

Idiomatically, autorelease is used when you have a method that acts as a factory for new objects, but doesn't retain ownership of that object; the receiving object is expected to then take ownership. We use this pattern in APIs like the macOS implementation of Table and Tree to populate tree nodes, and in DetailedList for popup menus.

The current implementation of those methods calls retain and autorelease before returning the object. That would continue to work under this proposal:

  • alloc would return an object with a reference count of 1, and ObjCInstance would leave it unchanged because of the special method name.
  • retain increases it to 2
  • Destroying the ObjCInstance on leaving the function reduces it to 1.
  • The calling code calls retain, increasing it to 2
  • The main loop iteration completes the autorelease block, reducing it to 1

I'm not sure if the best option here would be to autorelease rather than release on __del__

Maybe we can be more specific: what if Rubicon automatically does a retain/autorelease on any object which is returned to Objective-C? If we are transferring ownership, that will keep it alive long enough, and if we're not transferring ownership, it should be harmless.

Either way, any existing code that uses retain/release/autorelease correctly would continue to work, but many of those uses would now be redundant and could be removed when convenient.

@freakboy3742
Copy link
Member Author

there's an implied retain already on any object you create - so, any alloc, new, etc method will either require an additional release, or avoid invoking retain on every instance.

Yes, that's why I said "Call retain when the Python wrapper is created, unless it was created via one of the special method names."

🤦 My apolgies - you did.

Idiomatically, autorelease is used when you have a method that acts as a factory for new objects, but doesn't retain ownership of that object; the receiving object is expected to then take ownership. We use this pattern in APIs like the macOS implementation of Table and Tree to populate tree nodes, and in DetailedList for popup menus.

The current implementation of those methods calls retain and autorelease before returning the object. That would continue to work under this proposal:

  • alloc would return an object with a reference count of 1, and ObjCInstance would leave it unchanged because of the special method name.
  • retain increases it to 2
  • Destroying the ObjCInstance on leaving the function reduces it to 1.
  • The calling code calls retain, increasing it to 2
  • The main loop iteration completes the autorelease block, reducing it to 1

Looking closer at the Tree/Table/DetailedList code, I think the retain() calls aren't actually needed in those cases - they just need to be autorelease(). If you take the explicit retain() out, we end up with a final count of 0 - and a count of 0 when the ObjCInstance is garbage collected. This would be a closer match for the autorelease example given in the Cocoa memory management guide.

Making that situation work would require Rubicon to autorelease() the controlled object, rather than a release() when the ObjCInstance is garbage collected.

Alternatively, the ObjCInstance reference could be considered unrelated to the ObjC memory handling, and require full retain/release calls... but that seems like a bit of a headache.

I'm not sure if the best option here would be to autorelease rather than release on __del__

Maybe we can be more specific: what if Rubicon automatically does a retain/autorelease on any object which is returned to Objective-C? If we are transferring ownership, that will keep it alive long enough, and if we're not transferring ownership, it should be harmless.

I'm not certain that's true. If you have a method that is an accessor, rather than a factory, the memory handling requirements of return values are different.

Factory methods only need a retain() if the creation method doesn't have a magic name. However. if we always do a retain() on an accessor method, then every variable access will increase the retain count by 1, which will guarantee a leak.

An automatic .autorelease() on factory methods is what we actually want. An automatic autorelease on an accessor... that's the one I'm not clear on. I know it's safe to call autorelease on an object more than once - but I'm not sure if marking an arbitrary object as autorelease could have consequences elsewhere.

@mhsmith
Copy link
Member

mhsmith commented Apr 13, 2024

I think the retain() calls aren't actually needed in those cases - they just need to be autorelease()

I think they are needed with the current implementation: see beeware/toga#2482. OK, you're right; I didn't understand that calling autorelease disables the release in __del__.

if we always do a retain() on an accessor method, then every variable access will increase the retain count by 1, which will guarantee a leak

That's why I said retain/autorelease, not just retain.

But after thinking about all these examples, I agree it's simpler not to have any special-casing of a returned object, and just change the __del__ behavior from release to autorelease.

I'm not sure if marking an arbitrary object as autorelease could have consequences elsewhere.

The main effect is that any Objective-C object referenced from Python would have its lifetime extended until the end of the current main loop iteration. That should only cause a problem if something depends on the object's dealloc method being called earlier than that, which I imagine should be a lot less common than the problems we're trying to solve right now, and maybe would never be a problem at all.

As an escape hatch, we could still provide a way to do a more immediate release. For example, we could make ObjCInstance.release check the retainCount before calling the native method, and if it was 1, disable the autorelease in __del__ because the object has already been freed. But there's no point in implementing that until we need it.

@mhsmith
Copy link
Member

mhsmith commented Apr 13, 2024

as I understand it, NSURL.URLWithString falls under the "owning an object you create" clause - i.e., the "alloc, new, copy..." list is indicative, but isn't exhaustive; class-level "create and init" methods also fall under this umbrella.

A process that runs:

while True:
    gc.collect()
    NSURL.URLWithString("https://example.com")

seems to slowly leak memory

That doesn't prove whether URLWithString returns an autoreleased object or not, because it never gives a chance for the autorelease pool to be emptied.

And as far as I can see, the documentation of URLWithString looks just like that of constraintWithItem, which I believe you previously determined IS returning an autoreleased object.

@freakboy3742
Copy link
Member Author

That doesn't prove whether URLWithString returns an autoreleased object or not, because it never gives a chance for the autorelease pool to be emptied.

That's a good point... I guess I should also add a call to iterate one or more iterations of the Cocoa event loop (or possibly run the memory test as a callback in the event loop, to ensure that all the autorelease pool handling is a active).

And as far as I can see, the documentation of URLWithString looks just like that of constraintWithItem, which I believe you previously determined IS returning an autoreleased object.

No object is created autoreleased; but it can be created unowned - which is what happens with both constraints and the URLWithString constructor. The crash that was being addressed with the extra retain calls is really a different manifestation of the deeper issue we're trying to address in this ticket. The constraint object is being created without being owned; however, as part of cleaning up constraints, the Python code tries to use the object. Since there's no strong correspondence between the existence of the Python object and the underlying ObjC object, it's possible that the object that is referenced as a constraint has been disposed; and so we get a crash when Toga does a cleanup on constraints. Explicitly retaining and releasing the constraint means that the Python side retains a permanent reference to the constraint instance - which is what would ideally be implied by the existence of a Python-side ObjCInstance that hasn't been garbage collected.

If the code wasn't trying to explicitly remove constraints on the cleanup of the Container object, we should be able get away with just an autorelease on each constraint instance,

@samschott
Copy link
Member

Catching up

I've been trying to catch up on this discussion just now. Summarizing:

  1. The "alloc, new, copy..." list does seem to be exhaustive after all and objects returned by factory methods such as constraintWithItem do need to be explicitly retained.
  2. autorelease instead of release in __del__ would simplify the use case where we create an item and hand it over to ObjC for future ownership while deleting the Python reference. It would remove the explicit autorelease calls still required in Remove some over-enthusiastic memory retention. toga#2482.

@mhsmith's approach vs what we have

The approach described by #249 (comment) still makes a lot of sense to me and it builds on some concepts that we've already started to use in Rubicon:

An ObjCInstance on the Python side effectively "owns" the a single refcount in ObjC as long as the Python refcount is >= 1. When the Python instance is deleted, we release or autorelease this single refcount. This model is also why we set _needs_release = False if a user calls release or autorelease directly.

Currently, this lifecycle handling is only automatic if we take ownership through a "alloc", "new", etc, method. In other cases, Rubicon users will need to explicitly call retain but typically don't need to manually release. This makes the API a bit confusing.

Also summing up @mhsmith's proposal to for my own understanding: For every Rubicon method returning an ObjC object we check if we already have a corresponding ObjCInstance on the Python side, in some weakref registry (e.g., the existing ObjCInstance._cached_objects WeakValueDictionary).

  1. If yes, there is nothing to do because it's already retained.
  2. If no, we retain it and add it to the registry.

Our __del__ method would then always release or autorelease.

Open questions

I see the main complexity in ensuring that this lookup works properly since we cannot use Chaquopy's approach of JNI references and IsSameObject. Using memory addresses and class comparison is a decent workaround, but there may be some nasty race conditions when a memory address is reused by an object of the same type.

Storing a unique identifier on the ObjC side might also be an option, e.g., with objc_setAssociatedObject, but I'd need to think through this more carefully.

Besides that, there is also a question around whether we still want to allow manual retain, release and autorelease calls by Rubicon users after enabling all this magic, and if we want to provide users an opt-in / opt-out for memory managing their ObjC instances..

@mhsmith
Copy link
Member

mhsmith commented Apr 16, 2024

Thanks, this is a good summary.

Using memory addresses and class comparison is a decent workaround, but there may be some nasty race conditions when a memory address is reused by an object of the same type.

As long as the Python wrapper is removed from the WeakValueDictionary before its Objective-C reference is released, I think the address alone is a good enough key. I'm not sure if there's any guarantee of the relative order of an object's weakref callbacks and its __del__ method. But if __del__ uses autorelease rather than release, then we should be safe either way.

there is also a question around whether we still want to allow manual retain, release and autorelease calls by Rubicon users after enabling all this magic

I think we should keep them as an escape hatch in case we've missed something. It should be possible to implement this in such a way that any existing correct use of these methods would continue to be correct, even if redundant.

@freakboy3742
Copy link
Member Author

Thanks, this is a good summary.

Agreed - thanks.

To rephrase this it in terms high level requirements rather than implementation detail: It should be impossible to have a Python-side reference to an Objective C object that has been disposed. This should currently be true for any object created on the Python side by alloc et al; but it won't be true for any received reference, or instance created Python-side via non-alloc method.

Using memory addresses and class comparison is a decent workaround, but there may be some nasty race conditions when a memory address is reused by an object of the same type.

As long as the Python wrapper is removed from the WeakValueDictionary before its Objective-C reference is released, I think the address alone is a good enough key. I'm not sure if there's any guarantee of the relative order of an object's weakref callbacks and its __del__ method.

There's an edge case where a new ObjC instance of the same type is created and wrapped by Python before garbage collection has concluded on the original wrapper - this would cause the old Python wrapper to be re-used, so any Python-side is checks would return the same object. However, that's the same edge case that currently exists, and it's fairly niche; I'm reasonably comfortable handling this as a known edge case, and documenting the potential problem.

Using objc_setAssociatedObject is an interesting idea though - if we're storing a key that can be used to lookup the Python-side object (possibly literally the id of the Python object?), that avoids the ambiguity entirely, as an object that is freshly created on the ObjC side won't have an associated Python-side key.

But if __del__ uses autorelease rather than release, then we should be safe either way.

My concern here is whether there are any consequences to "always use autorelease". I think the only consequence is slightly deferred Obj C deallocation... but I'm not 100% sure on that.

A related question: does it make any difference if we call autorelease in the __del__, or if we invoke it when the Python wrapper is created?

there is also a question around whether we still want to allow manual retain, release and autorelease calls by Rubicon users after enabling all this magic

I think we should keep them as an escape hatch in case we've missed something. It should be possible to implement this in such a way that any existing correct use of these methods would continue to be correct, even if redundant.

Agreed. Longer term, we may want to consider making them no-ops, but at least in the short term, having extra manual (but balanced) retain/release calls shouldn't make things any less stable (although we should confirm that as part of testing of any patch).

@mhsmith
Copy link
Member

mhsmith commented Apr 17, 2024

There's an edge case where a new ObjC instance of the same type is created and wrapped by Python before garbage collection has concluded on the original wrapper - this would cause the old Python wrapper to be re-used

If the Python wrapper is removed from the WeakValueDictionary before its Objective-C reference is fully released, then it's impssible for any Objective-C object subsequently allocated at the same address to use the same wrapper. Calling autorelease in __del__ should be enough to ensure this.

Chaquopy doesn't have an equivalent to autorelease, but it achieves the same goal by wrapping the reference in a separate object with its own __del__ method. I've edited #249 (comment) to add more details of this.

My concern here is whether there are any consequences to "always use autorelease". I think the only consequence is slightly deferred Obj C deallocation... but I'm not 100% sure on that.

I guess the only way to answer this is to try it. The Toga test suite should be a pretty thorough check.

A related question: does it make any difference if we call autorelease in the __del__, or if we invoke it when the Python wrapper is created?

Not sure what you mean by this – wouldn't that cause the Python wrapper to become invalid if it was kept into the next main loop iteration?

@freakboy3742
Copy link
Member Author

There's an edge case where a new ObjC instance of the same type is created and wrapped by Python before garbage collection has concluded on the original wrapper - this would cause the old Python wrapper to be re-used

If the Python wrapper is removed from the WeakValueDictionary before its Objective-C reference is fully released, then it's impssible for any Objective-C object subsequently allocated at the same address to use the same wrapper. Calling autorelease in __del__ should be enough to ensure this.

🤦 Of course - too many cobweb the existing implementation in my mind.

A related question: does it make any difference if we call autorelease in the __del__, or if we invoke it when the Python wrapper is created?

Not sure what you mean by this – wouldn't that cause the Python wrapper to become invalid if it was kept into the next main loop iteration?

Ignore me - this is another cobweb in my brain (the same cobweb that led to the retain/autorelease pair we recently removed from Toga).

@samschott
Copy link
Member

Coming back this issue after looking at #539, it seems to me that we have two separate problems:

  • Allowing proper memory management from Python without risking segfaults (keeping on ObjC object alive until there are no Python references left).
  • Fixing wrong cache hits from Attribute intermittently returns the wrong object #249. The ObjC class name check addresses this partially, but still leaves the door open the the address being reused by a new object of the same type. This won't cause segfaults or attribute errors, but could lead to otherwise unexpected behavior on the Python side.

If we have the need to reliably identify the actual ObjC instance for which we cached a Python instance, could we potentially store some additional info, e.g., a UUID, with the ObjC instance? For example using objc_setAssociatedObject?

@mhsmith
Copy link
Member

mhsmith commented Nov 18, 2024

If we keep the ObjC object alive until there are no Python references left, then wrong cache hits will no longer be possible, because the address could never be reused as long as its corresponding Python object exists. All the other workarounds will then become unnecessary, including the class name check.

@samschott
Copy link
Member

Fair point! There might be some value of having a cache key that more directly corresponds to what we are caching, independent of guarantees about the lifecycle of the Python-wrapper or the corresponding ObjC object. But its definitely not required if such guarantees exist.

@samschott samschott linked a pull request Nov 19, 2024 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New features, or improvements to existing features.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants