Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autorelease objects which we own #201

Merged
merged 22 commits into from
Jan 21, 2021
Merged

Conversation

samschott
Copy link
Member

@samschott samschott commented Jan 10, 2021

This PR aims to fix #200 and #48 by autoreleasing any obj-c instances which we own (created with a method which starts with "alloc", "copy", "mutableCopy", or "new").

The implementation goes as follows:

  1. When an object is returned from a objc method call which qualifies, the object is marked with _needs_release = True.
  2. Just before the Python instance is deleted, autorelease is called on the objc instance.

I've opted to use autorelease instead of release because the latter would result in occasional segfaults when the objc-instance is still needed while the Python instance has been destroyed. This is the case for some attributes of the cell views in a toga.Table.

Similarly, autorelease is not called directly when the instance is created because this would result in some Python variables pointing to a released objc instance.

@dgelessus, is this in line with what you envisioned? The workaround with the _needs_release is a bit of a hack but it seems necessary to me because ObjCInstance itself does not know which method created it. Any better ideas are welcome!

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@samschott
Copy link
Member Author

samschott commented Jan 10, 2021

Oh, and it looks like the test suite is hitting a segfault somewhere, but I am note sure where. The toga examples all seemed to run just fine...

Edit: I cannot reproduce the segfault locally, all tests pass on my machine with macOS 11.1 and Python 3.9. This is however quite different from macOS 10.15 and Python 3.5 used for the smoke tests. Any idea of what might be going wrong? Otherwise I can set up a VM with macOS 10.15.

@samschott
Copy link
Member Author

samschott commented Jan 11, 2021

A further update on the segfault: It always occurs in test_core but the actual method which triggers it is not reproducible. The failure is also specific to Python 3.5 and does not occur on Python 3.7 3.6 to 3.9 (all tested on macOS 11.1).

Copy link
Collaborator

@dgelessus dgelessus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! The implementation looks logical and straightforward to me.

The workaround with the _needs_release is a bit of a hack but it seems necessary to me because ObjCInstance itself does not know which method created it.

I wouldn't worry about that - I think an extra marker like this is unavoidable here, because as you say there's otherwise no way to tell if an object came from an alloc/etc. call and still has an extra reference that needs to be freed.

I've opted to use autorelease instead of release because the latter would result in occasional segfaults when the objc-instance is still needed while the Python instance has been destroyed. This is the case for some attributes of the cell views in a toga.Table.

Hm, I'm not quite sure about this one. autorelease is definitely the safer option, but in most cases a normal release should work fine too. autorelease is (as far as I know) only necessary when writing a factory method that returns a newly allocated object, but without an extra reference like alloc/etc. For this to work, the returned object needs to be autoreleased and not released, because otherwise it would already be deallocated by the time it's returned from the method.

An advantage of using release is that it makes reference management problems easier to detect and debug. If you release an object too early, it will be deallocated right away, and any code that still uses it will fail relatively quickly. However, if you autorelease too early, the object remains usable at first, and you'll notice only after the end of the current autorelease pool if the object is still incorrectly referenced anywhere.

Rubicon needs to properly support autorelease of course, but I think it would be better if by default __del__ uses release, with the option to make it use autorelease instead for certain objects. Not sure how this would be best implemented API-wise.

it looks like the test suite is hitting a segfault somewhere

A further update on the segfault: It always occurs in test_core but the actual method which triggers it is not reproducible. The failure is also specific to Python 3.5 and does not occur on Python 3.7 to 3.9 (all tested on macOS 11.1).

Interesting, especially that it only happens on Python 3.5. I don't have any solid ideas here and haven't had the time to properly test this myself, so I'm mostly just guessing what could go wrong.

For the record, it can't have anything to do with processor architecture differences, because the crash also happens on GH Actions under macOS 10.15, and that macOS version only supports x86_64 (too new for i386 and too old for arm64).

A weird thing I ran across is how __del__ interacts with weak references: when CPython deallocates a Python object with a __del__ method, any weak references pointing to that object are only cleared after the __del__ call has finished. This could cause issues when multiple threads use rubicon-objc at the same time. If ObjCInstance.__del__ is called on one thread (which then calls release/autorelease) and another thread at the same time looks up that same object by its pointer (which retrieves the existing instance from ObjCInstance._cached_objects), the second thread would receive an ObjCInstance which was already released (or is currently being released or has been marked for autorelease) on the Objective-C side.

Another possibility is that there's a reference management bug somewhere in the rubicon-objc code that we simply didn't notice before, because newly alloced objects never had their original reference released. I don't have any concrete ideas where that might happen though - probably somewhere in the unit test code itself, because I'm not sure if the main rubicon.objc code ever allocs anything.

This is one case where using release instead of autorelease in __del__ could help with debugging 🙂

I also noticed that apparently these segfaults don't trigger Python's faulthandler, which we explicitly enable in our unit tests. I think this never happened to me before - faulthandler is usually quite reliable and helpful with crashes in native code. Maybe it would help to move the faulthandler.enable() call further up in the init.py, so that it gets enabled before rubicon.objc is imported?

Also, if you get a problem report popup from macOS when you reproduce the crash locally, can you copy the crash report and post it here (or as an attachment or Gist, if it's long)? Even though the native macOS crash report usually isn't as helpful as the Python faulthandler stack trace, it's better than having no information about the crash at all.

rubicon/objc/api.py Outdated Show resolved Hide resolved
rubicon/objc/api.py Outdated Show resolved Hide resolved
SamSchott and others added 2 commits January 12, 2021 11:35
Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
@samschott
Copy link
Member Author

samschott commented Jan 12, 2021

Here is the macOS crash report: rubicon-objc-py35-crash.txt I'll try switching to release and enabling the fault handler earlier and report back if this gives any more information.

Regarding the use of autorelease instead of release: I was trying to avoid a very particular error which may indeed be due to bad memory management on toga's side. Admittedly, this is code which I have written myself here. It creates a NSTextField instance and assigns it to a property of NSTableCellView. The text field never actually gets assigned to a Python variable. This therefore triggers __del__ and release immediately after creating it and later results in errors similar to:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/rubicon/objc/api.py", line 270, in __call__
    result = self.py_method(py_self, *args)
  File "/usr/local/lib/python3.9/site-packages/toga_cocoa/widgets/internal/cells.py", line 39, in setup
    self.textField.cell.lineBreakMode = NSLineBreakMode.byTruncatingTail
AttributeError: 'NoneType' object has no attribute 'cell'

The solution to this would be to always assign objc instances to Python variables and make sure they do not get garbage collected. This could be a bit tedious.

@samschott
Copy link
Member Author

Alas, neither enabling the faulthander earlier, switching to release, or using send_message helped with identifying the cause of the segfault. The behaviour remains the same. I'll have to look at the actual test code more closely to see what might be going wrong. Hopefully some time this week.

@dgelessus
Copy link
Collaborator

Here is the macOS crash report

Welp, the stack trace in that crash report is not very helpful - I'm guessing the stack got destroyed before/during the crash.

Regarding the use of autorelease instead of release: I was trying to avoid a very particular error which may indeed be due to bad memory management on toga's side. Admittedly, this is code which I have written myself here. It creates a NSTextField instance and assigns it to a property of NSTableCellView. The text field never actually gets assigned to a Python variable. This therefore triggers __del__ and release immediately after creating it

The problem here is that NSTableCellView's imageView and textField properties are declared as assign, which basically means that the property getters/setters don't do any reference management. This is actually quite unusual for object properties in (relatively) modern Objective-C code. Usually you would declare such properties as weak instead, to make it explicit that the object in the property is not retained, and to ensure safe behavior when the object gets deallocated while the property still points to it. From the user's point of view, assign and weak behave basically the same though - the important part is that such properties don't retain the objects they store, so the code that sets the properties needs to ensure that the objects are retained some other way.

I searched a bit, and according to this SO question, this problem with NSTableCellView happens even in pure Objective-C if you don't use Interface Builder (which seems to be the recommended way to set these properties) and aren't careful with your reference management. One comment there suggests:

The key to make it work is adding text-view and image view as subviews of the NSTableCellView.

Toga does this already, but only after setting the properties, meaning that the objects are already deallocated by that point. This can probably be fixed as you say by storing the views in local variables, so they are kept retained during the entire method. Though to be safe, I would also add them as subviews as quickly as possible, to ensure that they are also retained that way.

@samschott
Copy link
Member Author

@dgelessus, thanks for the analyses of the issue with NSTableCellView. I should have read Apple's developer docs more carefully! In this case, there really is nothing stopping us from using release instead of autorelease.

@samschott
Copy link
Member Author

samschott commented Jan 15, 2021

I think I have found the problem with the failing tests: the tests which would trigger a segfault are test_objcclass_requires_class, test_objcprotocol_requires_protocol and test_objcmetaclass_requires_metaclass. All three explicitly release an object, sometimes resulting in a segfault when that object is garbage collected.

Don't ask me why this only is an issue in Python 3.5, or why some of the methods still passed sometimes. Edit: Maybe it has something to do with the timing of garbage collection and if pytest would still register a test as passed before the crash.

@samschott samschott marked this pull request as ready for review January 15, 2021 00:46
@samschott
Copy link
Member Author

Hm, it looks like the macOS compatibility test failed to start.

@dgelessus
Copy link
Collaborator

I think I have found the problem with the failing tests: the tests which would trigger a segfault [...] explicitly release an object

Ah, that explains it, nice catch! Lucky that this caused noticeable segfaults on Python 3.5 at least so we didn't miss this completely. Not sure why it doesn't crash on later Python versions - probably as you say because of some change to when/how CPython deallocates objects and runs the garbage collector.

On that note though - I wonder if it would be good to implement some compatibility handling so that old code like this (that correctly releases extra references) doesn't break because of this change. Maybe ObjCInstance should implement a custom release method (only in Python, not in Objective-C with @objc_method) that calls the Objective-C release method, but also sets self._needs_release = False, so that __del__ won't release the object a second time. If I'm thinking correctly, this way the extra reference will be released correctly whether or not you explicitly call release.

Hm, it looks like the macOS compatibility test failed to start.

Strange, not sure what's going on there. I see two runs for "CI / macOS compatibility test (pull_request)", one that failed (where GH Actions just shows "This check failed") and one that ran normally and succeeded. I'm guessing it restarted the job automatically, but remembered the failed job and for some reason considers the entire CI failed as a result. I'm guessing it's just a temporary failure - I'll restart the CI jobs.

@samschott
Copy link
Member Author

Maybe ObjCInstance should implement a custom release method (only in Python, not in Objective-C with @objc_method) that calls the Objective-C release method, but also sets self._needs_release = False, so that del won't release the object a second time.

Yes, I think that's a good idea. It's also nice because it gives the user the option to explicitly release an object instead of implicitly by deleting all Python references (dangerous as this may be).

Another thing which is missing IMO is documentation. I'll try to familiarise myself with the rubicon-objc docs and see where a paragraph about memory management would fit it.

@samschott
Copy link
Member Author

Looking a the docs, the rubicon.objc.api reference would probably be the appropriate place for notes on memory management. However, this might be a bit buried. What do you think?

@freakboy3742
Copy link
Member

@samschott Nice work - I'll leave it to @dgelessus to give final approval, but this looks like a reasonable and easily explained solution to a non-trivial problem.

As for docs: the API docs are the right place to literally document that the release API exists; however, I'd be inclined to suggest that what we need is a new HowTo on memory management. It's a sufficiently important topic that it warrants a good explanation - especially given that someone from a Python background may not have a good handle on memory management in general, let alone the Objective C 'retain and release' model. It doesn't need to be an extensive essay on the topic - something that lays out the basic rules and common patterns of API usage should be sufficient.

@dgelessus
Copy link
Collaborator

What @freakboy3742 said 🙂 At the moment we unfortunately don't have any proper docs about how to do memory management for Objective-C objects with Rubicon, so there's no obvious place in the docs where this new behavior could be documented. Agreed that we probably need a new short howto for Objective-C memory management, to explain when you do/don't have to call retain and release when using Rubicon.

re. the changes to the tests: I think it would be better to not explicitly do del random_obj and instead let the object be deleted normally at the end of the method. Almost all code that uses Rubicon will not explicitly del objects when it's done using them, so we should write the tests the same way.

Actually, the new custom release method should also have some dedicated tests, to ensure that code that explicitly calls release still works. Probably at least a test for releasing an object that came from .alloc().init() and one for retaining and then releasing an object that didn't come from .alloc().init().

@samschott
Copy link
Member Author

Agreed on all points.

When designing the tests, I was wondering how we should handle retain calls. Should we also provide a custom retain method which sets _needs_release = True so that it will be automatically released for the user later? Or should we assume that when a user manually retains an object, they'll also manually release it later? This would give them the option to retain an object beyond the lifecycle of the Python variable.

Copy link
Collaborator

@dgelessus dgelessus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new how-to and the tests look good - my comments in the docs are just about minor formatting things. On the new Python-side retain method I've left a longer comment, because it currently doesn't handle some cases correctly and I'm wondering how that should be handled best.

docs/how-to/memory-management.rst Outdated Show resolved Hide resolved
docs/how-to/memory-management.rst Outdated Show resolved Hide resolved
Comment on lines 666 to 667
self._needs_release = True
result = send_message(self, "retain", restype=objc_id, argtypes=[])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unconditionally calls retain on the Objective-C side every time retain is called from Python, but the boolean _needs_release attribute can only track one of those retain calls (or none at all, if the object came from alloc/etc. and so already has _needs_release set to True). This means that if you retain an object multiple times from Python, __del__ will not release all of those references, so you instead need to manually call release for every retain call.

This could be solved by replacing _needs_release with an integer reference counter, so that multiple retain calls can be remembered and __del__ can call release the correct number of times.

Though I'm wondering if it's necessary at all to have custom handling for when retain is called from Python. The custom handling in release and autorelease is needed so that code that calls these methods continues to work, even though we added an automatic release call in __del__ for Python code that never does Objective-C reference management. However, existing code that calls retain already works correctly with just the changes to release/autorelease, because this PR doesn't add any automatic calls to retain that would need to be suppressed if the user code already calls retain manually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, that's a fair point. Rather than manually tracking the reference count in Python I am also inclined to just drop the release method override.

Comment on lines +688 to +689
self._needs_release = False
result = send_message(self, "autorelease", restype=objc_id, argtypes=[])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, this is correct and necessary to avoid a second release if user code calls autorelease. Good catch - I forgot about that before.

docs/how-to/memory-management.rst Outdated Show resolved Hide resolved
Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
docs/how-to/memory-management.rst Outdated Show resolved Hide resolved
docs/how-to/memory-management.rst Outdated Show resolved Hide resolved
rubicon/objc/api.py Outdated Show resolved Hide resolved
rubicon/objc/api.py Outdated Show resolved Hide resolved
Sam Schott and others added 5 commits January 20, 2021 23:24
Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
@samschott
Copy link
Member Author

Thank you @dgelessus for very carefully reading the documentation!

Copy link
Collaborator

@dgelessus dgelessus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - and it seems that GitHub Actions has finally sorted itself out too.

Thank you again for the PR! The lack of proper reference handling (and documentation on the topic) has been an open issue for rubicon-objc for quite a while, which I never got around to fixing/documenting properly, so the contribution is much appreciated 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Releasing memory from deleted Objc classes
3 participants