Autorelease objects which we own #201
Oh, and it looks like the test suite is hitting a segfault somewhere, but I am not sure where. The toga examples all seemed to run just fine... Edit: I cannot reproduce the segfault locally; all tests pass on my machine with macOS 11.1 and Python 3.9. This is, however, quite different from the macOS 10.15 and Python 3.5 used for the smoke tests. Any idea of what might be going wrong? Otherwise I can set up a VM with macOS 10.15.
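A further update on the segfault: It always occurs in `test_core` but the actual method which triggers it is not reproducible. The failure is also specific to Python 3.5 and does not occur on Python 3.7 to 3.9 (all tested on macOS 11.1).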
Thank you for working on this! The implementation looks logical and straightforward to me.
The workaround with the `_needs_release` is a bit of a hack but it seems necessary to me because `ObjCInstance` itself does not know which method created it.
I wouldn't worry about that - I think an extra marker like this is unavoidable here, because as you say there's otherwise no way to tell if an object came from an `alloc`/etc. call and still has an extra reference that needs to be freed.
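As a rough illustration of the marker idea, here is a minimal sketch that uses a plain Python stand-in with a manual retain count instead of the real Objective-C runtime. `FakeObjCObject` and `Wrapper` are hypothetical names invented for this example; only the `_needs_release` attribute name comes from the PR.

```python
class FakeObjCObject:
    """Stand-in for an Objective-C object with a manual retain count."""

    def __init__(self):
        # alloc/new/copy hand the caller one owned reference.
        self.retain_count = 1

    def release(self):
        self.retain_count -= 1


class Wrapper:
    """Hypothetical Python-side wrapper, loosely modelled on ObjCInstance."""

    def __init__(self, target, needs_release=False):
        self.target = target
        # The wrapper cannot tell which method produced `target`,
        # so ownership has to be recorded explicitly in this marker.
        self._needs_release = needs_release

    def __del__(self):
        # Give back the owned reference when the Python object goes away.
        if self._needs_release:
            self.target.release()


owned = FakeObjCObject()
wrapper = Wrapper(owned, needs_release=True)
del wrapper  # CPython runs __del__ as soon as the last reference is dropped
print(owned.retain_count)
```

A wrapper created with `needs_release=False` leaves the retain count untouched, which corresponds to objects that were not obtained from an `alloc`/etc. call.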
I've opted to use `autorelease` instead of `release` because the latter would result in occasional segfaults when the objc-instance is still needed while the Python instance has been destroyed. This is the case for some attributes of the cell views in a `toga.Table`.
Hm, I'm not quite sure about this one. `autorelease` is definitely the safer option, but in most cases a normal `release` should work fine too. `autorelease` is (as far as I know) only necessary when writing a factory method that returns a newly allocated object, but without an extra reference like `alloc`/etc. For this to work, the returned object needs to be `autorelease`d and not `release`d, because otherwise it would already be deallocated by the time it's returned from the method.
An advantage of using `release` is that it makes reference management problems easier to detect and debug. If you `release` an object too early, it will be deallocated right away, and any code that still uses it will fail relatively quickly. However, if you `autorelease` too early, the object remains usable at first, and you'll notice only after the end of the current autorelease pool if the object is still incorrectly referenced anywhere.
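The difference in failure timing can be sketched with a small pure-Python simulation. Everything here (`FakePool`, `FakeObject`, `fails_on_use`) is hypothetical and only mimics the behaviour; in real Objective-C a use after deallocation is a crash or undefined behaviour, not a Python exception.

```python
class FakePool:
    """Stand-in for an autorelease pool: releases are deferred until drain()."""

    def __init__(self):
        self._pending = []

    def add(self, obj):
        self._pending.append(obj)

    def drain(self):
        pending, self._pending = self._pending, []
        for obj in pending:
            obj.release()


class FakeObject:
    """Stand-in for an Objective-C object with a manual retain count."""

    def __init__(self):
        self.retain_count = 1

    def release(self):
        self.retain_count -= 1

    def autorelease(self, pool):
        pool.add(self)
        return self

    def use(self):
        # Simulates the crash that a real use-after-dealloc would cause.
        if self.retain_count <= 0:
            raise RuntimeError("object used after deallocation")
        return "ok"


def fails_on_use(obj):
    try:
        obj.use()
    except RuntimeError:
        return True
    return False


pool = FakePool()

released = FakeObject()
released.release()                         # released too early
release_fails_now = fails_on_use(released)

autoreleased = FakeObject()
autoreleased.autorelease(pool)             # autoreleased too early
autorelease_fails_now = fails_on_use(autoreleased)
pool.drain()                               # end of the current pool
autorelease_fails_later = fails_on_use(autoreleased)
```

In this model the premature `release` is caught on the very next use, while the premature `autorelease` only surfaces after the pool has been drained, which is the debugging disadvantage described above.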
Rubicon needs to properly support `autorelease` of course, but I think it would be better if by default `__del__` uses `release`, with the option to make it use `autorelease` instead for certain objects. Not sure how this would be best implemented API-wise.
it looks like the test suite is hitting a segfault somewhere
A further update on the segfault: It always occurs in `test_core` but the actual method which triggers it is not reproducible. The failure is also specific to Python 3.5 and does not occur on Python 3.7 to 3.9 (all tested on macOS 11.1).
Interesting, especially that it only happens on Python 3.5. I don't have any solid ideas here and haven't had the time to properly test this myself, so I'm mostly just guessing what could go wrong.
For the record, it can't have anything to do with processor architecture differences, because the crash also happens on GH Actions under macOS 10.15, and that macOS version only supports x86_64 (too new for i386 and too old for arm64).
A weird thing I ran across is how `__del__` interacts with weak references: when CPython deallocates a Python object with a `__del__` method, any weak references pointing to that object are only cleared after the `__del__` call has finished. This could cause issues when multiple threads use rubicon-objc at the same time. If `ObjCInstance.__del__` is called on one thread (which then calls `release`/`autorelease`) and another thread at the same time looks up that same object by its pointer (which retrieves the existing instance from `ObjCInstance._cached_objects`), the second thread would receive an `ObjCInstance` which was already released (or is currently being released or has been marked for autorelease) on the Objective-C side.
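The weak-reference ordering described above can be probed with a small single-threaded experiment. This is only a sketch: `Cached`, `cache` and `observed` are hypothetical names, and the `WeakValueDictionary` merely stands in for a pointer-keyed instance cache such as `ObjCInstance._cached_objects`.

```python
import weakref

# Hypothetical pointer-keyed cache of live wrapper instances.
cache = weakref.WeakValueDictionary()
observed = []


class Cached:
    def __init__(self, key):
        self.key = key
        cache[key] = self

    def __del__(self):
        # Look ourselves up through the weak cache while __del__ is running.
        # If weak references are cleared only after __del__ returns (the
        # CPython behaviour described above), this is expected to record True.
        observed.append(cache.get(self.key) is self)


obj = Cached(0x1234)
del obj  # last strong reference dropped, so CPython runs __del__ right away

print(observed)         # what the lookup inside __del__ saw
print(0x1234 in cache)  # the entry is gone once deallocation has finished
```

If the lookup inside `__del__` still finds the instance, a second thread doing the same lookup at that moment could obtain a wrapper whose Objective-C object is already being released.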
Another possibility is that there's a reference management bug somewhere in the rubicon-objc code that we simply didn't notice before, because newly `alloc`ed objects never had their original reference released. I don't have any concrete ideas where that might happen though - probably somewhere in the unit test code itself, because I'm not sure if the main rubicon.objc code ever `alloc`s anything.
This is one case where using `release` instead of `autorelease` in `__del__` could help with debugging 🙂
I also noticed that apparently these segfaults don't trigger Python's `faulthandler`, which we explicitly enable in our unit tests. I think this never happened to me before - `faulthandler` is usually quite reliable and helpful with crashes in native code. Maybe it would help to move the `faulthandler.enable()` call further up in the `__init__.py`, so that it gets enabled before `rubicon.objc` is imported?
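A minimal sketch of that ordering is shown below. The layout of the tests' `__init__.py` is an assumption, and the `rubicon.objc` import is left as a comment so the snippet runs without the package installed.

```python
import faulthandler
import sys

# Enable the fault handler first, so that a crash triggered while importing
# or using native code can still dump the Python tracebacks of all threads.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# Only afterwards import modules that load native code, for example:
# import rubicon.objc  # assumed position in the tests' __init__.py

print(faulthandler.is_enabled())
```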
Also, if you get a problem report popup from macOS when you reproduce the crash locally, can you copy the crash report and post it here (or as an attachment or Gist, if it's long)? Even though the native macOS crash report usually isn't as helpful as the Python `faulthandler` stack trace, it's better than having no information about the crash at all.
Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
Here is the macOS crash report: rubicon-objc-py35-crash.txt I'll try switching to Regarding the use of
The solution to this would be to always assign objc instances to Python variables and make sure they do not get garbage collected. This could be a bit tedious.
Alas, neither enabling the faulthandler earlier, switching to |
Welp, the stack trace in that crash report is not very helpful - I'm guessing the stack got destroyed before/during the crash.
The problem here is that I searched a bit, and according to this SO question, this problem with
Toga does this already, but only after setting the properties, meaning that the objects are already deallocated by that point. This can probably be fixed as you say by storing the views in local variables, so they are kept retained during the entire method. Though to be safe, I would also add them as subviews as quickly as possible, to ensure that they are also retained that way.
@dgelessus, thanks for the analyses of the issue with |
I think I have found the problem with the failing tests: the tests which would trigger a segfault are Don't ask me why this only is an issue in Python 3.5, or why some of the methods still passed sometimes. Edit: Maybe it has something to do with the timing of garbage collection and if pytest would still register a test as passed before the crash. |
Hm, it looks like the macOS compatibility test failed to start.
Ah, that explains it, nice catch! Lucky that this caused noticeable segfaults on Python 3.5 at least so we didn't miss this completely. Not sure why it doesn't crash on later Python versions - probably as you say because of some change to when/how CPython deallocates objects and runs the garbage collector. On that note though - I wonder if it would be good to implement some compatibility handling so that old code like this (that correctly releases extra references) doesn't break because of this change. Maybe
Strange, not sure what's going on there. I see two runs for "CI / macOS compatibility test (pull_request)", one that failed (where GH Actions just shows "This check failed") and one that ran normally and succeeded. I'm guessing it restarted the job automatically, but remembered the failed job and for some reason considers the entire CI failed as a result. I'm guessing it's just a temporary failure - I'll restart the CI jobs.
Yes, I think that's a good idea. It's also nice because it gives the user the option to explicitly release an object instead of implicitly by deleting all Python references (dangerous as this may be). Another thing which is missing IMO is documentation. I'll try to familiarise myself with the rubicon-objc docs and see where a paragraph about memory management would fit in.
Looking at the docs, the |
@samschott Nice work - I'll leave it to @dgelessus to give final approval, but this looks like a reasonable and easily explained solution to a non-trivial problem. As for docs: the API docs are the right place to literally document that the |
What @freakboy3742 said 🙂 At the moment we unfortunately don't have any proper docs about how to do memory management for Objective-C objects with Rubicon, so there's no obvious place in the docs where this new behavior could be documented. Agreed that we probably need a new short howto for Objective-C memory management, to explain when you do/don't have to call re. the changes to the tests: I think it would be better to not explicitly do Actually, the new custom |
Agreed on all points. When designing the tests, I was wondering how we should handle |
The new how-to and the tests look good - my comments in the docs are just about minor formatting things. On the new Python-side `retain` method I've left a longer comment, because it currently doesn't handle some cases correctly and I'm wondering how that should be handled best.
rubicon/objc/api.py
self._needs_release = True
result = send_message(self, "retain", restype=objc_id, argtypes=[])
This unconditionally calls `retain` on the Objective-C side every time `retain` is called from Python, but the boolean `_needs_release` attribute can only track one of those `retain` calls (or none at all, if the object came from `alloc`/etc. and so already has `_needs_release` set to `True`). This means that if you retain an object multiple times from Python, `__del__` will not release all of those references, so you instead need to manually call `release` for every `retain` call.
This could be solved by replacing `_needs_release` with an integer reference counter, so that multiple `retain` calls can be remembered and `__del__` can call `release` the correct number of times.
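For illustration, the counter-based alternative could look roughly like the sketch below. It uses a pure-Python stand-in rather than the real runtime, and `FakeObjCObject`, `CountingWrapper` and `_pending_releases` are hypothetical names, not part of rubicon-objc.

```python
class FakeObjCObject:
    """Stand-in for an Objective-C object with a manual retain count."""

    def __init__(self):
        self.retain_count = 1

    def retain(self):
        self.retain_count += 1

    def release(self):
        self.retain_count -= 1


class CountingWrapper:
    """Hypothetical wrapper that counts how many references it owns."""

    def __init__(self, target, owned=False):
        self.target = target
        # One owned reference if the object came from alloc/etc., else none.
        self._pending_releases = 1 if owned else 0

    def retain(self):
        self.target.retain()
        self._pending_releases += 1
        return self

    def release(self):
        self.target.release()
        # An explicit release balances one of the references we track.
        if self._pending_releases > 0:
            self._pending_releases -= 1

    def __del__(self):
        # Release every reference that is still owed.
        for _ in range(self._pending_releases):
            self.target.release()


target = FakeObjCObject()
wrapper = CountingWrapper(target, owned=True)
wrapper.retain()
wrapper.retain()
del wrapper  # three owned references are released here
print(target.retain_count)
```

With a boolean marker only one of the three references in this usage would be released, which is the gap the comment above points out.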
Though I'm wondering if it's necessary at all to have custom handling for when `retain` is called from Python. The custom handling in `release` and `autorelease` is needed so that code that calls these methods continues to work, even though we added an automatic `release` call in `__del__` for Python code that never does Objective-C reference management. However, existing code that calls `retain` already works correctly with just the changes to `release`/`autorelease`, because this PR doesn't add any automatic calls to `retain` that would need to be suppressed if the user code already calls `retain` manually.
True, that's a fair point. Rather than manually tracking the reference count in Python I am also inclined to just drop the `retain` method override.
self._needs_release = False
result = send_message(self, "autorelease", restype=objc_id, argtypes=[])
On the other hand, this is correct and necessary to avoid a second `release` if user code calls `autorelease`. Good catch - I forgot about that before.
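To make the double-release concern concrete, here is a small simulation in which explicit `release` and `autorelease` calls clear the marker so that `__del__` does not release again. `FakeObjCObject`, `Wrapper` and the list-based pool are hypothetical stand-ins, not the actual rubicon-objc implementation.

```python
class FakeObjCObject:
    """Stand-in for an Objective-C object with a manual retain count."""

    def __init__(self):
        self.retain_count = 1

    def release(self):
        self.retain_count -= 1


class Wrapper:
    """Hypothetical wrapper whose __del__ releases an owned reference."""

    def __init__(self, target, needs_release=True):
        self.target = target
        self._needs_release = needs_release

    def release(self):
        # The user gave up the reference explicitly: do not release again.
        self._needs_release = False
        self.target.release()

    def autorelease(self, pool):
        # The pool now owns the pending release, so __del__ must skip it.
        self._needs_release = False
        pool.append(self.target)
        return self

    def __del__(self):
        if self._needs_release:
            self.target.release()


pool = []  # hypothetical autorelease pool, drained manually below

explicit = FakeObjCObject()
wrapper = Wrapper(explicit)
wrapper.release()
del wrapper
print(explicit.retain_count)  # 0; without clearing the marker it would be -1

pooled = FakeObjCObject()
wrapper = Wrapper(pooled)
wrapper.autorelease(pool)
del wrapper  # no release here, the pool still holds the object
for obj in pool:
    obj.release()
pool.clear()
print(pooled.retain_count)
```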
Co-authored-by: dgelessus <dgelessus@users.noreply.github.com>
This reverts commit 764f89f.
507e81d to f959e65
Thank you @dgelessus for very carefully reading the documentation!
LGTM - and it seems that GitHub Actions has finally sorted itself out too.
Thank you again for the PR! The lack of proper reference handling (and documentation on the topic) has been an open issue for rubicon-objc for quite a while, which I never got around to fixing/documenting properly, so the contribution is much appreciated 🙂
This PR aims to fix #200 and #48 by autoreleasing any obj-c instances which we own (created with a method which starts with "alloc", "copy", "mutableCopy", or "new").
The implementation goes as follows:
`_needs_release = True`.
I've opted to use `autorelease` instead of `release` because the latter would result in occasional segfaults when the objc-instance is still needed while the Python instance has been destroyed. This is the case for some attributes of the cell views in a `toga.Table`.
Similarly, `autorelease` is not called directly when the instance is created because this would result in some Python variables pointing to a released objc instance.
@dgelessus, is this in line with what you envisioned? The workaround with the `_needs_release` is a bit of a hack but it seems necessary to me because `ObjCInstance` itself does not know which method created it. Any better ideas are welcome!
PR Checklist:
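The ownership rule stated in the PR description (methods starting with "alloc", "copy", "mutableCopy", or "new" return an object we own) can be sketched as a simple prefix check. `OWNING_PREFIXES` and `returns_owned_reference` are hypothetical names for this example, and the plain `startswith` test is an assumption based on the wording above; it is looser than Objective-C's own method-family naming rule, so it would also match a selector such as `newsletter`.

```python
# Prefixes named in the PR description as returning an owned reference.
OWNING_PREFIXES = ("alloc", "copy", "mutableCopy", "new")


def returns_owned_reference(method_name):
    """Return True if the method name starts with one of the owning prefixes."""
    return method_name.startswith(OWNING_PREFIXES)


for name in ("alloc", "mutableCopy", "newWithValue:", "stringWithFormat:"):
    print(name, returns_owned_reference(name))
```

A wrapper could use a check like this at the point where a method's return value is wrapped, in order to decide whether to set `_needs_release`.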