Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix segfault #192

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

grungy-ado
Copy link

Hi ,
I encounter similar segfault behavior as mentioned in #84 and managed to debug and fix it, thanks to provided reproducing script.

This issue seems to be related with lua garbage collection of objects that was created and referenced (luaL_ref) in coroutine lua_State. Currently all lua objects are unreferenced from lua_state by python (gc) __dealloc__ method but sometimes unreferencing object created in coroutine ends with segfault because coroutine lua_state is broken. I'm not entirely sure why it is broken but it seems that coroutines associated with such state have already been finished. It is likely that lua gc have already collected such lua_state instance.

I manage to fix this by tracking references of created objects in coroutine lua_state and unreferencing them when associated coroutine (lua_state) ends.
Implementation note: Because of lua_state is also present in LuaRuntime object and I want make this fix as simple as possible, I decide to track all referenced object of LuaRuntime not just those that was referenced by coroutines.

fix #84

Can somebody please review this fix? Thanks

Unreferencing all objects created in coroutine is not good solution because such objects can be used outside of coroutine scope.
This fix refactor all reference tracking to tracking objects and when coroutine is done, set state of all objects that was created in coroutine to parent lua state. This way
objects created in coroutine can be used outside of coroutine scope and it does not cause segmentation fault.
@grungy-ado
Copy link
Author

I discovered that my intial fix contains bug that cause that object created in coroutine cannot be used outside of this coroutine scope because it unreference all objects created in coroutine at the time the coroutine finished.

Fix 422af19 introduce new solution, that do not track references but LuaObjects itself created in coroutine. When coroutine finished all objects created in this coroutine change their lua state to parent lua state of the coroutine. This way objects references in coroutine can be used outside of the coroutine scope and their deallocation does not cause segmentation fault.
This fix also does not track objects created in LuaRuntime state resulting in lower memory usage that previous solution.

@scoder
Copy link
Owner

scoder commented Sep 5, 2021

Thanks for investigating this. I'm not sure if the fix needs to be this complex and would be happy to see something simpler.

If I understand correctly, then the issue here is that _LuaObject instances get cleaned up that were used in the Lua coroutine, and that refer to the lua state of the coroutine, thus ending up with a dead dangling pointer after the coroutine itself gets cleaned up.

Questions:

  • can we detect if a lua coroutine has finished?
  • is there some kind of trigger that Lua could call when a coroutine finishes?
  • is the coroutine cleanup triggered from Python or from Lua? (If from Python, we could play with the reference counts to prevent it from being cleaned up before the Python objects that reference it.)
  • should the _LuaObject really refer to the lua state of the coroutine (from the start), or to the one of the runtime?
  • or maybe we need two lua state references for lua objects (if they are used in coroutines)?

This fix use runtime state for unreferencing LuaObject because using coroutine state of already closed coroutine cause segmentation fault.
@grungy-ado
Copy link
Author

I'm not sure if the fix needs to be this complex and would be happy to see something simpler.

I totally agree. I managed to simplify it in 17dd7ea by using LuaRuntime state for object unreferencing. I seems that LuaRuntime state and coroutine state share the same registry for object references and when using LuaRuntime state for unreferencing we can be sure that it is not dangling pointer of already finished coroutine state. Let me know what you think about this solution.

I notice that there are multiple occurrences where similar object unreferencing occurs too (LuaIter object), but I was not able to came with code sample that would cause segmentation fault. I'm wondering if I should applied this change also in there or not. Do you have any idea about that?

@scoder
Copy link
Owner

scoder commented Oct 24, 2021

Not sure. In fact, I'm not even sure I understand why there should be a difference between self._state and self._runtime._state. I don't know exactly how the coroutine state works. Should self._state refer to the runtime state, maybe?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SEGFAULT calling lua
2 participants