Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement smart pointers and signals2 tracking #90

Open
wants to merge 4 commits into
base: noetic-devel
Choose a base branch
from

Conversation

yassiezar
Copy link
Contributor

@yassiezar yassiezar commented Mar 27, 2023

This is an experimental branch to solve some of the issues we've been facing, described in #57 and #86

Basically, I've added the ability to gain access to clients from CBs as std::weak_ptrs rather than raw pointers. Since they were saved as std::shared_ptrs already within the state machine, it was a pretty straight-forward change.

The main change is using boost::signals2 tracking facility to automatically monitor and control a slot's lifetime based on when the owning CB goes out of scope. Early tests show that this might prevents the issue we've been facing where member variables are being access from a CB callback after the callback has been deallocated upon state transitions.

Both these changes have been integrated in a small part within the sm_ferrari example's CB if you wanted to see the intended usage

This is very much WIP and a lot needs to be refined. Opening up this PR for visibility and comments

┆Issue is synchronized with this Jira Task by Unito

@pabloinigoblasco
Copy link
Collaborator

pabloinigoblasco commented Mar 27, 2023

Hello @yassiezar
Your point is well understood. indeed the changes you proposed might be needed to do the use case you are trying:

locked_client->onMessageReceived(&CbMySubscriberBehavior::onMessageReceived, shared_from_this());

But, is that conceptually right? I would say it isn't.

I would like to clarify the use case you mentioned that requires bringing the client behavior pointer shared_from_this(), which may be unsafe, outside of its lifecycle scope. I think that could mean an incorrect design of the use case (maybe I am wrong).
What is the use case that need to do that? why do you need to use a client behavior pointer from a client code?

It seems that the ideal approach (and the way SMACC is implemented) is just the opposite, is to have client behaviors that utilize client references and their public functions to interact with them. This approach ensures that the code is executed during a safe period where we can guarantee the validity of client behaviors. Using client references should not be in general a problem since their lifetime is the whole application.

Client behaviors signal callbacks are disconnected before they are disposed in memory so no error should happen.
If you have variables or information that can be changed outside the state or client behavior life-cycle, information must be located in the client or in the component. Then, hopefully there wouldn't be any reason why you need to pass the client behavior pointer outside its context.

@yassiezar
Copy link
Contributor Author

yassiezar commented Mar 28, 2023

Hey @pabloinigoblasco. I understand your point that it shouldn't be possible for callbacks to trigger while states transition, but my investigations make me believe otherwise (see my last comments on #86 where I post a couple of examples showing member variables seemingly being deallocated while a callback is active).

Using client references should not be in general a problem since their lifetime is the whole application.

This is true in principle. However, you have to remember that the pointer variables pointing to the clients can go out of scope and become invalid, even while the clients themselves are perfectly valid. This is the underlying problem.

Client behaviors signal callbacks are disconnected before they are disposed in memory so no error should happen.

Could you point me to the code where this happens? When I look at this, I see that client behaviours are deallocated and destroyed before the slots are disconnected from the signals. Does this not mean that the callbacks can still be triggered after a transition has been triggered? This seems to be the case here and would explain segfaults from referencing out-of-scope memory

The idea I'm exploring in this PR is to use boost::signals2's tracking capabilities (see here under the 'Automatic Connection Management' section) to syncronise the lifetimes of the callback (i.e. a slot attached for a SmaccSignal) and its owning object (client behaviour) and automatically manage their lifetimes. signals2 does this by attaching a slot to a shared_ptr reference to the owning object, hence the shared_from_this() call within the client behaviour. To reiterate, the client doesn't store any references to the client behaviour (that would be counter to SMACC' design!) - however, the SmaccSignal, i.e. a signals2::signal stores a reference to the client behaviour when it creates a slot, which it then uses to track the lifetime of the CB and deallocate the slot when the referenced client behaviour is destroyed during state transitions. This is an important point and one where I think there's some misunderstanding from your first point.

I think how tracking works is that the shared_ptr's reference count is incremented while the slot (i.e. the ROS callback) is active, which is what prevents the client behaviour from getting destroyed until after the callback has finished running. This is, in my opinion, a more robust approach to managing slot lifetimes the way SMACC is designed to. However, there is perhaps a discussion to be had on the potential implications of keeping a client behaviour in-memory until it has finished running. E.g. what will happen if you run a sleep function in the callback and transition out and back into the state - does this keep the old CB alive and create a new one? What effect does that have on the state machine?

@pabloinigoblasco
Copy link
Collaborator

Hey @pabloinigoblasco. I understand your point that it shouldn't be possible for callbacks to trigger while states transition, but my investigations make me believe otherwise (see my last comments on #86 where I post a couple of examples showing member variables seemingly being deallocated while a callback is active).

I didnt notice the updates of that thread. I have to read it thoroughly, I only had a quick read. My primary goal is to determine if the issue at hand is a bug or if you are utilizing SMACC in a manner that is prohibited by design.

Nonetheless. I read in that bug the following:

"Components and clients are long-lived objects and I can see that their memory is still valid. However, the CB's member variables, including the pointers to long-lived objects, seem to become invalid or change after/during transitions"

That is correct by design. During the transition the state variables and client behavior variables are invalid.
I insist with the same question: who is trying to use those variables? Who is trying to access to those variables? And why?

Using client references should not be in general a problem since their lifetime is the whole application.

This is true in principle. However, you have to remember that the pointer variables pointing to the clients can go out of scope and become invalid, even while the clients themselves are perfectly valid. This is the underlying problem.

When states and client behaviors are being disposed internal variables become invalid. Totally agree with that: "we are in the process of removing the object". What is that callback? is that using SmaccSignals properly?

Client behaviors signal callbacks are disconnected before they are disposed in memory so no error should happen.

Could you point me to the code where this happens? When I look at this, I see that client behaviours are deallocated and destroyed before the slots are disconnected from the signals. Does this not mean that the callbacks can still be triggered after a transition has been triggered? This seems to be the case here and would explain segfaults from referencing out-of-scope memory

Signals must be disconnected before deallocating client behaviors. If that is not like that in the code, there is a bug we have to fix. I will have a look to that now.

@yassiezar
Copy link
Contributor Author

"Components and clients are long-lived objects and I can see that their memory is still valid. However, the CB's member variables, including the pointers to long-lived objects, seem to become invalid or change after/during transitions"

That is correct by design. During the transition the state variables and client behavior variables are invalid.
I insist with the same question: who is trying to use those variables? Who is trying to access to those variables? And why?

That is the heart of the issue at hand: during transitions, client behaviour callbacks are still being run. So to answer your question, the object trying to use the client behaviour internals is the client behaviour itself! When we meet later today, I might be able to explain it more clearly

Signals must be disconnected before deallocating client behaviors. If that is not like that in the code, there is a bug we have to fix. I will have a look to that now.

This is indeed where I think the issue may lie, but I'm not sure. When I tried to change the deallocation and disconnection ordering, I ran into heap overflows...I stopped experimenting after I made my system unresponsive the 3rd time :)

@pabloinigoblasco
Copy link
Collaborator

pabloinigoblasco commented Mar 28, 2023

I have been having a look to the code, it is right there is a race condition for multithreaded application that could cause a callback on a destroyed object. Hence, that is a bug that it is being shown in your application.

Disconnection of callbacks for life-time objects must be (by design) just before they are disposed.

This is the first solution I propose:
#91

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants