Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lifecycle primary state error transitions #283

Open
wants to merge 4 commits into
base: gh-pages
Choose a base branch
from

Conversation

thebyohazard
Copy link

The purpose of this PR is clarify where error transitions from primary states should be in the node lifecycle design. The original conversation is over in rcl_interfaces PR#97. In that conversation, I suggested having an additional transition from Inactive to ErrorProcessing, having in mind a node that connects to external hardware. If there is an error in the external hardware when the node is inactive, then there is error recovery code that needs to be executed, and a transition to ErrorProcessing is the natural choice for that. I have edited the diagrams and the article to show and describe such a transition.

Also in the other PR, @tfoote suggested we discuss here whether or not there should also be a transition from Unconfigured to ErrorProcessing:

Originally there wasn't expected to be anything happening in the inactive state or configured states, but there may be other threads or activity that could cause transitions (such as hardware errors) that would change the status. So to that end adding a transition from Unconfigured to ErrorProcessing would also make sense since something could go wrong in that state too, potentially recoverable with some error recovery.

The reason I didn't include this at first in the other PR was that I have been designing my lifecycle nodes to where nothing happens in the node until configuration and no connections to the hardware are made until configuration, so it was not possible to have an error occur in the unconfigured state. Also, the design says regarding the Unconfigured state,

In this state there is expected to be no stored state

So my contrived example of a way such a transition could maybe be undesirable: if the onError transition expects some object to be filled out, it may not be filled out if the node isn't configured yet. But to that I'd say that an onError transition should probably check that anyway. Besides, it's the code in the lifecycle node that would raise the error in the first place, so if you don't need to raise an error, don't raise it.

On the argument for the transition, the design currently says regarding the ErrorProcessing state:

It is possible to enter this state from any state where user code will be executed.

And I think there definitely could be some kind of user code being executed in the Unconfigured state if a node has some combination of lifecycle components and also non-lifecycle components. The non-lifecycle components could always be running even when the node is unconfigured and then cause the errors.

So I guess my vote is to add the Unconfigured to ErrorProcessing error transition, but I honestly haven't delved too deeply into lifecycle to know what else it might break.

…ror processing

Signed-off-by: thebyohazard <patrick@jlpengineering.com>
Signed-off-by: thebyohazard <patrick@jlpengineering.com>
@thebyohazard
Copy link
Author

One other thing I noticed: the original diagram doesn't show transitions for onDeactivate[FAILURE], onCleanup[FAILURE], onShutdown[FAILURE], or onError[Error Raised], which are in the implementation. Shall I add them to this PR, or should that be a separate branch/PR for documentation purposes?

@fujitatomoya
Copy link
Collaborator

@thebyohazard

Shall I add them to this PR, or should that be a separate branch/PR for documentation purposes?

I believe those should be added along with implementation.

… transition

and failure transitions currently in the implementation
* update article to explicitly mention the added transitions
Signed-off-by: thebyohazard <patrick@jlpengineering.com>
articles/node_lifecycle.md Outdated Show resolved Hide resolved
articles/node_lifecycle.md Show resolved Hide resolved
Signed-off-by: thebyohazard <patrick@jlpengineering.com>
@sloretz
Copy link
Contributor

sloretz commented Jun 25, 2020

@Karsten1987 @wjwwood friendly ping :)

@thebyohazard
Copy link
Author

@fujitatomoya I agree with you that a shutdown failure should go back to the previous state to retry. You could be relying on the shutdown code to execute and that can't be guaranteed right now. However, this and the related PRs are enhancements, but making that change to the implementation could potentially be a breaking change for existing systems. If we were to fix it, I definitely think a separate PR in rcl_lifecycle should be opened other than the one that adds the error transitions. I also don't like the idea of the design not matching the implementation, so I would vote to leave the design matching the current implementation and making another set of PRs to work on fixing shutdown failure to have the expected behavior.

@ivanpauno
Copy link
Member

@Karsten1987 @wjwwood friendly ping

@fujitatomoya
Copy link
Collaborator

related PRs based on this design,

I think that some follow-up will be required (rebase, requested changes, and test code is missing), if @thebyohazard is not responding and nobody working on this, I'm willing to do this.

@UomoJallo
Copy link

@fujitatomoya @Karsten1987 @wjwwood any update about this PR? Or what do you suggest in case of a detected error while in ACTIVE state?

Should the node try first to deactivate itself and then cleanup to reach the UNCONFIGURED state? Even if it should be better to avoid self transitions...

Copy link
Contributor

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding these transitions make sense to me.

@fujitatomoya
Copy link
Collaborator

could be related to ros2/rclcpp#1880 (just leaving the reference)

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/deferrable-canceleable-lifecycle-transitions/32318/1

@cboostjvisser
Copy link

This feature seems almost finished. What can we do to get this merged?
There are quite some PR's referencing from/to this PR. Anyone know what the order of merging should be?
@thebyohazard @fujitatomoya @tfoote @Karsten1987 @gbiggs @wjwwood

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants