Primary state error transitions #97

thebyohazard · 2020-04-14T15:01:04Z

The node_lifecycle design document shows the state machine for lifecycle nodes.

The red X for Error Raised on the active state indicates that the original author intended a transition from active to errorProcessing. This PR and the related ones to follow in rcl and rclcpp implement that functionality, while the related PR in demos shows its use.

I also submit that there should be a similar transition from inactive to errorProcessing.

My use case:
My node communicates with a piece of hardware that must be configured before use, which made a lifecycle node an obvious choice. For safety reasons, however, the hardware can be hard-killed and reset while the machine running the ROS node stays on. The node can recognize that the hardware has been power-reset. If I am in the active state, I need to go directly to the unconfigured state. Without the error transition, I need a lot of special-case logic in the on_deactivate and on_cleanup callbacks, because if the node triggers its own deactivate transition without that extra logic, it will call functions on the unconfigured hardware. In any case, I think the error transition describes more accurately what actually happens. I also prefer to keep the externally available transitions external to the node.

Similarly, if my hardware is killed while the node is inactive, I will not want to call the cleanup code that could try to set parameters on the unconfigured hardware, I want to go through the error transition instead.
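A minimal sketch of the proposed behavior, modeled as a plain C++ transition table rather than the actual rcl/rclcpp implementation. The enum names, including the two error transitions `ActiveError` and `InactiveError`, are hypothetical stand-ins for the constants this PR proposes. Following the design document, a successful on_error returns the node to Unconfigured, and anything else ends in Finalized.

```cpp
#include <cassert>
#include <optional>

// Primary and transition states, named after the node_lifecycle design document.
enum class State {
  Unconfigured, Inactive, Active, Finalized,
  Configuring, CleaningUp, ShuttingDown, Activating, Deactivating, ErrorProcessing
};

// Externally requested transitions. ActiveError and InactiveError are the
// proposed additions (hypothetical names, not part of any released API).
enum class Transition {
  Configure, Cleanup, Activate, Deactivate, Shutdown, ActiveError, InactiveError
};

// Result of a user callback such as on_error.
enum class CallbackResult { Success, Failure, Error };

// Returns the transition state entered when `t` is requested in primary state `s`,
// or std::nullopt if that transition is not defined for `s`.
std::optional<State> request(State s, Transition t) {
  switch (s) {
    case State::Unconfigured:
      if (t == Transition::Configure) return State::Configuring;
      if (t == Transition::Shutdown) return State::ShuttingDown;
      break;
    case State::Inactive:
      if (t == Transition::Activate) return State::Activating;
      if (t == Transition::Cleanup) return State::CleaningUp;
      if (t == Transition::Shutdown) return State::ShuttingDown;
      // Proposed: bypass on_cleanup when the hardware has already been reset.
      if (t == Transition::InactiveError) return State::ErrorProcessing;
      break;
    case State::Active:
      if (t == Transition::Deactivate) return State::Deactivating;
      if (t == Transition::Shutdown) return State::ShuttingDown;
      // Proposed: bypass on_deactivate/on_cleanup and go straight to error handling.
      if (t == Transition::ActiveError) return State::ErrorProcessing;
      break;
    default:
      break;  // Finalized and the transition states accept no external requests.
  }
  return std::nullopt;
}

// ErrorProcessing runs on_error: success returns the node to Unconfigured,
// while failure or error leaves it Finalized.
State finish_error_processing(CallbackResult r) {
  return r == CallbackResult::Success ? State::Unconfigured : State::Finalized;
}
```

With this table, a power reset detected while Active is handled by requesting `ActiveError`. The node then lands in Unconfigured without ever running on_deactivate or on_cleanup against the reset device.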

My original question to the community on ROS Answers is here. When I asked it, I had not yet found that another user had also asked for this functionality on ROS Answers, or that issue 547 had been filed on rclcpp; I only found those after more searching.

thebyohazard · 2020-04-14T15:33:56Z

Why is this DCO bot whining at me when I have a Signed-off-by line?

fujitatomoya · 2020-04-16T02:38:49Z

@thebyohazard

Thanks for the contribution.

The red X for Error Raised on the active state indicates that the original author intended a transition from active to errorProcessing.

I think so too. In fact, the lifecycle design describes the following:

Transition State: ErrorProcessing
It is possible to enter this state from any state where user code will be executed.

The node can recognize that the hardware has been power-reset.

IMO, I would not use error processing for this, since the device state can be detected.

Anyway, I would really like to hear from others.


tfoote · 2020-05-01T08:56:57Z

I also submit that there should be a similar transition from inactive to errorProcessing.

Yes, I think that was an oversight in the graphic, as there could definitely be user code there that could raise an error, as pointed out by @fujitatomoya.

Originally, nothing was expected to happen in the inactive or unconfigured states, but there may be other threads or activity (such as hardware errors) that could cause transitions and change the status. To that end, adding a transition from Unconfigured to ErrorProcessing would also make sense, since something could go wrong in that state too, and it might be recoverable with some error recovery.

A PR to the design proposing this, for discussion, would be a good idea. Once it is resolved there, we can update this PR to match the resolution and fill in the implementation here and in the other linked PRs.

tfoote · 2020-05-01T08:57:50Z

lifecycle_msgs/msg/Transition.msg

+# be called when an error arises during
+# normal operation that causes the node
+# to need reconfiguration.
+uint8 TRANSITION_ACTIVE_ERROR = 63


These should probably not be sequential; instead, pick a new decade starting at 70.


SteveMacenski · 2020-05-21T16:40:39Z

Is there any movement on this and the accompanying PRs?

thebyohazard · 2020-05-21T17:05:07Z

@SteveMacenski: Yes! I'm in the process of doing the design PR that Tully suggested and I hope to have that done today. Coronavirus has been meddling in my plans lately.

fujitatomoya · 2020-05-22T06:35:41Z

lifecycle_msgs/msg/Transition.msg

+# be called when an error arises during
+# normal operation that causes the node
+# to need reconfiguration.
+uint8 TRANSITION_ACTIVE_ERROR = 63


rcl_interfaces/lifecycle_msgs/msg/Transition.msg, line 43 in 018fe20:

# Reserved [10-69], private transitions

The reserved range needs to be expanded.
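One way to read the two review suggestions together, sketched as C++ constants rather than .msg syntax and assuming the new error transitions stay in the private range: the existing on_error result transitions occupy 60-62 in Transition.msg, the new error transitions start a fresh decade at 70, and the reserved range grows to cover them. The values 70 and 71, the name `TRANSITION_INACTIVE_ERROR`, the helpers, and the upper bound 79 are assumptions for illustration, not values agreed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Existing ErrorProcessing result transitions in lifecycle_msgs/msg/Transition.msg.
constexpr std::uint8_t TRANSITION_ON_ERROR_SUCCESS = 60;
constexpr std::uint8_t TRANSITION_ON_ERROR_FAILURE = 61;
constexpr std::uint8_t TRANSITION_ON_ERROR_ERROR = 62;

// Hypothetical values: a new decade at 70 instead of continuing at 63.
constexpr std::uint8_t TRANSITION_ACTIVE_ERROR = 70;
constexpr std::uint8_t TRANSITION_INACTIVE_ERROR = 71;

// Hypothetical expanded reserved range (previously [10-69]).
constexpr std::uint8_t RESERVED_MIN = 10;
constexpr std::uint8_t RESERVED_MAX = 79;

constexpr bool in_reserved_range(std::uint8_t id) {
  return id >= RESERVED_MIN && id <= RESERVED_MAX;
}

// Each decade groups one family of transitions (10s configure, 60s on_error, ...).
constexpr int decade(std::uint8_t id) { return id / 10; }

// Compile-time checks: the new IDs start their own decade and are still reserved.
static_assert(decade(TRANSITION_ACTIVE_ERROR) != decade(TRANSITION_ON_ERROR_ERROR),
              "error transitions should start a new decade");
static_assert(in_reserved_range(TRANSITION_ACTIVE_ERROR) &&
                  in_reserved_range(TRANSITION_INACTIVE_ERROR),
              "reserved range must be expanded to include the new IDs");
```

Keeping one family per decade leaves room for the matching success/failure/error results (e.g. 71, 72) without renumbering later.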

Guillaume-Cr · 2020-07-09T02:42:51Z

Similarly, if my hardware is killed while the node is inactive, I will not want to call the cleanup code that could try to set parameters on the unconfigured hardware, I want to go through the error transition instead.

@thebyohazard I think this is very application dependent. You might find other cases where on_cleanup() is necessary before trying to configure again. Take the example of a hypothetical robot with enable_drives() and disable_drives() interfaces called in the respective transitions. Some hardware might need the connection to be reset before it can be configured again.
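A minimal sketch of that trade-off, assuming a hypothetical `HardwareLink` type (none of these names come from rclcpp or a real driver): ordinary cleanup commands the device, while an error handler for a power-reset device sends nothing to it but still resets the host-side connection so that a later configure can succeed. Which of these steps an application actually needs is hardware dependent, as noted above.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class CallbackResult { Success, Failure, Error };

// Hypothetical hardware handle; method names are illustrative only.
struct HardwareLink {
  bool device_configured = false;  // lost when the device is power-reset
  bool connection_open = false;    // host-side link, e.g. a serial port
  std::vector<std::string> log;    // records what was done, for inspection

  void send(const std::string& cmd) { log.push_back("send:" + cmd); }
  void reset_connection() {
    connection_open = false;
    log.push_back("reset_connection");
  }
};

// Normal cleanup: talks to the device (e.g. disable_drives) before
// tearing down the connection.
CallbackResult on_cleanup(HardwareLink& hw) {
  hw.send("disable_drives");
  hw.reset_connection();
  hw.device_configured = false;
  return CallbackResult::Success;
}

// Error handling after a power reset: sends no commands to the device,
// but still resets the connection so that configure can run again.
CallbackResult on_error(HardwareLink& hw) {
  hw.reset_connection();
  hw.device_configured = false;
  return CallbackResult::Success;
}
```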

fujitatomoya · 2020-11-04T08:05:01Z

@thebyohazard

Friendly ping: are you still working on this? I think this makes sense, but it would be better to discuss it and reach consensus on the design first.

msmcconnell · 2021-10-04T17:11:16Z

Any progress on this? IMO, some sort of fix like this is strongly needed for lifecycle nodes to function as intended. As it stands, adding primary state exception handling from a subclass is very roundabout.

fujitatomoya · 2021-10-04T17:27:27Z

I was going to take this over (see ros2/design#283), but I do not have time to do it soon.

g-arjones · 2022-08-08T18:14:15Z

@fujitatomoya I really need this feature and I will be happy to help. Could you please summarize what's missing before this can be merged? /cc @tfoote @clalancette @Karsten1987

fujitatomoya · 2022-08-09T15:01:55Z

The current status is that some PRs have changes requested and need a second review from a maintainer, and the original author is not available at the moment.

g-arjones · 2022-08-09T17:12:09Z

@fujitatomoya So there's no way the community can help? I mean, it's been 2 years and this is a critical part of the design that is missing implementation...

fujitatomoya · 2022-08-09T17:35:45Z

I was going to take this over and reuse the code from @thebyohazard to make a PR, since the author is not responding, but I do not have time to do that right now. it would be always and really nice to have the help from community!

g-arjones · 2022-08-09T17:37:15Z

it would be always and really nice to have the help from community!

Glad to hear it. So, could you please summarize what's missing so I can give it a shot?

thebyohazard mentioned this pull request Apr 14, 2020: Primary state error transitions ros2/rcl#618 (Open)

thebyohazard force-pushed the lifecycle_primary_state_error_transitions branch from b93648c to 244d955, April 14, 2020 15:06

This was referenced Apr 14, 2020:
Primary state error transitions ros2/rclcpp#1064 (Closed)
Primary state error transitions ros2/demos#436 (Open)

hidmic added the enhancement New feature or request label Apr 16, 2020

Add primary state error transition constants to message definition (4e96ec8)

Signed-off-by: thebyohazard <patrick@jlpengineering.com>

thebyohazard force-pushed the lifecycle_primary_state_error_transitions branch from 244d955 to 4e96ec8, April 16, 2020 14:53

hidmic assigned Karsten1987 Apr 23, 2020

tfoote reviewed May 1, 2020


gimait mentioned this pull request May 21, 2020: Added tool to request transition to ErrorState ros-navigation/navigation2#1758 (Closed)

thebyohazard mentioned this pull request May 21, 2020: Lifecycle primary state error transitions ros2/design#283 (Open)

fujitatomoya requested changes May 22, 2020


jginesclavero mentioned this pull request Sep 25, 2020: Add mros_contingencies_sim MROS-RobMoSys-ITP/Pilot-URJC#48 (Merged)

audrow changed the base branch from master to rolling June 28, 2022 14:20
