Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary state error transitions #97

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from

Conversation

thebyohazard
Copy link

In the node_lifecycle design document, it shows the state machine for lifecycle nodes:
life_cycle_sm
The red X for Error Raised on the active state indicates that the original author intended a transition from active to errorProcessing. This PR and the related ones to follow in rcl and rclcpp implement that functionality, while the related PR in demos shows its use.

I also submit that there should be a similar transition from inactive to errorProcessing.

My use case:
I have a lifecycle node that communicates with a piece of hardware that needs to be configured before use and thus a lifecycle node was an obvious choice. For safety reasons, however, the hardware can be hard-killed and reset, while the machine on which the ros node runs stays on. The node can recognize that the hardware has been power-reset. If I am in the active state, I need to go directly to the unconfigured state. Without the error transition, I need a bunch of special logic in the on_deactivate and on_cleanup callbacks, because if I self-transition in the node with deactivate without extra logic, I will be calling functions on the unconfigured hardware. Anyway, I think the error transition more correctly describes what happens. I also prefer to keep the externally available transitions external to the node.

Similarly, if my hardware is killed while the node is inactive, I will not want to call the cleanup code that could try to set parameters on the unconfigured hardware, I want to go through the error transition instead.

My original question asked to the community on the answers page is here. When I asked it, I did not find that this functionality was also asked for on the answers page by another user and that issue 547 on rclcpp had been filed until more searching.

@thebyohazard
Copy link
Author

Why is this dco bot whining at me when I have a signed-off-by line?

@fujitatomoya
Copy link
Collaborator

@thebyohazard

thanks for the contribution,

The red X for Error Raised on the active state indicates that the original author intended a transition from active to errorProcessing.

i think so too, in fact the following is described in lifecycle design,

Transition State: ErrorProcessing
It is possible to enter this state from any state where user code will be executed.

The node can recognize that the hardware has been power-reset.

IMO, i would not do with error processing since device state can be detected.

anyway, i really would like to hear from others.

@hidmic hidmic added the enhancement New feature or request label Apr 16, 2020
Signed-off-by: thebyohazard <patrick@jlpengineering.com>
@thebyohazard thebyohazard force-pushed the lifecycle_primary_state_error_transitions branch from 244d955 to 4e96ec8 Compare April 16, 2020 14:53
@tfoote
Copy link
Contributor

tfoote commented May 1, 2020

I also submit that there should be a similar transition from inactive to errorProcessing.

Yes, I think that was an oversight in the graphic as there definitely could be user code there that could error as pointed out by @fujitatomoya

Originally there wasn't expected to be anything happening in the inactive state or configured states, but there may be other threads or activity that could cause transitions (such as hardware errors) that would change the status. So to that end adding a transition from Unconfigured to ErrorProcessing would also make sense since something could go wrong in that state too, potentially recoverable with some error recovery.

A proposed PR to the design to fill that in for discussion would be a good idea. Once resolved there we can update this to match the resolution and fill in the implementation here and the others linked.

# be called when an error arises during
# normal operation that causes the node
# to need reconfiguration.
uint8 TRANSITION_ACTIVE_ERROR = 63
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably not be sequential but pick a new decade starting with 70

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Reserved [10-69], private transitions

Reserved [10-69], private transitions

it needs to expand reserved range.

@SteveMacenski
Copy link

Is there any movement on this and the accompanying PRs?

@thebyohazard
Copy link
Author

@SteveMacenski: Yes! I'm in the process of doing the design PR that Tully suggested and I hope to have that done today. Coronavirus has been meddling in my plans lately.

# be called when an error arises during
# normal operation that causes the node
# to need reconfiguration.
uint8 TRANSITION_ACTIVE_ERROR = 63
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Reserved [10-69], private transitions

Reserved [10-69], private transitions

it needs to expand reserved range.

@Guillaume-Cr
Copy link

Similarly, if my hardware is killed while the node is inactive, I will not want to call the cleanup code that could try to set parameters on the unconfigured hardware, I want to go through the error transition instead.

@thebyohazard I think this is very application dependant. You might find other cases where the on_cleanup() is necessary before trying to configure again. Take the example of a hypothetical robot with enable_drives and disable_drives() interfaces called in the respective transitions. Some hardware might need the connection to reset to be able to configure again.

@fujitatomoya
Copy link
Collaborator

@thebyohazard

friendly ping, are you still working on this? i think this makes sense but it would be better to discuss and have consensus on design 1st.

@msmcconnell
Copy link

Any progress on this? IMO some sort of fix like this is strongly needed for lifecycle nodes to function as intended. As it stands adding primary state exception handling from an extending class is very roundabout.

@fujitatomoya
Copy link
Collaborator

I was going to take over, see ros2/design#283. but i do not have time to do it soon.

@audrow audrow changed the base branch from master to rolling June 28, 2022 14:20
@g-arjones
Copy link

@fujitatomoya I really need this feature and I will be happy to help. Could you please summarize what's missing before this can be merged? /cc @tfoote @clalancette @Karsten1987

@fujitatomoya
Copy link
Collaborator

current status is that some PRs are under Requested Change and 2nd review from maintainer is required, the original author is not available at this moment.

@g-arjones
Copy link

@fujitatomoya So there's no way the community can help? I mean, it's been 2 years and this is a critical part of the design that is missing implementation...

@fujitatomoya
Copy link
Collaborator

I think i was going to take over and borrow the code from @thebyohazard to make PR since author is not respondng, but i do not have time to do that right now. it would be always and really nice to have the help from community!

@g-arjones
Copy link

it would be always and really nice to have the help from community!

Glad to hear it. So, could you please summarize what's missing so I can give it a shot?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants