-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Primary state error transitions #97
base: rolling
Are you sure you want to change the base?
Primary state error transitions #97
Conversation
b93648c
to
244d955
Compare
Why is this dco bot whining at me when I have a signed-off-by line? |
thanks for the contribution,
i think so too, in fact the following is described in lifecycle design,
IMO, i would not do with error processing since device state can be detected. anyway, i really would like to hear from others. |
Signed-off-by: thebyohazard <patrick@jlpengineering.com>
244d955
to
4e96ec8
Compare
Yes, I think that was an oversight in the graphic as there definitely could be user code there that could error as pointed out by @fujitatomoya Originally there wasn't expected to be anything happening in the inactive state or configured states, but there may be other threads or activity that could cause transitions (such as hardware errors) that would change the status. So to that end adding a transition from Unconfigured to ErrorProcessing would also make sense since something could go wrong in that state too, potentially recoverable with some error recovery. A proposed PR to the design to fill that in for discussion would be a good idea. Once resolved there we can update this to match the resolution and fill in the implementation here and the others linked. |
# be called when an error arises during | ||
# normal operation that causes the node | ||
# to need reconfiguration. | ||
uint8 TRANSITION_ACTIVE_ERROR = 63 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably not be sequential but pick a new decade starting with 70
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Reserved [10-69], private transitions |
Reserved [10-69], private transitions
it needs to expand reserved range.
Is there any movement on this and the accompanying PRs? |
@SteveMacenski: Yes! I'm in the process of doing the design PR that Tully suggested and I hope to have that done today. Coronavirus has been meddling in my plans lately. |
# be called when an error arises during | ||
# normal operation that causes the node | ||
# to need reconfiguration. | ||
uint8 TRANSITION_ACTIVE_ERROR = 63 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Reserved [10-69], private transitions |
Reserved [10-69], private transitions
it needs to expand reserved range.
@thebyohazard I think this is very application dependant. You might find other cases where the on_cleanup() is necessary before trying to configure again. Take the example of a hypothetical robot with enable_drives and disable_drives() interfaces called in the respective transitions. Some hardware might need the connection to reset to be able to configure again. |
friendly ping, are you still working on this? i think this makes sense but it would be better to discuss and have consensus on design 1st. |
Any progress on this? IMO some sort of fix like this is strongly needed for lifecycle nodes to function as intended. As it stands adding primary state exception handling from an extending class is very roundabout. |
I was going to take over, see ros2/design#283. but i do not have time to do it soon. |
@fujitatomoya I really need this feature and I will be happy to help. Could you please summarize what's missing before this can be merged? /cc @tfoote @clalancette @Karsten1987 |
current status is that some PRs are under |
@fujitatomoya So there's no way the community can help? I mean, it's been 2 years and this is a critical part of the design that is missing implementation... |
I think i was going to take over and borrow the code from @thebyohazard to make PR since author is not respondng, but i do not have time to do that right now. it would be always and really nice to have the help from community! |
Glad to hear it. So, could you please summarize what's missing so I can give it a shot? |
In the node_lifecycle design document, it shows the state machine for lifecycle nodes:
The red X for Error Raised on the active state indicates that the original author intended a transition from active to errorProcessing. This PR and the related ones to follow in rcl and rclcpp implement that functionality, while the related PR in demos shows its use.
I also submit that there should be a similar transition from inactive to errorProcessing.
My use case:
I have a lifecycle node that communicates with a piece of hardware that needs to be configured before use and thus a lifecycle node was an obvious choice. For safety reasons, however, the hardware can be hard-killed and reset, while the machine on which the ros node runs stays on. The node can recognize that the hardware has been power-reset. If I am in the active state, I need to go directly to the unconfigured state. Without the error transition, I need a bunch of special logic in the
on_deactivate
andon_cleanup
callbacks, because if I self-transition in the node withdeactivate
without extra logic, I will be calling functions on the unconfigured hardware. Anyway, I think the error transition more correctly describes what happens. I also prefer to keep the externally available transitions external to the node.Similarly, if my hardware is killed while the node is inactive, I will not want to call the cleanup code that could try to set parameters on the unconfigured hardware, I want to go through the error transition instead.
My original question asked to the community on the answers page is here. When I asked it, I did not find that this functionality was also asked for on the answers page by another user and that issue 547 on rclcpp had been filed until more searching.