Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid lifecycle node transition exception #1319

Merged

Conversation

fujitatomoya
Copy link
Collaborator

addresses #1209

replaces #1317

Copy link
Collaborator Author

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this fix, we can avoid the crash on action server from exception and the action client can receive the response with failure as below.

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 run lifecycle_py lifecycle_talker 
[WARN] [1721693858.103483007] [rcl_lifecycle]: No transition matching 2 found for current state unconfigured
[ERROR] [1721693858.114465029] [lc_talker]: Unable to start transition 2 from current state (1, 'unconfigured')

root@tomoyafujita:~/ros2_ws/colcon_ws# ros2 service call /lc_talker/change_state lifecycle_msgs/srv/ChangeState "{transition: {id: 2}}"
requester: making request: lifecycle_msgs.srv.ChangeState_Request(transition=lifecycle_msgs.msg.Transition(id=2, label=''))

response:
lifecycle_msgs.srv.ChangeState_Response(success=False)

@fujitatomoya
Copy link
Collaborator Author

@mjforan @Barry-Xu-2018 @chrisbitter if any of you can do the review or test, that will be really appreciated.

@Barry-Xu-2018
Copy link
Contributor

Barry-Xu-2018 commented Jul 23, 2024

About __change_state(), my understanding is

  • Before calling this function, the state machine should be in a primary state (unconfigured, inactive, active or finalized). After calling this function, it should switch to another primary state.
  • The argument transition_id for __change_state() should be one of configure, cleanup, activate, deactivate, unconfigured_shutdown, inactive_shutdown and active_shutdown

So, when returning from __change_state(), we need to ensure that the state machine is in a primary state, not a transition state (such as configuring, cleaningup, shuttingdown, etc). If trigger_transition_by_id/trigger_transition_by_label throw exception, it should call on_error and then switch to unconfigured or finalized state.

Is my understanding correct?

Besides, __change_state() may be executed concurrently (the program modifies the state while an external process modifies the state through change_state service). This function does not have lock protection

@mjforan
Copy link

mjforan commented Aug 1, 2024

Testing the basic lifecycle_py demo shows the change working properly.
Before:
Screenshot from 2024-08-01 16-16-20
After:
Screenshot from 2024-08-01 16-16-52
I had to use the Rolling branch of rclpy because lifecycle_py doesn't work under Jazzy.

I agree with @Barry-Xu-2018; the exception should not be handled the same way for all three instances.
My opinion:

x exception caused by how to handle
1 invalid transition request write to log and return error to caller
2 unable to transition out of transition state enter ErrorProcessing
3 unable to transition out of ErrorProcessing raise an exception and crash

@mjcarroll mjcarroll requested a review from clalancette August 8, 2024 13:55
PositiveBeat added a commit to energinet-digitalisering/energirobotter-elrik that referenced this pull request Sep 27, 2024
If bad state change is called, a Python lifecycle node will crash.
Until this PR (ros2/rclpy#1319) is merged, this
is a workaround.
@fujitatomoya
Copy link
Collaborator Author

@mjcarroll @sloretz @ahcorde can you take a look when you have time?

Copy link
Contributor

@ahcorde ahcorde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some conflict to resolve

@fujitatomoya
Copy link
Collaborator Author

@ahcorde appreciate your time for the review, thanks! yeah, i will do rebase and resolve the conflict, and get back to you.

Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com>
@fujitatomoya fujitatomoya force-pushed the fujitatomoya/lifecycle-node-transition-exception branch from 2b1611e to f76104b Compare December 2, 2024 21:04
@fujitatomoya
Copy link
Collaborator Author

@Barry-Xu-2018 about your comment on #1319 (comment) makes sense. (including locking mechanism) AFAIS, current PR is aligned with rclcpp behavior. how about addressing this comment with another issue? this is gonna be more complicated and takes time to verify.

@fujitatomoya
Copy link
Collaborator Author

Pulls: #1319
Gist: https://gist.githubusercontent.com/fujitatomoya/1d9faa816731c8aee120548004e4bb46/raw/7ea0f998c7e152f14f1d178cf76fa6d108383ba1/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14891

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Collaborator Author

@Barry-Xu-2018 friendly ping

@Barry-Xu-2018
Copy link
Contributor

LGTM

@fujitatomoya
Copy link
Collaborator Author

fujitatomoya commented Dec 12, 2024

  • Linux-rhel Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Collaborator Author

@fujitatomoya fujitatomoya merged commit 23e9c57 into rolling Dec 13, 2024
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants