Introduce a RMF transportation workcell #42

Merged: 55 commits merged into main from luca/rmf_transporter on Jan 25, 2025

@luca-della-vedova (Member) commented Dec 11, 2024

This PR integrates RMF into Nexus as a workcell, managed by the workcell orchestrator, that can execute transportation tasks through a new behavior tree and a new set of capabilities.

This is at a simple demo stage. I brought in a modified office world; the only modification is renaming the dispensers to match the workcell names. I also added a new launch file to nexus_integration_tests that launches RMF together with Nexus, and changed the movement of items to use an AMR rather than a mock transporter.

Test it!

Clone, build and run:

ros2 launch nexus_integration_tests depot.launch.xml headless:=false

Submit a task:

ros2 action send_goal /system_orchestrator/execute_order nexus_orchestrator_msgs/action/ExecuteWorkOrder "{order: {id: '23', work_order: '$(cat config/pick_and_place.json)'}}"

You should see the transportation happening:

[Video: Screencast.from.2024-12-12.18-10-14.webm]

PR breakdown

The PR is large, so I'll try to summarize the main (and potentially controversial) decisions I made during the design.

nexus_integration_tests vs nexus_demos

It would be more natural to create a new nexus_demos package that contains the bringup. I got halfway there before realising it would make the diff explode even further, so for now I kept the changes to nexus_integration_tests to a minimum. A follow-up PR can split the package into nexus_demos and nexus_integration_tests, or simply rename it.

Task cancellation

As noted in #40, the workcell's cancellation behavior can't be customized and defaults to letting tasks run to completion. This means the RMF task will not be cancelled: if a robot is halfway through a long task and waiting for a workcell whose task has been cancelled, it will wait indefinitely. Once #40 is addressed, we should add task cancellation to the TransportAmr capability.

Is task doable / navgraph checking

As noted in #41, the payload can't be used to verify task capability. Transportation tasks carry a payload with a list of destinations, but the capability check currently always returns true, regardless of whether those destinations exist. A more advanced check, for example one that verifies each destination exists as a waypoint in the fleet's navgraph, would be a better design.
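A minimal sketch of the navgraph check suggested above, in Python for brevity. It assumes a hypothetical payload shape (a dict with a "destinations" list of waypoint names) and that the fleet's navgraph has already been reduced to a set of waypoint names; `is_task_doable` here is illustrative, not the Nexus capability API.

```python
from typing import Any, Iterable, Mapping


def is_task_doable(payload: Mapping[str, Any], waypoint_names: Iterable[str]) -> bool:
    """Hypothetical check: accept a transportation task only if it lists at least
    one destination and every destination is a waypoint in the fleet navgraph."""
    known = set(waypoint_names)
    # Assumed payload schema: {"destinations": ["workcell_1", "workcell_2", ...]}
    destinations = payload.get("destinations")
    if not isinstance(destinations, list) or not destinations:
        return False  # missing or empty destination list: nothing to execute
    return all(isinstance(d, str) and d in known for d in destinations)
```

With a navgraph containing workcell_1 and workcell_2, a payload listing workcell_1 and workcell_3 would then be rejected instead of always being accepted.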

Map annotation

Visualizing the workcell requires its position to be populated; however, Nexus (and the workcell orchestrator) currently has no way to populate this information.
For now, purely for the sake of visualization, I wrote a node that subscribes to the /map topic, finds all waypoints with the pickup_dispenser property, and uses their locations to populate markers. It then subscribes to states and updates the markers.
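A rough sketch of the lookup this node performs, written against plain dataclasses rather than the real ROS messages so it runs standalone. The field layout (levels, nav_graphs, vertices, params with a value_string) is modeled on our understanding of rmf_building_map_msgs and should be treated as an assumption; `workcell_locations` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Minimal stand-ins for the map fields we assume the node reads (not the real messages).
@dataclass
class Param:
    name: str
    value_string: str = ""


@dataclass
class GraphNode:
    x: float
    y: float
    name: str = ""
    params: List[Param] = field(default_factory=list)


@dataclass
class Graph:
    vertices: List[GraphNode] = field(default_factory=list)


@dataclass
class Level:
    name: str
    nav_graphs: List[Graph] = field(default_factory=list)


def workcell_locations(levels: List[Level]) -> Dict[str, Tuple[str, float, float]]:
    """Map each pickup_dispenser value to (level, x, y). In this demo the value is
    the workcell name, because the dispensers were renamed after the workcells."""
    locations: Dict[str, Tuple[str, float, float]] = {}
    for level in levels:
        for graph in level.nav_graphs:
            for vertex in graph.vertices:
                for param in vertex.params:
                    if param.name == "pickup_dispenser" and param.value_string:
                        locations[param.value_string] = (level.name, vertex.x, vertex.y)
    return locations
```

The resulting positions are what the markers are placed at; vertices without the property are simply skipped.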

A better long-term design would pass the workcell's location to the workcell orchestrator, forward it to the system orchestrator at registration time, and refactor the visualization node to periodically call the /list_workcells service to discover new workcells. I deferred this to avoid a large diff in the workcell orchestrator node and to keep the changes strictly additive, for ease of review.

Signaling

I added the ability for the system orchestrator to receive signals, and changed the default behavior tree to wait for the AMR to arrive before starting the workcell, rather than partway through. This was done to improve reliability with parallel tasks (i.e. there is no risk of a workcell starting a task only for the wrong AMR to arrive), but parallel tasks are still not quite there, so I'm not sure it is still needed. An example of a behavior tree that implements this new logic is here.
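To illustrate the reliability argument, here is a minimal Python sketch of a start gate that only opens when the AMR for its own task arrives. `StartGate`, `on_signal` and the (task_id, amr) signal shape are hypothetical and do not reflect the actual Nexus signaling interface.

```python
from typing import Optional


class StartGate:
    """Hypothetical gate: the workcell may start only after the AMR carrying
    its own task has arrived."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.arrived_amr: Optional[str] = None

    def on_signal(self, task_id: str, amr: str) -> None:
        # Arrival signals for other tasks are ignored, so an unrelated AMR
        # pulling up cannot make this workcell start.
        if task_id == self.task_id:
            self.arrived_amr = amr

    def can_start(self) -> bool:
        return self.arrived_amr is not None
```

The cost of starting only after arrival is that the workcell sits idle while the AMR travels, which is the trade-off against starting partway through.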

What's next

Many things! But this PR is already very large and I tried to keep the diff minimal (where I loosely define "diff" as changes to pre-existing files that risk breaking existing behavior, as opposed to new additions, which are more likely to be safe).

Create a Gazebo simulation that includes workcells together with AMRs

Right now the workcells are not simulated in Gazebo; it would be great to have a proper simulation world so users can inspect what is happening.
Workcells often have conveyor belts that feed items to and from the AMRs, and these would also be valuable additions.

Simulate humans for workcells that are manually operated

In real life, not all workcells are automated; some are operated by humans. We could mock this in simulation by placing a human at the dropoff point and using a special behavior tree that simply waits for an input.

Task parallelism

Currently, submitting parallel tasks risks deadlocking the system, since RMF and Nexus operate somewhat independently. We should revisit the implementation to make sure parallel tasks are supported.

SKU Tracking

It would be interesting to show the position and status of the SKUs in rviz. This is especially useful to know their state as they are being moved throughout the facility.

Better handling of workcell location and registration

As noted in the Map annotation section of the PR description, the workcell location should be populated at registration time rather than by subscribing to the /map topic.

Post processing of waypoints for AMR tasks

Currently, whenever a work order is received, an AMR task that goes through all the workcells will be generated and each workcell will only be signaled to start when the AMR arrives.
However, this is suboptimal in two corner cases:

  1. If there is only one workcell and we don't want to use an AMR to transport, we will still request an AMR to the location which is unnecessary.
  2. If there are multiple tasks being done by the same workcell, the AMR will have multiple "pickup" phases, although I believe this should be innocuous and just introduce some extra signaling.

It is a bit tricky to design a single behavior tree that works for all cases, so I would suggest using different behavior trees for different purposes, such as the first case.
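The first corner case (and the consecutive form of the second) could be handled by a post-processing pass before the AMR task is generated. A minimal sketch, assuming a work order can be reduced to an ordered list of workcell names; `plan_transport` is hypothetical and not part of Nexus.

```python
from typing import List, Optional


def plan_transport(step_workcells: List[str]) -> Optional[List[str]]:
    """Return the AMR destination list, or None when no AMR is needed."""
    stops: List[str] = []
    for workcell in step_workcells:
        # Merge consecutive steps at the same workcell into one stop, so the AMR
        # does not go through repeated "pickup" phases there. Non-consecutive
        # repeats (w1, w2, w1) are kept because the AMR really must return.
        if not stops or stops[-1] != workcell:
            stops.append(workcell)
    # Corner case 1: everything happens at a single workcell, nothing to move.
    if len(stops) <= 1:
        return None
    return stops
```

A dedicated behavior tree could then be selected whenever the result is None, in line with the suggestion above.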

Signed-off-by: Luca Della Vedova <lucadv@intrinsic.ai>
@luca-della-vedova changed the title from "WIP: introduce a RMF transportation workcell" to "Introduce a RMF transportation workcell" on Dec 13, 2024
Base automatically changed from luca/ros2dds_bridge to main December 26, 2024 07:59
Yadunund and others added 3 commits January 13, 2025 21:51
@luca-della-vedova (Member, Author) commented:

In f55bab2 I reverted the signaling at the system orchestrator level.
Now the workcell behavior trees are exactly the same regardless of whether it is a pick and place on a conveyor or on an AMR.
Unfortunately, remapping gets in the way: we need to run the same workcell behavior tree but a different system orchestrator behavior tree (not only main.xml, but also the tree it loads, pick_and_place.xml).

Furthermore, in f9705a7 I explored removing all the duplicated behavior trees and work orders altogether. The idea is that by exposing the remap_task_types parameter in the launch files, we can make the same work order result in a different system orchestrator behavior tree but the same workcell behavior tree. Happy to revert it if this is not desirable.
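A minimal sketch of the idea, assuming remap_task_types can be thought of as a mapping from the task type in the work order to the task type (and hence behavior tree file) an orchestrator should load. The dict format, the `resolve_bt` helper and the "pick_and_place_rmf" name are all hypothetical; the real parameter's format may differ.

```python
from typing import Dict


def resolve_bt(task_type: str, remap: Dict[str, str]) -> str:
    """Return the behavior tree file an orchestrator would load for task_type.
    Task types without a remap entry keep their original name."""
    return f"{remap.get(task_type, task_type)}.xml"


# Hypothetical launch-time configuration: only the system orchestrator remaps.
SYSTEM_REMAP: Dict[str, str] = {"pick_and_place": "pick_and_place_rmf"}
WORKCELL_REMAP: Dict[str, str] = {}
```

With this configuration the same work order resolves to pick_and_place_rmf.xml in the system orchestrator and pick_and_place.xml in the workcells, so neither the work order nor the workcell tree needs a duplicate.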

luca-della-vedova and others added 2 commits January 14, 2025 17:37
Signed-off-by: Yadunund <yadunund@gmail.com>
@aaronchongth mentioned this pull request on Jan 15, 2025
@Yadunund force-pushed the luca/rmf_transporter branch from 5c714dd to bc73f23 on January 18, 2025 01:49
Signed-off-by: Aaron Chong <aaronchongth@gmail.com>
@aaronchongth (Collaborator) commented Jan 20, 2025

Although the integration tests were passing, I noticed odd behavior when running non-headless: workcell_2 starts moving just as the AMR starts leaving workcell_1. CI shows the same thing.

I've managed to narrow it down to this flag, where despite the initialization to false, it appears to be true when WaitForAmr::onStart gets triggered.

This can be verified by checking out 5d3daf8 and commenting out this line. The printouts will show that _amr_ready is already true even though no dispenser requests have been sent. Look for:

[nexus_workcell_orchestrator-14] [INFO] [1737357565.631919038] [rmf_nexus_transporter]: CHECKING: workcell [workcell_2], rmf_task_id [compose.dispatch-9ccd185db2], amr_ready: [amr ready]

Other than this, everything else seems good.

edit: my guess is that the WaitForAmr BT node is being reused within LoopDestination. When the AMR is at workcell_1, the flag is set back to false, but the callback is still triggered and sets it back to true (RMF keeps sending dispenser requests every second until a result is provided).
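A small Python model of the suspected race, with a target check as one possible guard. It assumes requests carry request_guid and target_guid fields as in rmf_dispenser_msgs/DispenserRequest (only those two are modeled) and that target_guid equals the workcell name in this demo, since the dispensers were renamed after the workcells. `WaitForAmrFlag` and `replay` are hypothetical, and this is not the fix that went into the PR.

```python
from dataclasses import dataclass


@dataclass
class DispenserRequest:
    request_guid: str
    target_guid: str


class WaitForAmrFlag:
    """Models the reused BT node: start() is called once per destination."""

    def __init__(self, check_target: bool) -> None:
        self.check_target = check_target
        self.workcell = ""
        self.amr_ready = False

    def start(self, workcell: str) -> None:
        # Reset when LoopDestination moves on to the next workcell.
        self.workcell = workcell
        self.amr_ready = False

    def on_request(self, req: DispenserRequest) -> None:
        # Unguarded: any request sets the flag, including RMF's periodic
        # re-sends of the previous workcell's request.
        # Guarded: only requests addressed to the current workcell count.
        if not self.check_target or req.target_guid == self.workcell:
            self.amr_ready = True


def replay(check_target: bool) -> bool:
    node = WaitForAmrFlag(check_target)
    node.start("workcell_1")
    node.on_request(DispenserRequest("r1", "workcell_1"))  # AMR at workcell_1
    node.start("workcell_2")                                # loop moves on
    node.on_request(DispenserRequest("r1", "workcell_1"))  # stale re-send
    return node.amr_ready  # True means workcell_2 would start too early
```

`replay(check_target=False)` reproduces the early start described above, while the guarded variant stays false until a request for workcell_2 arrives.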

* Working with the same commands
* Basic demo works with models
* Moving enclosures, removing in-between, using nested models, moving camera
* Use new released rmf_building_map_tools args, clean up, moved waypoints, added dispenser/ingestor
* Clean up duplicated and unused files, use rmf_transporter
* Remove duplicated depot
@aaronchongth (Collaborator) commented:

Per our discussion, there may be an issue with running multiple work orders at the same time. After investigating further and trying it out, I found that they work as expected 🤔, with some observations:

cd nexus_integration_tests

# First order accepted
ros2 action send_goal /system_orchestrator/execute_order nexus_orchestrator_msgs/action/ExecuteWorkOrder "{order: {id: '23', work_order: '$(cat config/pick_and_place.json)'}}"

# While the first order is still being executed, run a second same order with a different order ID
ros2 action send_goal /system_orchestrator/execute_order nexus_orchestrator_msgs/action/ExecuteWorkOrder "{order: {id: '24', work_order: '$(cat config/pick_and_place.json)'}}"
# This gets rejected/aborted, saying failed to assign task to workcells as the task ID [2] already exists (this refers to the step ID)

# Save this new order somewhere, https://gist.github.com/aaronchongth/b5b92f140d539c33e0d0ec23b414d70c
# This new order just modifies the step IDs to 3.0 and 4.0
# While the first order is still being executed, send this new order
ros2 action send_goal /system_orchestrator/execute_order nexus_orchestrator_msgs/action/ExecuteWorkOrder "{order: {id: '24', work_order: '$(cat config/new_pick_and_place.json)'}}"
# This order gets accepted and starts being executed after order ID 23 is done

# Once order ID 23 has been completed, send the same original order again with a different order ID
ros2 action send_goal /system_orchestrator/execute_order nexus_orchestrator_msgs/action/ExecuteWorkOrder "{order: {id: '25', work_order: '$(cat config/pick_and_place.json)'}}"
# Order gets accepted, and only starts after order ID 24 is done

I haven't been able to replicate the behavior we discussed, where work orders interfere with each other during completion. To summarize my observations:

  • when a work order is being executed, sending in more work orders with the same step ID will be rejected (the above scenario)
  • when a work order is being executed, sending in work orders with different step IDs will be accepted
  • after a work order has been completed, sending in the same work order with a different order ID containing the same step IDs will be accepted
  • the step ID is parsed as a double but retrieved as an int, which causes step IDs of 1.0 and 1.1 to be treated as the same ID, so the second one is rejected. Is there a reason we use doubles but parse as int?
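The collision in the last point can be reproduced in a few lines of Python, assuming IDs are parsed as doubles and then keyed by their integer part as described; `register_steps` is a hypothetical stand-in for the system orchestrator's bookkeeping.

```python
from typing import List, Set, Tuple


def register_steps(step_ids: List[float]) -> Tuple[List[float], List[float]]:
    """Accept or reject step IDs, keying each one by int(step_id)."""
    seen: Set[int] = set()
    accepted: List[float] = []
    rejected: List[float] = []
    for step_id in step_ids:
        key = int(step_id)  # double -> int truncates: 1.0 and 1.1 both become 1
        if key in seen:
            rejected.append(step_id)
        else:
            seen.add(key)
            accepted.append(step_id)
    return accepted, rejected
```

`register_steps([1.0, 1.1])` accepts 1.0 and rejects 1.1, while the 3.0/4.0 IDs from the modified order do not collide.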

@Yadunund (Member) commented:

Thanks for investigating further.

  • I agree that the current behavior, where Step IDs also need to be unique, is not ideal. Let's open a ticket and update the behavior? The Work Order ID should be unique, and the id we pass to the workcell in the WorkcellTask request can be a unique combination of the Work Order ID and the Step ID.
  • We should parse the step ID as an integer and not a float (and update the work order definitions).
  • Regarding the parallel behavior:
    • Even with unique Step IDs, it seems one Work Order needs to complete execution before another can begin. This is not the case with the conveyor plugin: if a workcell needed for the second order is available, it should be able to process that order's step. I just tried submitting two separate jobs (unique Step IDs) with use_rmf_transporter:=False and the system executes both in parallel, i.e. while Step 2 in Work Order 1 is being processed by workcell_2, workcell_1 is processing Step 1 in Work Order 2. I believe this is not possible with the RMF integration given the BT definition, and it might be related to the point about signaling that Luca made above.
    • I would expect a second RMF bid to go out which gets assigned to the second AMR but the AMR is dispatched only when the required workcell is available.
    • Could you check if there is a quick fix by modifying the BTs? Else let's open a ticket and tackle this in a follow up PR.
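A sketch of the first two suggestions, assuming the WorkcellTask id is a plain string. The "<order>/<step>" format and both helper names are hypothetical; any unambiguous scheme would do, provided work order IDs cannot themselves contain the separator.

```python
def parse_step_id(raw: object) -> int:
    """Accept only integral step IDs instead of silently truncating floats."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError(f"step id must be an integer, got {raw!r}")
    return raw


def workcell_task_id(work_order_id: str, step_id: int) -> str:
    """Combine the (unique) work order ID with the step ID so identical step IDs
    in different work orders no longer collide."""
    return f"{work_order_id}/{step_id}"
```

Under this scheme, step 2 of orders 23 and 24 would become "23/2" and "24/2", so resubmitting the same work order under a new order ID no longer clashes, and a step ID of 1.1 fails loudly instead of aliasing 1.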

@Yadunund force-pushed the luca/rmf_transporter branch from f58c48d to 43978e0 on January 23, 2025 04:00
@aaronchongth (Collaborator) commented Jan 23, 2025

Gotcha. I was actually looking into the issue you showed me, where with 2 work orders running in parallel, completing the first work order somehow completed the second one as well. At least that odd scenario does not seem to be happening.

Re: "I would expect a second RMF bid to go out which gets assigned to the second AMR but the AMR is dispatched only when the required workcell is available."

Thanks for flagging this other parallel work order scenario. Yes, it happens because of how the RMF transportation workcell's BT is currently designed: the root BT both dispatches an RMF task and keeps track of each looped destination in that task (to handle the dispenser requests). IIUC, this means the RMF transportation workcell is never "done" until the whole work order is completed, so it cannot dispatch another robot before then.

From the Signaling section in this PR's description, it looks like this is by design, to prevent the wrong AMR from reaching a workcell that is waiting for another AMR. Unfortunately, I can't think of a way to resolve this confidently at the BT level. I have some ideas and will open a ticket for this particular situation to discuss further.

edit: opened #63 and #64

@Yadunund (Member) left a comment

Thanks for trailblazing the approach of integrating RMF as a workcell that provides transportation services!

The experience here has been invaluable in better understanding the pros and cons of this approach vs integration via nexus_transporter. The main issue uncovered is that we can't run multiple work orders in parallel, since workcells can't yet run tasks in parallel. Further, the implementation here implicitly defines the task.type for workcells that perform transportation services, i.e. task.type = transportation, with an internal schema for the task params that includes destinations/pickups etc. Lastly, all transportation steps required for a job are performed by the same transporter workcell. In practice, however, we might distribute them among different transporters (e.g. a conveyor for some segments and AMRs for others, or even different AMRs).
I've opened a meta-ticket to track things we need to implement to better support Workcells as transporters in general. #67

For now we can merge this PR in and iterate in subsequent PRs.

@Yadunund merged commit 7479795 into main on Jan 25, 2025
3 checks passed