Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Support recording and replay service #1414

Closed

Conversation

Barry-Xu-2018
Copy link
Contributor

@Barry-Xu-2018 Barry-Xu-2018 commented Jul 11, 2023

@Barry-Xu-2018 Barry-Xu-2018 requested a review from a team as a code owner July 11, 2023 06:47
@Barry-Xu-2018 Barry-Xu-2018 requested review from gbiggs and emersonknapp and removed request for a team July 11, 2023 06:47
@Barry-Xu-2018 Barry-Xu-2018 marked this pull request as draft July 11, 2023 06:48
@Barry-Xu-2018 Barry-Xu-2018 force-pushed the review/topic-support-service branch from 1cb3812 to bd6e02b Compare July 11, 2023 08:07
@Barry-Xu-2018
Copy link
Contributor Author

Barry-Xu-2018 commented Jul 18, 2023

Current status @MichaelOrlov

@Barry-Xu-2018
Copy link
Contributor Author

@MichaelOrlov @emersonknapp

I have a question on filter for mcap implementation.
Current codes as the below

if (!storage_filter_.topics.empty()) {
options.topicFilter = [this](std::string_view topic) {
for (const auto & match_topic : storage_filter_.topics) {
if (match_topic == topic) {
return true;
}
}
return false;
};
}
#ifdef ROSBAG2_STORAGE_MCAP_HAS_STORAGE_FILTER_TOPIC_REGEX
if (!storage_filter_.topics_regex.empty()) {
options.topicFilter = [this](std::string_view topic) {
std::smatch m;
std::string topic_string(topic);
std::regex re(storage_filter_.topics_regex);
return std::regex_match(topic_string, m, re);
};
}
#endif

If --regex REGEX is set, it will override --topics TOPICS.

But sqlite implementation has the different behavior. --regex REGEX and --topics TOPICS can take effect at the same time.

// add topic filter
if (!storage_filter_.topics.empty()) {
// Construct string for selected topics
std::string topic_list{""};
for (auto & topic : storage_filter_.topics) {
topic_list += "'" + topic + "'";
if (&topic != &storage_filter_.topics.back()) {
topic_list += ",";
}
}
where_conditions.push_back("(topics.name IN (" + topic_list + "))");
}
// add topic filter based on regular expression
if (!storage_filter_.topics_regex.empty()) {
// Construct string for selected topics
where_conditions.push_back("(topics.name REGEXP '" + storage_filter_.topics_regex + "')");
}
// exclude topics based on regular expressions
if (!storage_filter_.topics_regex_to_exclude.empty()) {
// Construct string for selected topics
where_conditions.push_back(
"(topics.name NOT IN "
"(SELECT topics.name FROM topics WHERE topics.name REGEXP '" +
storage_filter_.topics_regex_to_exclude + "'))");
}

For this PR, new filter parameters for play are added. I have to modify filter related codes.
So the question is whether the filter for mcap should be the same behavior as sqlite ? Or due to certain reasons, the mcap can only be implemented in current way.

@MichaelOrlov
Copy link
Contributor

@Barry-Xu-2018 There are a bit of discussion about topic filters in this PR #944. It is a bit stale but some info for the context and probably other questions in this area. I need to reaiterate on it and move forward with it.

As reagrds to the MCAP behavior.
There are no special restriction for MCAP in this case and I think the behaviour should be the same as in the SQLite3.
i.e. --regex REGEX and --topics TOPICS can take effect at the same time.
Feel free to align it in your PR and I will try to "condition" topic filters in the #944.

cc: @emersonknapp

@Barry-Xu-2018
Copy link
Contributor Author

As reagrds to the MCAP behavior.
There are no special restriction for MCAP in this case and I think the behaviour should be the same as in the SQLite3.
i.e. --regex REGEX and --topics TOPICS can take effect at the same time.
Feel free to align it in your PR and I will try to "condition" topic filters in the #944.

Okay. Thank you.

@Barry-Xu-2018 Barry-Xu-2018 force-pushed the review/topic-support-service branch 2 times, most recently from f709616 to ba418e9 Compare August 2, 2023 04:57
@Barry-Xu-2018
Copy link
Contributor Author

Barry-Xu-2018 commented Aug 2, 2023

@MichaelOrlov @fujitatomoya

I found a problem based on current contents of service event info.

Service and client all enable introspection

---
info:
  event_type: 0      <== request sent event from client
  stamp:
    sec: 1690882511
    nanosec: 241860458
  client_gid:
  - 1
  - 15
  - 235
  - 125
  - 113
  - 20
  - 108
  - 182
  - 0
  - 0
  - 0
  - 0
  - 0
  - 0
  - 21
  - 3
  sequence_number: 89
request:
- a: 89
  b: 3
response: []
---
---
info:
  event_type: 1      <== request received event from service
  stamp:
    sec: 1690882511
    nanosec: 242176284
  client_gid:
  - 1
  - 15
  - 235
  - 125
  - 113
  - 20
  - 108
  - 182
  - 0
  - 0
  - 0
  - 0
  - 0                                                                                                                                                                                                         
  - 0
  - 20
  - 4
  sequence_number: 89
request:
- a: 89
  b: 3
response: []
---

These 2 client gids are different. So we have no way to identify these 2 events for one request.
While there are many events are received at the same (some clients enable introspection and others disable introspection, and service enable introspection), event messages for different clients are mixed. It is impossible to distinguish request.

I have 2 solutions.

  1. Ask user must enable introspection on client (rosbag2 only care event type is REQUEST_SENT).
  2. Change event info struct. Add service_client_gid.
    For REQUEST_RECEIVED event, client_gid is the client GID which is the same as client_gid in REQUEST_SENT event. And service_client_id saves ID as current client_id.

I'd like to hear your opinions.

@MichaelOrlov
Copy link
Contributor

@Barry-Xu-2018 I am sorry I will be on vacation for the next two weeks and will not be able to take a look at your findings.

@fujitatomoya @emersonknapp Could you please take a look on the issue found by @Barry-Xu-2018 in the previous post?

@Barry-Xu-2018
Copy link
Contributor Author

Friendly ping @emersonknapp @fujitatomoya, do you have any idea about this issue #1414 (comment) ?

@fujitatomoya
Copy link
Contributor

@Barry-Xu-2018 sorry to be late to get back to you.

These 2 client gids are different. So we have no way to identify these 2 events for one request.

before how to fix, how come these gids are different?

i think that gid (event_type: 0) is the client gid via rmw_get_gid_for_client, and gid (event_type: 1) points the same writer gid but that object is managed on server side, that is why this is different?

https://github.com/ros-infrastructure/rep/pull/360/files#diff-7739b7ffccd3276874ea9014c4cd07872efa4e2d60d7c31896e56929ad424455R144-R149

seems like, this is against the design. if the gid is not unique through the transaction, we cannot tell service request and response match.

@Barry-Xu-2018
Copy link
Contributor Author

Barry-Xu-2018 commented Aug 25, 2023

@fujitatomoya

https://github.com/ros-infrastructure/rep/pull/360/files#diff-7739b7ffccd3276874ea9014c4cd07872efa4e2d60d7c31896e56929ad424455R144-R149

seems like, this is against the design. if the gid is not unique through the transaction, we cannot tell service request and response match.

Completely agree. The current implementation cannot confirm whether the sent request record and the received request record are pointing to the same request. This doesn't follow the design.

After checking code for rmw_fastrtps, I find service take request

https://github.com/ros2/rmw_fastrtps/blob/b45e73500bc0a2056fdc31197a6f94434b327d55/rmw_fastrtps_shared_cpp/src/rmw_request.cpp#L111-L117

        request.sample_identity_ = info_seq[0].sample_identity;    // Get writer_guid
        // Use response subscriber guid (on related_sample_identity) when present.
        const eprosima::fastrtps::rtps::GUID_t & reader_guid =
          info_seq[0].related_sample_identity.writer_guid();
        if (reader_guid != eprosima::fastrtps::rtps::GUID_t::unknown()) {
          request.sample_identity_.writer_guid() = reader_guid;    // writer_guid will be replace by reader guid. 
        }

Where do I report this bug ? Each rmw ?

@fujitatomoya
Copy link
Contributor

I think that is what needs to be unique are different between service server/client message and service introspection event message.

i think that this is just unexpected behavior because request_header is overwritten and reused from rmw_take_request to rmw_send_response.

Copy link

@iuhilnehc-ynos iuhilnehc-ynos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments based on my first review.

docs/message_definition_encoding.md Outdated Show resolved Hide resolved
ros2bag/ros2bag/verb/burst.py Outdated Show resolved Hide resolved
ros2bag/ros2bag/verb/play.py Outdated Show resolved Hide resolved
ros2bag/ros2bag/verb/play.py Show resolved Hide resolved
ros2bag/ros2bag/verb/play.py Outdated Show resolved Hide resolved
rosbag2_cpp/include/rosbag2_cpp/service_utils.hpp Outdated Show resolved Hide resolved
std::unordered_map<std::string, std::shared_ptr<PlayerPublisher>> publishers_;
using SharedPlayerPublisher = std::shared_ptr<PlayerPublisher>;
using SharedPlayerClient = std::shared_ptr<PlayerClient>;
std::unordered_map<

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the name of topic and service are the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is possible. I will consider how to handle this issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking implementation, for service, service event topic name is used as key.
So if topic name is the same as service event topic name, there is a problem. Not sure whether do we need to deal with this case?

rosbag2_transport/src/rosbag2_transport/player.cpp Outdated Show resolved Hide resolved
rosbag2_cpp/src/rosbag2_cpp/service_utils.cpp Outdated Show resolved Hide resolved
@Barry-Xu-2018 Barry-Xu-2018 marked this pull request as ready for review September 12, 2023 07:47
@Barry-Xu-2018
Copy link
Contributor Author

Codes are ready to be reviewed. Please help to review them.

This PR depended on below PR:

One issue

@MichaelOrlov
Copy link
Contributor

@Barry-Xu-2018 Could you please rebase your branch on top of the latest rolling branch? It has some conflicts with the latest changes.

@Barry-Xu-2018 Barry-Xu-2018 force-pushed the review/topic-support-service branch from 67678ff to 93768b5 Compare September 20, 2023 09:03
@Barry-Xu-2018
Copy link
Contributor Author

Could you please rebase your branch on top of the latest rolling branch? It has some conflicts with the latest changes.

Done. I fixed the conflicts. @MichaelOrlov

@MichaelOrlov
Copy link
Contributor

@Barry-Xu-2018 Sorry, it seems will need to rebase and fix conflicts one more time.
We changed Player class recently to be a pimpl design.

Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
Signed-off-by: Barry Xu <barry.xu@sony.com>
@Barry-Xu-2018 Barry-Xu-2018 force-pushed the review/topic-support-service branch from 93768b5 to 2cea90d Compare September 22, 2023 05:17
@Barry-Xu-2018
Copy link
Contributor Author

@MichaelOrlov

Sorry, it seems will need to rebase and fix conflicts one more time.
We changed Player class recently to be a pimpl design.

It's ok.
I have done rebase.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-09-21/33733/1

@MichaelOrlov
Copy link
Contributor

@Barry-Xu-2018 I see that the current implementation depends on rclcpp::GenericClient implemented on your https://github.com/Barry-Xu-2018/rclcpp/tree/review/topic-generic-client-support branch. But there is no active open PR associated with those branch.
Is it still WIP or have you forgotten to create a PR?

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Barry-Xu-2018 This PR has grown and become very large with a lot of changes on the multiple layers.
First of all, I would like to say thank you for your thorough and tremendous work!
While it is thorough and addresses all aspects of this big feature – it has become difficult to manage/review it and move forward with it.
I would suggest to split it on two parts. One part could be bigger with everything among playback of the service requests. At least move changes in the rosbag2_transport::Player class to the separate follow-up PR. It turns out that playback of the service request is not trivial to implement.

I have made a first pass of the review and here is my major notes and proposals:

It seems we don’t have tests for the rosbag2_cpp::Info::read_service_info(...) and Python wrapper for information about services. It would be nice to add at least one sanity check.

I have a proposal about new parameters for topic/services exclusions:

  1. Keep the old parameter exclude but rename it to the exclude_regex. it will be applicable to both topics and services.
  2. Make exclude_topics and exclude_services as std::vector<std::string> i.e. space separated list of topics and services names for CLI arguments.

IMO it will be more user-friendly. We have had recently an incoming issue from one of the users when it was expected that the exclude parameter is a list of topics. Like opposite to the topics parameter.
It looks intuitively reasonable to expect it to be defined this way.

I would like to request to rename PlayerClient to the PlayerServiceClient and consider moving it to its own separate cpp and hpp files.
The PlayerClient is pretty large and independent from its parent PlayerImpl class. We already have a lot of code inside the Player class and with this inner class inside it will become more cluttered and more difficult to navigate and support.

We are doing deserialization for the same service message twice. Once in the PlayerClient::is_include_request_message(..) and the second time in the PlayerClient::async_send_request(). This is an expensive procedure.
I would like to request structural changes there.
Could you please rework is_include_request_message(..) to the get_service_request_message(serialized_message) and return smart pointer to the deserialized message? If there are no valid request message it will return a pointer to null. Then rework async_send_request(deserialized_message) to accept those smart pointer to the already deserialized message.

I also have a concern about PlayerClient::async_send_request. We can’t really make it sending request asynchronously at least with the current design. Because we need to wait for future returning from client_->async_send_request(..). Since we can't delete ros_message until the future complete. Otherwise, it will be a dangling pointer to the request_addr and memory could be reused and we have a chance to use corrupted memory during the client_->async_send_request(..). Also according to the notes in the client_->async_send_request(..) API need to call client_->remove_pending_request(future) if we wait for future and timeout happened.
As a first iteration, I would suggest to making sending client request synchronous with timeout. Then need to figure out how to design it to be asynchronous.
I have idea to create some static worker thread and wait in it for static condition variable by timeout and check for future. If another request needs to be sent we can signal to condition variable to interrupt the wait earlier than timeout, remove_pending_request(future) and then try to send the new service request.

Comment on lines +216 to +217
if args.exclude_services:
play_options.services_regex_to_exclude = args.exclude_services + '/_service_event'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Barry-Xu-2018 As far as understand the args.exclude_services is supposed to be a list of services regex separated by spaces.
Shouldn't we add space before + '/_service_event' as well?

Comment on lines +188 to +189
# One options out of --all, --all-topics, --all-services, --services, topics or --regex
# must be used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# One options out of --all, --all-topics, --all-services, --services, topics or --regex
# must be used
# At least one options out of --all, --all-topics, --all-services, --services, topics or --regex
# must be used

Comment on lines +197 to +199
if not self._check_necessary_argument(args):
return print_error('Must specify only one option out of --all, --all-topics, '
'--all-services, --services, topics and --regex')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not self._check_necessary_argument(args):
return print_error('Must specify only one option out of --all, --all-topics, '
'--all-services, --services, topics and --regex')
if not self._check_necessary_argument(args):
return print_error('Neet to specify one option out of --all, --all-topics, '
'--all-services, --services, topics and --regex')

Comment on lines +207 to +209
Format format = definition_identifier.format() == Format::SRV ?
Format::MSG : definition_identifier.format();
DefinitionIdentifier dep(dep_name, format);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Barry-Xu-2018 Could you please clarify why do we need to remap Format::SRV to the Format::MSG here?
It would be useful to have a comment in the source code.

Comment on lines +22 to +33
#include "rcl/service_introspection.h"
#include "rmw/rmw.h"

#include "rcpputils/filesystem_helper.hpp"
#include "rosbag2_cpp/service_utils.hpp"
#include "rosbag2_storage/logging.hpp"
#include "rosbag2_storage/metadata_io.hpp"
#include "rosbag2_storage/storage_interfaces/read_only_interface.hpp"
#include "rosbag2_storage/storage_factory.hpp"

#include "service_msgs/msg/service_event_info.hpp"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "rcl/service_introspection.h"
#include "rmw/rmw.h"
#include "rcpputils/filesystem_helper.hpp"
#include "rosbag2_cpp/service_utils.hpp"
#include "rosbag2_storage/logging.hpp"
#include "rosbag2_storage/metadata_io.hpp"
#include "rosbag2_storage/storage_interfaces/read_only_interface.hpp"
#include "rosbag2_storage/storage_factory.hpp"
#include "service_msgs/msg/service_event_info.hpp"
include "rmw/rmw.h"
#include "rosidl_typesupport_cpp/message_type_support.hpp"
#include "rcpputils/filesystem_helper.hpp"
#include "service_msgs/msg/service_event_info.hpp"
#include "rosbag2_cpp/service_utils.hpp"
#include "rosbag2_storage/metadata_io.hpp"
#include "rosbag2_storage/storage_factory.hpp"

Better organize includes. Adds missing and remove unused.


void PlayerImpl::PlayerClient::async_send_request(const rclcpp::SerializedMessage & message)
{
void * ros_message = new uint8_t[message_members_->size_of_];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be safer to use smart pointer rather than raw pointer to avoid memory leaks and better manage the liveliness of the data.

message_members_->init_function(
ros_message, rosidl_runtime_cpp::MessageInitialization::ZERO);

int ret = rmw_deserialize(&message.get_rcl_serialized_message(), ts_, ros_message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing deserialization for the same message twice. Once in the is_include_request_message(..) and second time here. This is an expensive procedure.
I would like to request structural changes.
Could you please rework is_include_request_message(..) to the get_request_message and return smart pointer to the deserialized message.
Then rework async_send_request() to accept those smart pointer to the already deserialized message.

Comment on lines +1549 to +1550
void PlayerImpl::PlayerClient::async_send_request(const rclcpp::SerializedMessage & message)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern about PlayerClient::async_send_request. We can’t really make it sending request asynchronously at least with the current design. Because we need to wait for future returning from client_->async_send_request(..). Since we can't delete ros_message until the future complete. Otherwise, it will be a dangling pointer to the request_addr and memory could be reused and we have a chance to use corrupted memory during the client_->async_send_request(..). Also according to the notes in the client_→async_send_request(..) API need to call client_→remove_pending_request(future) if we waited for future and timeout happened.

get_message_type_support_handle<service_msgs::msg::ServiceEventInfo>();
if (type_support_info == nullptr) {
throw std::runtime_error(
"It failed to get message type support handle of service event info !");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"It failed to get message type support handle of service event info !");
"Failed to get message type support handle of service event info !");

Maybe also add output to the log?

type_support_info,
reinterpret_cast<void *>(&msg));
if (ret != RMW_RET_OK) {
throw std::runtime_error("It failed to deserialize message !");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw std::runtime_error("It failed to deserialize message !");
throw std::runtime_error("Failed to deserialize message !");

@Barry-Xu-2018
Copy link
Contributor Author

@MichaelOrlov

I'm sorry for the late reply. I just finished a long holiday.

I see that the current implementation depends on rclcpp::GenericClient implemented on your https://github.com/Barry-Xu-2018/rclcpp/tree/review/topic-generic-client-support branch. But there is no active open PR associated with those branch.
Is it still WIP or have you forgotten to create a PR?

No. This PR is ready. But this PR depended on ros2/rclcpp#2209 which is under review. So I want to submit PR after 2209 is merged.

@Barry-Xu-2018
Copy link
Contributor Author

@MichaelOrlov

Thank you for your review.
I will start to handle your comment.

@Barry-Xu-2018
Copy link
Contributor Author

@MichaelOrlov

I created 2 PRs (#1480 and #1481) instead of this one.
And I will move your comment to these 2 PRs.
Please continue to review on these 2 PRs.

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-10-12/34178/1

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-11-16/34757/1

@fujitatomoya
Copy link
Contributor

@Barry-Xu-2018 do we want to close this PR, instead we need post the review comments either on #1480 or #1481 along with the implementation?

@Barry-Xu-2018
Copy link
Contributor Author

@fujitatomoya

do we want to close this PR, instead we need post the review comments either on #1480 or #1481 along with the implementation?

I think yes. All review comments had been moved to #1480 or #1481.
I will close this ticket.

@Barry-Xu-2018
Copy link
Contributor Author

@MichaelOrlov

I will close this ticket.
In ROS2 TSC meeting, please use #1480 and #1481 instead of this ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants