-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
random exception during node startup with use_sim_time:=true when Iceoryx is used #469
Comments
That was no accident; the original bug report that that commit fixes mentions the same problem: ros2/rclcpp#2088 We could consider backporting ros2/rclcpp#2092 to Humble, since it doesn't break API or ABI. Can you confirm that it fixes the issue for you? |
Yes, it does fix the issue for us. We use the term "accident" because the commit message is about avoiding unintended modification of shared message while the main reason of this error that the memory kept by this shared_ptr is freed in the background. We are affraid that lot of other code can be affected with with issue, because we are suprising to see Use-After-Free error despite We have not seen any ROS documentation saying that one shold not keep shared_ptr to message from subcription callback. I have analyzed this error thoroughly and the reason this error happen can be seen in the following code snippets: // rclcpp/rclcpp/src/rclcpp/executor.cpp:627
void Executor::execute_subscription(rclcpp::SubscriptionBase::SharedPtr subscription) {
[...]
if (subscription->can_loan_messages()) {
void * loaned_msg = nullptr;
[...]
rcl_ret_t ret = rcl_take_loaned_message(
subscription->get_subscription_handle().get(),
&loaned_msg,
&message_info.get_rmw_message_info(),
nullptr);
[...]
subscription->handle_loaned_message(loaned_msg, message_info);
[...]
rcl_ret_t ret = rcl_return_loaned_message_from_subscription(
subscription->get_subscription_handle().get(),
loaned_msg);
loaned_msg = nullptr;
// rclcpp/rclcpp/include/rclcpp/subscription.hpp:361
void handle_loaned_message(void * loaned_message, const rclcpp::MessageInfo & message_info) override
{
[...]
auto typed_message = static_cast<ROSMessageType *>(loaned_message);
// message is loaned, so we have to make sure that the deleter does not deallocate the message
auto sptr = std::shared_ptr<ROSMessageType>(typed_message, [](ROSMessageType * msg) {(void) msg;});
[...]
any_callback_.dispatch(sptr, message_info);
// rclcpp/rclcpp/src/rclcpp/time_source.cpp:164
// Cache the last clock message received
void ClocksState::cache_last_msg(std::shared_ptr<const rosgraph_msgs::msg::Clock> msg)
{
last_msg_set_ = msg;
} In the above code the excecutor creates dummy shared_ptr with empty destructor and loads the loaned message, than calls the subscription callbacks ( I have confirmed it also with Valgrind:
In my opinion this is against the rules of using smart pointers and is very misleading. According to my knowledge, the intention of using smart pointers is that one should not manualy control the life time of memory. I have not found any ROS documentation saying that when using a subcription callback with shared_ptr you MAY NOT keep the message pointer beyond callback scope. Oposite, based on the My take from this issue is that subcription callbacks with shared_ptr should be forbiden compleatly and replaced with I understand that impact of this claim is tremendous and it is not feasible, therfore in my opinion as resolution of this issue ROS documentation and tutorials should be extented with a big statement that one MUST NOT extend message lifetime - the message should be explicitly copied (instead of just increasing shared_ptr use count). My other proposition is to check the use_count() of the message after callback and if the use count is increased at least log warning that access violation may occure. The best woud be to terminate such an application because as shown in this issue, the application may happily run reading forbidden memory without any error causing undefined behaviour. This issue does not affect only loaned messages but all other messages types too (serialized message, other message [rclcpp/rclcpp/src/rclcpp/executor.cpp:585]). In case of other message types the message pool is used, but the algorithm is the same: get message from pool, call subscription callbacks, return the message to pool. Therfore if the subcriber extends the life of message, the executor will keep modyfing the message. This will cause bugs which are extreamly hard to find, but will not crush the application because the memory still belongs to application. |
The whole point of the That said, I see exactly what you mean about the loaned APIs. They do not work like that; once the memory is returned to the underlying middleware, they should be considered invalid. So maybe we should forbid the shared_ptr version just for loaned messages. That said, I wasn't involved in the development of the loaned message API originally, so I'm not sure. @wjwwood do you have some idea on how this was supposed to work, and maybe what we should do here? |
Based on my analysis of This is the code from class Executor{
[...]
void execute_subscription(rclcpp::SubscriptionBase::SharedPtr subscription)
{
rclcpp::MessageInfo message_info;
message_info.get_rmw_message_info().from_intra_process = false;
if (subscription->is_serialized()) {
std::shared_ptr<SerializedMessage> serialized_msg = subscription->create_serialized_message();
subscription->take_serialized(*serialized_msg.get(), message_info);};
subscription->handle_serialized_message(serialized_msg, message_info);
subscription->return_serialized_message(serialized_msg);
} else if (subscription->can_loan_messages()) {
void * loaned_msg = nullptr;
auto sub_handle = subscription->get_subscription_handle().get()
rcl_take_loaned_message(sub_handle, &loaned_msg, [...]);
subscription->handle_loaned_message(loaned_msg, message_info);});
rcl_return_loaned_message_from_subscription(sub_handle, loaned_msg);
} else {
std::shared_ptr<void> message = subscription->create_message();
return subscription->take_type_erased(message.get(), message_info);
subscription->handle_message(message, message_info);
subscription->return_message(message);
}
}
[...]
}
class Subscription{
[...]
std::shared_ptr<void> create_message() override
{
return message_memory_strategy_->borrow_message();
}
std::shared_ptr<rclcpp::SerializedMessage> create_serialized_message() override
{
return message_memory_strategy_->borrow_serialized_message();
}
[...]
} The function names clearly indicate that messages are borrowed, filled with data, passed to callbacks, and messages are returned after callbacks are complete. class MessagePoolMemoryStrategy{
[...]
void return_message(std::shared_ptr<MessageT> & msg)
{
for (size_t i = 0; i < Size; ++i) {
if (pool_[i].msg_ptr_ == msg) {
pool_[i].used = false;
return;
}
}
throw std::runtime_error("Unrecognized message ptr in return_message.");
}
[...]
} I have confirmed the described behavior with the sample publisher and subscriber. #include <algorithm>
#include <rclcpp/rclcpp.hpp>
#include <rclcpp/strategies/message_pool_memory_strategy.hpp>
#include <sstream>
#include <std_msgs/msg/int64.hpp>
#include <vector>
class Subscriber : public rclcpp::Node {
public:
Subscriber() : Node("Subscriber") {
sub_ = this->create_subscription<std_msgs::msg::Int64>(
"my_topic", 10,
std::bind(&Subscriber::callback, this, std::placeholders::_1),
rclcpp::SubscriptionOptionsWithAllocator<std::allocator<void>>(),
std::make_shared<
rclcpp::strategies::message_pool_memory_strategy::
MessagePoolMemoryStrategy<std_msgs::msg::Int64, 10>>());
}
private:
void callback(std_msgs::msg::Int64::ConstSharedPtr msg) {
RCLCPP_INFO(this->get_logger(), "Received: '%ld'", msg->data);
std::stringstream ss;
std::for_each(msg_history_.begin(), msg_history_.end(),
[&ss](const auto &msg) { ss << msg->data << "; "; });
RCLCPP_INFO(this->get_logger(), "History: [%s]", ss.str().c_str());
msg_history_.push_back(msg);
}
std::vector<std_msgs::msg::Int64::ConstSharedPtr> msg_history_;
rclcpp::Subscription<std_msgs::msg::Int64>::SharedPtr sub_;
};
class Publisher : public rclcpp::Node {
public:
Publisher() : Node("Publisher"), count_(0) {
pub_ = this->create_publisher<std_msgs::msg::Int64>("my_topic", 10);
timer_ =
this->create_wall_timer(std::chrono::milliseconds(500),
std::bind(&Publisher::timer_callback, this));
}
private:
void timer_callback() {
auto msg = std_msgs::msg::Int64();
msg.data = count_++;
RCLCPP_INFO(this->get_logger(), "Publishing: '%ld'", msg.data);
pub_->publish(msg);
}
rclcpp::Publisher<std_msgs::msg::Int64>::SharedPtr pub_;
rclcpp::TimerBase::SharedPtr timer_;
int64_t count_;
};
int main(int argc, char **argv) {
rclcpp::init(argc, argv);
std::shared_ptr<rclcpp::Node> node;
if (argc > 1 && std::string(argv[1]) == "sub") {
node = std::make_shared<Subscriber>();
} else {
node = std::make_shared<Publisher>();
}
rclcpp::spin(node);
rclcpp::shutdown();
return 0;
} The output is
You can see that messages in history are being modified despite being kept as For me, it is arguable if it is a correct behavior or not. I do not like the idea that the handling of messages changes with subscription configuration. In my opinion, it violates shared pointer principles that even if I hold the pointer, it can be released in the background. I can agree that in case of changing memory strategy to a memory pool, one should expect that if captures more messages that are available in the pool, he should expect to run into troubles. However, it is not well documented and requires good knowledge of the underlying EDIT: I would say that this issue is less urgent with memory pools, as changing memory strategy requires a code modification inside Node and the developer should verify the impact of such a change on related callback. |
The ROS framework design should not allow invalid usage. So the library shouldn't disable particular functionality depending on the configuration, because developer who implements code doesn't know the settings where the software will be used, and in fact he shouldn't know.
Although, this may end-up with "memory leaks", since the dependent code may hold the shared_ptr forever. |
I don't think this is a valid way forward. The
This I totally agree with. What I think we should do here is change it so that the custom deleter for the shared_ptr that is created in With those fixes, I think that the surprising result from using a Are you willing to submit a pull request making those changes? |
I will try to prepare a pull request with releasing memory in shared_ptr destructor but I have an additional concern that this may result in additional problem with multithreding.
You can see in the above snipet that borrow_messages assues that next message in pool will be avaliable. If the user can retain shared_ptr, this function should search the message pool for avaliable message, as we do not know in which order messages are released. |
They are not thread-safe. However, we don't do locking at the
Yes, very good point. We'll have to fix that as well. |
ros2/rclcpp#2336 should fix this particular part of the problem. We are also looking at disabling loaned messages by default in ros2/rclcpp#2335 and ros2/rcl#1110 . That should at least make things safe to use. Later on we'll likely do more work to re-enable loaning with shared_ptrs by doing a custom deleter. |
We've done some work that fixes this:
The above works around the issue. There is one remaining issue here, which is to re-enable loaned messages along with shared_ptrs. I've opened up ros2/rclcpp#2401 to track that. With all of that, then, I'm going to close this one out. Thanks for the report, it led to a number of good fixes here. |
Bug report
Required Info:
Steps to reproduce issue
Expected behavior
Each node should start without errors.
Actual behavior
Random nodes fail startup with the following exception:
Additional information
This happens only in ros:humble. It doesn't happen in ros:iron and ros:rolling. I'm not sure why it happens only after enabling Iceoryx as it doesn't seems to be related directly to Iceoryx.
We tried to find the culprit and it seems to be this line https://github.com/ros2/rclcpp/blob/724b4588ecbab3d2bb1ce4fbf800defa0e5a842e/rclcpp/src/rclcpp/time_source.cpp#L166, which was accidentally (as far as commit's comment says) fixed in this commit in rolling: ros2/rclcpp@7d660ac
Basically, it seems that with Iceoryx enabled the "message loaning" mechanism is used for receiving /clock messages, and in the subscription the shared pointer to this message is stored as a field of an object and message is destroyed at the end of the callback and than stored shared pointer becomes invalid.
I'm not sure if this is an error in the usage of the shared pointer, or is this an error in message loaning mechanism.
Feature request
Feature description
Implementation considerations
The text was updated successfully, but these errors were encountered: