-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve issues identified while investigating #21 #22
Conversation
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Linter errors resolved in 0b8fd8a |
@clalancette @ivanpauno @wjwwood a quick review to merge these changes to CI looks good, and the few errors are linter ones that have already been resolved. Thanks! |
This is true, I hope at some point we improve the rmw waitset API.
If we're deleting "waited on" objects somewhere, that's a bug. |
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Addressed review comments. Running CI again to validate:
As I mentioned in this comment, based on #21, something "fishy" seemed to be happening in Gazebo's |
@ivanpauno after reading the article that you linked, I think the code still suffers from "lost wakeup". Maybe I'm missing something, but I'm considering refactoring to use semaphores and avoid this problem with the condition variable API altogether. |
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have some concerns related with concurrency.
I also need to take a closer look to the code again...
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
std::mutex * waitset_mutex = this->waitset_mutex; | ||
auto scope_exit = rcpputils::make_scope_exit( | ||
[waitset_mutex]() | ||
{ | ||
if (nullptr != waitset_mutex) { | ||
waitset_mutex->unlock(); | ||
} | ||
}); | ||
if (nullptr != waitset_mutex) { | ||
waitset_mutex->lock(); | ||
} | ||
std::lock_guard<std::mutex> lock(this->mutex_internal); | ||
this->triggered_data = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, I think this should be:
std::mutex * waitset_mutex = this->waitset_mutex; | |
auto scope_exit = rcpputils::make_scope_exit( | |
[waitset_mutex]() | |
{ | |
if (nullptr != waitset_mutex) { | |
waitset_mutex->unlock(); | |
} | |
}); | |
if (nullptr != waitset_mutex) { | |
waitset_mutex->lock(); | |
} | |
std::lock_guard<std::mutex> lock(this->mutex_internal); | |
this->triggered_data = true; | |
std::lock_guard<std::mutex> lock(this->mutex_internal); | |
if (nullptr != this->waitset_mutex) { | |
std::lock_guard<std::mutex> lock(this->waitset_mutex); | |
this->triggered_data = true; | |
} else { | |
this->triggered_data = true; | |
} | |
if (this->waitset_condition) { | |
this->waitset_condition->notify_one(); | |
} |
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
Yes, this kind of code always gets a bit tricky 😬 . The trick with condition variables is to always follow a pattern like this when modifying a "condition" (i.e. something that can modify the return value of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks reasonable to me too.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Removed DDS-based implementation altogether in ce3f86d, and created branch asorbini/dds-waitsets to preserve it for future reference. |
I opened rmw_connextdds#26 to introduce some QoS optimizations for the reliability protocol which resolve the test failures in |
I've been investigating the invalid memory access in
I think the problem is that when a Node object is removed from the executor and ultimately deleted, there is no guarantee that the executor thread has already got out of a concurrent When the executor's waitset is awaken, it goes through its list of condition to check their status and "tear down" the Relevant code in rclcpp: Not sure about a fix, because I'm not too familiar with For now, I don't think this is a bug in I wonder if this is the root cause of the concurrent |
Yes, your description sounds like a probelem in |
@ivanpauno thanks, looking forward to your findings! |
Wow, this is really reducing the ammount of failures when testing with Great work @asorbini !!! |
* Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Resolve issues identified while investigating #21 (#22) * Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unavailable time helpers Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Resolve issues identified while investigating #21 (#22) * Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unavailable time helpers Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unsupported QoS event Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Resolve issues identified while investigating #21 (#22) * Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unavailable time helpers Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unsupported QoS event Signed-off-by: Andrea Sorbini <asorbini@rti.com>
This PR includes several improvements identified while investigating issue #21:
std
library.The reason for this is still unclear, but it probably lies in the fundamental mismatch between the ROS 2 WaitSet API and the DDS one. The RMW tries to compensate for the lack of
attach()
anddetach()
in ROS 2 by trying to cache its arguments acrosswait()
calls. Evidently this is not enough, and it may be even partially responsible for unnecessarily added CPU cycles under certain circumstances.std
was updated to fix the standing issues with "status reporting", and it is now the default.std
implementation relies on DDS listeners and coordinates events usingstd::condition_variable
's andmutex
's. Because it uses listeners (which automatically consume status flags and reset "change" counters), it must cache and keep track of reported statuses, in order to then later report the correct "status changes" to the user.RMW_Connext_Client::is_service_available()
RMW_Connext_Client::is_service_available()
was updated to only returntrue
if both a "request reader" and a "reply writer" have been matched from the same remote DomainParticipant.RMW_DURATION_INFINITE
inWaitSet::wait()
wait_timeout
argument. I noticed in Add rmw_publisher_wait_for_all_acked support #20 that there is this constant so I added support for it.Allow deletion of "waited on" objects (alternative WaitSet implementation)This implementation will most likely be removed altogether, but I made some updates to try to resolve some "catastrophic failures" when running with applications that seem to be deleting objects while also trying to delete them (gzserver
from Gazebo). This code didn't fully resolve the issues and is probably still bugged, hence why I ultimately went with item1.
Note to reviewers: feel free to ignore changes inrmw_impl_waitset_dds.cpp
andrmw_waitset_dds.hpp
Initial CI validation: