-
Notifications
You must be signed in to change notification settings - Fork 910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop custom implementation of boost::condition_variable to fix busy-wait spinning #1878
Drop custom implementation of boost::condition_variable to fix busy-wait spinning #1878
Conversation
…tion_variable (fix ros#1343) ros#1014 and ros#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process.
…ed timed_wait() This fixes ROS timers in combination with 2c18b9f. The timer callbacks were not called because the TimerManager's thread function blocked indefinitely on boost::condition_variable::timed_wait(). Relative timed_wait() uses the system clock (boost::get_system_time()) unconditionally to calculate the absolute timestamp for do_wait_until(). If the condition variable has been initialized with BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC, it compares this timestamp with the monotonic clock and therefore blocks. This issue has been reported in https://svn.boost.org/trac10/ticket/12728 and will not be fixed. The timed_wait interface is apparently deprecated.
…eadFunc() in steady_timer.cpp The updated generic definition in timer_manager.h should do the same with a minor update. In all cases we can call boost::condition_variable::wait_until() with an absolute time_point of the respective clock. The conversion from system_clock to steady_clock for Time and WallTime is done internally in boost::condition_variable::wait_until(lock_type& lock, const chrono::time_point<Clock, Duration>& t).
find_package(Boost) has to come before checking the Boost version. Otherwise BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC was not defined which triggered the assertion in timer_manager.h:240. Since Boost 1.67 BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC became the default if the platform supports it and the macro is not defined anymore. Instead, check for BOOST_THREAD_INTERNAL_CLOCK_IS_MONO.
…ITION_VARIABLE_HEADER macros by a typedef in internal_condition_variable.h
ABI-compatibility report: compat_report-fix-1343-melodic-devel-2.html 100% compatibility according to abi-compliance-checker (on Ubuntu 18.04 amd64, with gcc 7.4.0). Only symbols exposed in public headers have been considered - as recommended on that page. |
49996b3
to
3faf9b1
Compare
…61 in namespace boost_161
3faf9b1
to
8e44572
Compare
…nimize the diff to melodic-devel Addresses ros#1878 (review).
👍 for removing the backported boost headers on melodic and up. |
@meyerj Still I have all my CPUs going crazy if I build with clang++-6 and c++14 enabled , on Ubuntu 18.04 with Boost 1.65.1. |
It is still possible that the symbols I have not tested this idea yet, but one option to get rid of the ODR violation - at least when building shared libraries - is to compile roscpp with hidden symbols (<LANG>_VISIBILITY_PRESET or VISIBILITY_INLINES_HIDDEN cmake properties). Then the Boost symbols and other internally used symbols will not be exposed in the dynamic symbol table and therefore cannot affect other libraries' behavior. This approach requires that all public symbols are exported explicitly using the |
I agree. If you check how boost::condition_variable::wait_for() is implemented for the case |
abi-compliance-checker is not happy with a All those symbols are defined in a roscpp header or implicitly by the compiler, so normally derived packages should not depend on them and compile their own versions instead. This check has been done by comparing debug versions of the The good thing is that also all |
This reverts commit 2a67cf6.
@ahoarau Can you please double check this patch again? Several others have reported in the past that it fixes the problem for them. If it doesn't for you can please share the complete and exact steps you did to try it. |
I have convinced myself that the current patch fixes the problem for Melodic (and does the right thing by getting rid of the copied Boost files).
Dockerfile - vanilla Kinetic:FROM ros:kinetic-ros-base-xenialRUN mkdir -p /catkin_ws/src && RUN cd /catkin_ws && RUN echo '#!/bin/bash\n ENTRYPOINT ["/devel_entrypoint.sh"]
Dockerfile - vanilla Melodic:FROM ros:melodic-ros-base-bionicRUN mkdir -p /catkin_ws/src && RUN cd /catkin_ws && RUN echo '#!/bin/bash\n ENTRYPOINT ["/devel_entrypoint.sh"]
Dockerfile - patched Melodic:FROM ros:melodic-ros-base-bionicRUN mkdir -p /catkin_ws/src && RUN cd /catkin_ws && RUN echo '#!/bin/bash\n ENTRYPOINT ["/devel_entrypoint.sh"] I will go ahead and merge this patch by the end of the day (if there are no objections). Thank you to everyone who has contributed to this - especially @meyerj for creating and iterating on the patch. |
I was one of those people, but I only compiled using GCC, so I just re-ran your two Melodic Dockerfiles (Didn't bother w/ Kinetic) using I got the same behaviour with
The Dockerfile is using the Ubuntu Bionic defaults: |
Moving forward and merging this. Please feel free to continue commenting on the closed ticket. |
Special thanks to @meyerj for working on multiple iterations of this important fix as well as everyone else who helped with this. |
…ait spinning (ros#1878) * roscpp: fix potential busy-wait loop caused by backported Boost condition_variable (fix ros#1343) ros#1014 and ros#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process. * roscpp: use boost::condition_variable::wait_for() instead of deprecated timed_wait() This fixes ROS timers in combination with 2c18b9f. The timer callbacks were not called because the TimerManager's thread function blocked indefinitely on boost::condition_variable::timed_wait(). Relative timed_wait() uses the system clock (boost::get_system_time()) unconditionally to calculate the absolute timestamp for do_wait_until(). If the condition variable has been initialized with BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC, it compares this timestamp with the monotonic clock and therefore blocks. This issue has been reported in https://svn.boost.org/trac10/ticket/12728 and will not be fixed. The timed_wait interface is apparently deprecated. * roscpp: do not use boost_161_condition_variable.h on Windows (untested) * roscpp: remove specialized implementation of TimerManager<T,D,E>::threadFunc() in steady_timer.cpp The updated generic definition in timer_manager.h should do the same with a minor update. In all cases we can call boost::condition_variable::wait_until() with an absolute time_point of the respective clock. The conversion from system_clock to steady_clock for Time and WallTime is done internally in boost::condition_variable::wait_until(lock_type& lock, const chrono::time_point<Clock, Duration>& t). * fix namespaces * add more explicit namespaces * add missing ns * roscpp: fixed Boost version check in CMakeLists.txt find_package(Boost) has to come before checking the Boost version. Otherwise BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC was not defined which triggered the assertion in timer_manager.h:240. Since Boost 1.67 BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC became the default if the platform supports it and the macro is not defined anymore. Instead, check for BOOST_THREAD_INTERNAL_CLOCK_IS_MONO. * roscpp: replace ROSCPP_BOOST_CONDITION_VARIABLE and ROSCPP_BOOST_CONDITION_VARIABLE_HEADER macros by a typedef in internal_condition_variable.h * Remove copy of boost::condition_variable implementation from Boost 1.61 in namespace boost_161 * Revert some changes in include directives and in CMakeLists.txt to minimize the diff to melodic-devel Addresses ros#1878 (review). * use wait_for(), remove TimerManagerTraits * Revert "use wait_for(), remove TimerManagerTraits" This reverts commit 2a67cf6. Co-authored-by: Antoine Hoarau <703240+ahoarau@users.noreply.github.com> Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
…tonic clock [kinetic-devel] (#2011) * Drop custom implementation of boost::condition_variable to fix busy-wait spinning (#1878) * roscpp: fix potential busy-wait loop caused by backported Boost condition_variable (fix #1343) #1014 and #1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process. * roscpp: use boost::condition_variable::wait_for() instead of deprecated timed_wait() This fixes ROS timers in combination with 2c18b9f. The timer callbacks were not called because the TimerManager's thread function blocked indefinitely on boost::condition_variable::timed_wait(). Relative timed_wait() uses the system clock (boost::get_system_time()) unconditionally to calculate the absolute timestamp for do_wait_until(). If the condition variable has been initialized with BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC, it compares this timestamp with the monotonic clock and therefore blocks. This issue has been reported in https://svn.boost.org/trac10/ticket/12728 and will not be fixed. The timed_wait interface is apparently deprecated. * roscpp: do not use boost_161_condition_variable.h on Windows (untested) * roscpp: remove specialized implementation of TimerManager<T,D,E>::threadFunc() in steady_timer.cpp The updated generic definition in timer_manager.h should do the same with a minor update. In all cases we can call boost::condition_variable::wait_until() with an absolute time_point of the respective clock. The conversion from system_clock to steady_clock for Time and WallTime is done internally in boost::condition_variable::wait_until(lock_type& lock, const chrono::time_point<Clock, Duration>& t). * fix namespaces * add more explicit namespaces * add missing ns * roscpp: fixed Boost version check in CMakeLists.txt find_package(Boost) has to come before checking the Boost version. Otherwise BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC was not defined which triggered the assertion in timer_manager.h:240. Since Boost 1.67 BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC became the default if the platform supports it and the macro is not defined anymore. Instead, check for BOOST_THREAD_INTERNAL_CLOCK_IS_MONO. * roscpp: replace ROSCPP_BOOST_CONDITION_VARIABLE and ROSCPP_BOOST_CONDITION_VARIABLE_HEADER macros by a typedef in internal_condition_variable.h * Remove copy of boost::condition_variable implementation from Boost 1.61 in namespace boost_161 * Revert some changes in include directives and in CMakeLists.txt to minimize the diff to melodic-devel Addresses #1878 (review). * use wait_for(), remove TimerManagerTraits * Revert "use wait_for(), remove TimerManagerTraits" This reverts commit 2a67cf6. Co-authored-by: Antoine Hoarau <703240+ahoarau@users.noreply.github.com> Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com> * Use an internal implementation of boost::condition_variable with monotonic clock (#1932) * Use an internal implementation of condition_variable with monotonic clock to avoid ODR violation * Fix build of ros/internal/condition_variable.h for Boost <1.65 * Fix "static_assert with no message is a C++17 extension" warning in ros/internal/condition_variable.h * roscpp: fix ros/internal/condition_variable.h for pre-C++11 compilers Co-authored-by: Antoine Hoarau <703240+ahoarau@users.noreply.github.com> Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
An extension of #1651, which addresses #1343 (Regression in 1.12.13: rosout node spins rather than blocking and waiting, using 100% CPU):
Summary:
The patch fixes an ODR violation causing undefined behavior, which - depending on the compiler flags for
roscpp
and compiler and linker internals - can make nodes either spin at 100% CPU or block indefinitely waiting on aboost::condition_variable
. This happens if the symbol of theboost::condition_variable
constructor is linked from a compilation unit whereBOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC
has been defined, and symbols of member functions are linked from another compilation unit whereBOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC
has not been defined, or vice-versa.Actually this macro is meant as a build-time setting in Boost (in boost/thread/detail/config.hpp) and defining it in user packages is always dangerous. It is important to never expose
boost::condition_variable
instances (or other Boost Thread types influenced by that flag?) in the public API, and to make sure that the constructor and member functions are only called internally byroscpp
compilation units, where theBOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC
is defined. Strictly speaking, it is still an ODR violation because the macro effectively changes the implementation of the same symbols seen by different compilation units, and the patch only works because compilers tend to inline the relevant functions.This problem will be only solved by Boost 1.67 or above, where
BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC
(renamed toBOOST_THREAD_INTERNAL_CLOCK_IS_MONO
) became the default on platforms that support it (boostorg/thread@1e84b97).Extension to #1651:
As requested in #1651 (comment) I now completely removed the custom implementation of
boost::condition_variable
copied from Boost 1.61: