-
Notifications
You must be signed in to change notification settings - Fork 914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression in 1.12.13: rosout node spins rather than blocking and waiting, using 100% CPU #1343
Comments
The issue is that, on linux systems and every other platform with monotonic clock set, |
That should not really be possible to happen, I cannot reproduce this issue, could you please provide some more information? Tried with latest versions from shadow-fixed (ros-kinetic-roscpp 1.12.13 and so on) and the simple listener from ros-kinetic-roscpp-tutorials and it works fine. |
I saw this same issue building roscpp from source just recently. Same symptom - near 100% cpu usage in every node. Reverting to 1.12.12 fixed the problem immediately. I can pull info off my system to help diagnose but I'm not sure what might be useful. It is a Ubuntu 16.04 system which is up to date with apt-get but with not much custom built - let me know what info would be useful. |
I did some further testing, and apparently this issue does not occur if you build the entire ros_comm stack from scratch; It only occurs if you either already have the base 1.12.12 system packaging installed and you only build ros_comm and roscpp_core, or if you rebuild only the ros_comm part for a source installation. So, some sort of compiler/linker issue. I have to set up some sort of testing apparatus to track the cause down with more specificity. |
Hm.. strange. I still can not reproduce this...
I built the parts to be tested and used the rest from /opt/ros/kinetic
It still works as expected, regardless if I run the listener from the binaries or the rebuilt tutorials... |
Ok, this is what I did to reproduce consistently:
|
Could it be that this is a problem with |
as I've said, I've done this using both the build process described in your post, just using It does not occur when I have built the whole ros_comm stack from source, using |
Hm.... I used the install space when testing this... will try again with the devel space. |
Just for reference, what catkin bugs should I be aware of? |
Should have phrased it more precisely: known limitation. |
Ok, I can reproduce it with the Dockerfile https://gist.github.com/flixr/77e36cfac037700080efd937148b0f7f To test this
and observe the high cpu load as you said... I have no clue why this is happening though. |
It's especially weird because we have over 100 devices running with #1250 without problems. |
+1 |
core tells us the following link problem, we are not sure the root cause of this yet. when problem observed, libboost_thread is used as following, #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225 when problem is NOT observed, #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225 |
so far, cannot find root cause. |
I was also affected by this issue after I compiled the kinetic version of roscpp from source and tried to dig deeper based on the previous observations posted here. My preliminary conclusion is that the approach taken in #1014 and #1250, to backport the
I think the only valid solution is to change the namespace of the backported condition variable implementation, e.g. to I will do further tests and propose a patch in the coming days. |
@meyerj excellent, thank you for this |
@meyerj did you finally fix this issue ? |
Yes, but my patch broke ROS timers and I did not have time yet to look into it again. But I could open a PR with the current patch, if someone else is interested. |
…tion_variable (fix ros#1343) ros#1014 and ros#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process.
Non-working patch pushed to kinetic-devel...meyerj:fix-1343. |
Proposed patch submitted as #1557. |
…tion_variable (fix ros#1343) ros#1014 and ros#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process.
…tion_variable (fix ros/ros_comm#1343) ros/ros_comm#1014 and ros/ros_comm#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process.
* fix-1343-melodic-devel: roscpp: replace ROSCPP_BOOST_CONDITION_VARIABLE and ROSCPP_BOOST_CONDITION_VARIABLE_HEADER macros by a typedef in internal_condition_variable.h test_rosbag: add target dependency to fix unit test failures in parallel builds roscpp: fixed Boost version check in CMakeLists.txt add missing ns add more explicit namespaces fix namespaces roscpp: remove specialized implementation of TimerManager<T,D,E>::threadFunc() in steady_timer.cpp roscpp: do not use boost_161_condition_variable.h on Windows (untested) roscpp: use boost::condition_variable::wait_for() instead of deprecated timed_wait() roscpp: fix potential busy-wait loop caused by backported Boost condition_variable (fix ros#1343)
…tion_variable (fix ros#1343) ros#1014 and ros#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process.
…ait spinning (ros#1878) * roscpp: fix potential busy-wait loop caused by backported Boost condition_variable (fix ros#1343) ros#1014 and ros#1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process. * roscpp: use boost::condition_variable::wait_for() instead of deprecated timed_wait() This fixes ROS timers in combination with 2c18b9f. The timer callbacks were not called because the TimerManager's thread function blocked indefinitely on boost::condition_variable::timed_wait(). Relative timed_wait() uses the system clock (boost::get_system_time()) unconditionally to calculate the absolute timestamp for do_wait_until(). If the condition variable has been initialized with BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC, it compares this timestamp with the monotonic clock and therefore blocks. This issue has been reported in https://svn.boost.org/trac10/ticket/12728 and will not be fixed. The timed_wait interface is apparently deprecated. * roscpp: do not use boost_161_condition_variable.h on Windows (untested) * roscpp: remove specialized implementation of TimerManager<T,D,E>::threadFunc() in steady_timer.cpp The updated generic definition in timer_manager.h should do the same with a minor update. In all cases we can call boost::condition_variable::wait_until() with an absolute time_point of the respective clock. The conversion from system_clock to steady_clock for Time and WallTime is done internally in boost::condition_variable::wait_until(lock_type& lock, const chrono::time_point<Clock, Duration>& t). * fix namespaces * add more explicit namespaces * add missing ns * roscpp: fixed Boost version check in CMakeLists.txt find_package(Boost) has to come before checking the Boost version. Otherwise BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC was not defined which triggered the assertion in timer_manager.h:240. Since Boost 1.67 BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC became the default if the platform supports it and the macro is not defined anymore. Instead, check for BOOST_THREAD_INTERNAL_CLOCK_IS_MONO. * roscpp: replace ROSCPP_BOOST_CONDITION_VARIABLE and ROSCPP_BOOST_CONDITION_VARIABLE_HEADER macros by a typedef in internal_condition_variable.h * Remove copy of boost::condition_variable implementation from Boost 1.61 in namespace boost_161 * Revert some changes in include directives and in CMakeLists.txt to minimize the diff to melodic-devel Addresses ros#1878 (review). * use wait_for(), remove TimerManagerTraits * Revert "use wait_for(), remove TimerManagerTraits" This reverts commit 2a67cf6. Co-authored-by: Antoine Hoarau <703240+ahoarau@users.noreply.github.com> Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
…tonic clock [kinetic-devel] (#2011) * Drop custom implementation of boost::condition_variable to fix busy-wait spinning (#1878) * roscpp: fix potential busy-wait loop caused by backported Boost condition_variable (fix #1343) #1014 and #1250 introduced a backported version of boost::condition_variable, where support for steady (monotonic) clocks has been added in version 1.61. But the namespace of the backported version was not changed and the symbol names might clash with the original version. Because the underlying clock used for the condition_variable is set in the constructor and must be consistent with the the expectations within member variables. The compiler might choose to inline one or the other or both, and is more likely to do so for optimized Release builds. But if it does not, the symbol ends up in the symbol table of roscpp and depending on which other libraries will be linked into the process it is unpredictable which of the two versions will be actually called at the end. In case the constructor defined in `/usr/include/boost/thread/pthread/condition_variable.hpp` was called and did not configure the internal pthread condition variable for monotonic clock, each call to the backported do_wait_until() method with a monotonic timestamp will return immediately and hence causes `CallbackQueue::callOne(timeout)` or `CallbackQueue::callAvailable(timeout)` to return immediately. This patch changes the namespace of the backported condition_variable implementation to boost_161. This removes the ambiguity with the original definition if both are used in the same process. * roscpp: use boost::condition_variable::wait_for() instead of deprecated timed_wait() This fixes ROS timers in combination with 2c18b9f. The timer callbacks were not called because the TimerManager's thread function blocked indefinitely on boost::condition_variable::timed_wait(). Relative timed_wait() uses the system clock (boost::get_system_time()) unconditionally to calculate the absolute timestamp for do_wait_until(). If the condition variable has been initialized with BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC, it compares this timestamp with the monotonic clock and therefore blocks. This issue has been reported in https://svn.boost.org/trac10/ticket/12728 and will not be fixed. The timed_wait interface is apparently deprecated. * roscpp: do not use boost_161_condition_variable.h on Windows (untested) * roscpp: remove specialized implementation of TimerManager<T,D,E>::threadFunc() in steady_timer.cpp The updated generic definition in timer_manager.h should do the same with a minor update. In all cases we can call boost::condition_variable::wait_until() with an absolute time_point of the respective clock. The conversion from system_clock to steady_clock for Time and WallTime is done internally in boost::condition_variable::wait_until(lock_type& lock, const chrono::time_point<Clock, Duration>& t). * fix namespaces * add more explicit namespaces * add missing ns * roscpp: fixed Boost version check in CMakeLists.txt find_package(Boost) has to come before checking the Boost version. Otherwise BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC was not defined which triggered the assertion in timer_manager.h:240. Since Boost 1.67 BOOST_THREAD_HAS_CONDATTR_SET_CLOCK_MONOTONIC became the default if the platform supports it and the macro is not defined anymore. Instead, check for BOOST_THREAD_INTERNAL_CLOCK_IS_MONO. * roscpp: replace ROSCPP_BOOST_CONDITION_VARIABLE and ROSCPP_BOOST_CONDITION_VARIABLE_HEADER macros by a typedef in internal_condition_variable.h * Remove copy of boost::condition_variable implementation from Boost 1.61 in namespace boost_161 * Revert some changes in include directives and in CMakeLists.txt to minimize the diff to melodic-devel Addresses #1878 (review). * use wait_for(), remove TimerManagerTraits * Revert "use wait_for(), remove TimerManagerTraits" This reverts commit 2a67cf6. Co-authored-by: Antoine Hoarau <703240+ahoarau@users.noreply.github.com> Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com> * Use an internal implementation of boost::condition_variable with monotonic clock (#1932) * Use an internal implementation of condition_variable with monotonic clock to avoid ODR violation * Fix build of ros/internal/condition_variable.h for Boost <1.65 * Fix "static_assert with no message is a C++17 extension" warning in ros/internal/condition_variable.h * roscpp: fix ros/internal/condition_variable.h for pre-C++11 compilers Co-authored-by: Antoine Hoarau <703240+ahoarau@users.noreply.github.com> Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
Introduced in faab4cb. This is a bug in the roscpp client.
In
callAvailable()
in callback_queue.cpp, if no callback is available,wait_for()
(located in boost_161_pthread_condition_variable_fwd.h) is called on the condition variable, and is told to wait for a specified amount of time (usually 0.1 seconds). However, instead of waiting for the specified amount of time, it returns immediately, turning this into a spin wait rather than a blocking wait. This leads to the rosout node chewing up 100% of the CPU core.The text was updated successfully, but these errors were encountered: