Possible solution to #577 by reworking onRetryTimer() logic #660

Closed


rethink-imcmahon

This is an alternative to #654 by attempting to lock out the dropping_ flag. I'll explain it in more detail if it proves to be useful.
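For readers unfamiliar with the idea, here is a minimal sketch of guarding a `dropping_`-style flag with a mutex. It is not the code in this PR: the `Link` class, the `beginDrop()` and `isDropping()` methods, and the use of `std::mutex` (rather than the Boost primitives roscpp uses) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a connection object; not the roscpp class.
class Link {
public:
  // Returns true only for the first caller that starts the drop.
  // Later callers see the flag already set and back off.
  bool beginDrop() {
    std::lock_guard<std::mutex> lock(dropping_mutex_);
    if (dropping_) {
      return false;
    }
    dropping_ = true;
    return true;
  }

  // Reads the flag under the same mutex so readers never race with beginDrop().
  bool isDropping() const {
    std::lock_guard<std::mutex> lock(dropping_mutex_);
    return dropping_;
  }

private:
  mutable std::mutex dropping_mutex_;  // guards dropping_
  bool dropping_ = false;
};
```

The check and the write happen under one lock, so two threads cannot both conclude that they are the one performing the drop.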

@ros-pull-request-builder
Member

Can one of the admins verify this patch?

@rethink-imcmahon
Copy link
Author

@dirk-thomas could I get a pull request builder run? Thanks!

@dirk-thomas
Member

@ros-pull-request-builder
Member

Test failed.
Refer to this link for build results: http://jenkins.ros.org/job/_pull_request-indigo-ros_comm/371/

@rethink-imcmahon
Author

Ok, it appears only one error is present, and it's rospy related, so this patch might be in the clear. Any idea why Jenkins pulled the #637 patch into the build in addition to this one?

http://jenkins.ros.org/job/_pull_request-indigo-ros_comm/ARCH_PARAM=amd64,UBUNTU_PARAM=trusty,label=devel/371/

@dirk-thomas
Member

The Jenkins changelog shows the other PR only because it was merged after the job last ran.

I will trigger another run to see if the test is just flaky.

@dirk-thomas
Member

@ros-pull-request-builder
Member

Test failed.
Refer to this link for build results: http://jenkins.ros.org/job/_pull_request-indigo-ros_comm/372/

@rethink-imcmahon
Author

Well, that's pretty strange. The test builds and runs fine on my workstation against this PR:

$ catkin_make run_tests_message_filters_rostest_test_test_subscriber.xml
Base path: /data/users/imcmahon/dev/ros_core_mutex_ws
Source space: /data/users/imcmahon/dev/ros_core_mutex_ws/src
Build space: /data/users/imcmahon/dev/ros_core_mutex_ws/build
Devel space: /data/users/imcmahon/dev/ros_core_mutex_ws/devel
Install space: /data/users/imcmahon/dev/ros_core_mutex_ws/install
####
#### Running command: "make cmake_check_build_system" in "/data/users/imcmahon/dev/ros_core_mutex_ws/build"
####
####
#### Running command: "make run_tests_message_filters_rostest_test_test_subscriber.xml -j8 -l8" in "/data/users/imcmahon/dev/ros_core_mutex_ws/build"
####
-- run_tests.py: execute commands
  /data/users/imcmahon/dev/ros_core_mutex_ws/src/ros_comm/tools/rostest/scripts/rostest --pkgdir=/data/users/imcmahon/dev/ros_core_mutex_ws/src/ros_comm/utilities/message_filters --package=message_filters --results-filename test_test_subscriber.xml --results-base-dir /data/users/imcmahon/dev/ros_core_mutex_ws/build/test_results /data/users/imcmahon/dev/ros_core_mutex_ws/src/ros_comm/utilities/message_filters/test/test_subscriber.xml 
... logging to /data/users/imcmahon/.ros/log/rostest-Threepio-25793.log
[ROSUNIT] Outputting test results to /data/users/imcmahon/dev/ros_core_mutex_ws/build/test_results/message_filters/rostest-test_test_subscriber.xml
testtest_subscriber ... ok

[ROSTEST]-----------------------------------------------------------------------

[message_filters.rosunit-test_subscriber/simple][passed]
[message_filters.rosunit-test_subscriber/subUnsubSub][passed]
[message_filters.rosunit-test_subscriber/subInChain][passed]
[message_filters.rosunit-test_subscriber/singleNonConstCallback][passed]
[message_filters.rosunit-test_subscriber/multipleNonConstCallbacksFilterSubscriber][passed]
[message_filters.rosunit-test_subscriber/multipleCallbacksSomeFilterSomeDirect][passed]

SUMMARY
 * RESULT: SUCCESS
 * TESTS: 6
 * ERRORS: 0
 * FAILURES: 0

rostest log file is in /data/users/imcmahon/.ros/log/rostest-Threepio-25793.log
-- run_tests.py: verify result "/data/users/imcmahon/dev/ros_core_mutex_ws/build/test_results/message_filters/rostest-test_test_subscriber.xml"
Built target run_tests_message_filters_rostest_test_test_subscriber.xml

@dirk-thomas
Member

@ros-pull-request-builder
Member

Test failed.
Refer to this link for build results: http://jenkins.ros.org/job/_pull_request-indigo-ros_comm/374/

@dirk-thomas
Member

I looked further into the Jenkins error message and it seems that it is simply not able to extract the correct standard output since multiple tests run concurrently and therefore the console output is mixed.

The top of the stacktrace is accurate though:

test [subscribe_retry_tcp] did not generate test results

Based on that I looked into the full console output and searched for subscribe_retry_tcp:

-- run_tests.py: execute commands
  /home/rosbuild/hudson/workspace/_pull_request-indigo-ros_comm/ARCH_PARAM/amd64/UBUNTU_PARAM/trusty/label/devel/src_repository/monitored_vcs/tools/rostest/scripts/rostest --pkgdir=/home/rosbuild/hudson/workspace/_pull_request-indigo-ros_comm/ARCH_PARAM/amd64/UBUNTU_PARAM/trusty/label/devel/src_repository/monitored_vcs/test/test_roscpp --package=test_roscpp --results-filename test_launch_nonconst_subscriptions.xml --results-base-dir /home/rosbuild/hudson/workspace/_pull_request-indigo-ros_comm/ARCH_PARAM/amd64/UBUNTU_PARAM/trusty/label/devel/test_results/repos /home/rosbuild/hudson/workspace/_pull_request-indigo-ros_comm/ARCH_PARAM/amd64/UBUNTU_PARAM/trusty/label/devel/src_repository/monitored_vcs/test/test_roscpp/test/launch/nonconst_subscriptions.xml 
-- run_tests.py: verify result "/home/rosbuild/hudson/workspace/_pull_request-indigo-ros_comm/ARCH_PARAM/amd64/UBUNTU_PARAM/trusty/label/devel/test_results/repos/test_roscpp/rostest-test_launch_nonconst_subscriptions.xml"
[100%] Built target _run_tests_test_roscpp_rostest_test_launch_nonconst_subscriptions.xml
Scanning dependencies of target _run_tests_test_roscpp_rostest_test_launch_subscribe_retry_tcp.xml
test_roscpp-subscribe_retry_tcp: /usr/include/boost/thread/pthread/recursive_mutex.hpp:101: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy(&m)' failed.
... logging to /home/rosbuild/.ros/log/rostest-host05.storm.ros.org-7396.log
[ROSUNIT] Outputting test results to /home/rosbuild/hudson/workspace/_pull_request-indigo-ros_comm/ARCH_PARAM/amd64/UBUNTU_PARAM/trusty/label/devel/test_results/repos/test_roscpp/rostest-test_launch_subscribe_retry_tcp.xml
testsubscribe_retry_tcp ... FAILURE!
FAILURE: test [subscribe_retry_tcp] did not generate test results
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/home/rosbuild/hudson/workspace/_pull_request-indigo-ros_comm/ARCH_PARAM/amd64/UBUNTU_PARAM/trusty/label/devel/src_repository/monitored_vcs/tools/rostest/src/rostest/runner.py", line 160, in fn
    self.assert_(os.path.isfile(test_file), "test [%s] did not generate test results"%test_name)
  File "/usr/lib/python2.7/unittest/case.py", line 424, in assertTrue
    raise self.failureException(msg)
--------------------------------------------------------------------------------

[ROSTEST]-----------------------------------------------------------------------

[testsubscribe_retry_tcp][failed]

SUMMARY
 * RESULT: FAIL
 * TESTS: 0
 * ERRORS: 0
 * FAILURES: 1

ERROR: The following tests failed to run:
 * testsubscribe_retry_tcp

rostest log file is in /home/rosbuild/.ros/log/rostest-host05.storm.ros.org-7396.log

@rethink-imcmahon
Author

Quick update, now that I am looking at the right test in my local workspace, things are a bit more obvious:

$ catkin_make run_tests_test_roscpp_rostest_test_launch_subscribe_retry_tcp.xml
...
[ROSUNIT] Outputting test results to /data/users/imcmahon/ros_ws/build/test_results/test_roscpp/rostest-test_launch_subscribe_retry_tcp.xml
test_roscpp-subscribe_retry_tcp: /usr/include/boost/thread/pthread/recursive_mutex.hpp:101: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy(&m)' failed.
testsubscribe_retry_tcp ... FAILURE!

That recursive mutex error doesn't seem to show up on Jenkins. I figured that if there were any errors in this patch, they would be deadlock conditions, so it's just going to be a matter of uncovering the deadlock.

@dirk-thomas
Member

@rethink-imcmahon
Author

That it does! Additionally, there are 9 instances of the string 'recursive_mutex.hpp' in that full console output, all of which point to this mutex failure, though not all of them cause test errors. That doesn't bode well for this patch.

@rethink-imcmahon
Author

@dirk-thomas Could I snag a pull request builder run? @jeffadamsc and I worked through the logic surrounding onRetryTimer() today to remove the need for accessing _parent inside a callback. We moved that logic back into onConnectionDropped(), preventing a boost::bind on UDP connections, which were going to be dropped anyway. I'll put together a full writeup of our logic if it passes our nightly run. Thanks!
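A rough sketch of the general shape of that change, as described above and not the actual patch: the decision to retry is made when the connection drops, UDP connections never get a callback bound, and the callback captures only copies, so it has no reason to reach back into its owner. Every name here (`RetryQueue`, `AttemptLog`, this `onConnectionDropped` signature) is hypothetical, and `std::function` stands in for `boost::bind`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Transport { TCP, UDP };

// Hypothetical stand-in for a timer manager: stores callbacks until run.
class RetryQueue {
public:
  void schedule(std::function<void()> cb) { pending_.push_back(std::move(cb)); }
  std::size_t size() const { return pending_.size(); }

  // Runs and clears everything that was scheduled so far.
  void runAll() {
    std::vector<std::function<void()>> ready;
    ready.swap(pending_);
    for (auto& cb : ready) {
      cb();
    }
  }

private:
  std::vector<std::function<void()>> pending_;
};

// Shared record of retry attempts; kept alive by every callback that uses it.
using AttemptLog = std::shared_ptr<std::vector<std::string>>;

// Decide at drop time whether a retry is wanted. The callback captures the
// URI by value and the log by shared_ptr, so it holds no pointer to an owner
// that might already be destroyed when the timer fires.
void onConnectionDropped(Transport transport, const std::string& uri,
                         RetryQueue& queue, const AttemptLog& log) {
  if (transport == Transport::UDP) {
    return;  // UDP links are dropped without retry, so nothing is bound
  }
  queue.schedule([uri, log]() { log->push_back(uri); });
}
```

Because the callback owns everything it touches, it stays safe to run even if the object that scheduled it has gone away in the meantime.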

@rethink-imcmahon changed the title from "Possible solution to #577 by mutex locking dropping_ flag" to "Possible solution to #577 by reworking onRetryTimer() logic" on Sep 21, 2015
@ros-pull-request-builder
Member

Test passed.
Refer to this link for build results: http://jenkins.ros.org/job/_pull_request-indigo-ros_comm/392/

@rethink-imcmahon
Author

Superseded by #670
