Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/107 #831

Merged
merged 7 commits into from
Feb 8, 2017
Merged

Fix/107 #831

merged 7 commits into from
Feb 8, 2017

Conversation

mgrrx
Copy link
Contributor

@mgrrx mgrrx commented Jul 13, 2016

Fixes #107

I replaced the poll() behavior in rospy by epoll() which allows to check for EPOLLRDHUP events. Extract from the man page epoll_ctl:

Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)

Using that event we can easily check for closed connections that were hung up by the remote node. I added that check to the regmanager to gracefully close those connections in its update loop. Especially latched publishers suffered from the old behavior as the messages are normally sent only once.

The fix works also on kinetic

@@ -189,7 +193,7 @@ class Poller(object):
"""
def __init__(self):
try:
self.poller = select.poll()
self.poller = select.epoll()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this affect cross platform compatibility? The documentation only mentions Linux, what about OS X and Windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Poller class in topics.py provides a unified interface for polling on all three platforms. I cleaned up the code a bit (48f4f0a) and added a fallback to select.poll for Linux < 2.5.44. I did not touch the code for OS X and Windows. So regarding the compatibility I'd say that this PR does not affect the compatibility.

@jspricke
Copy link
Member

@dirk-thomas anything that keeps this from merging (to kinetic)?

@dirk-thomas
Copy link
Member

It is mostly due to not having the time to thoroughly test this patch. Hopefully I can get back to this in the near future.

@alain-m
Copy link

alain-m commented Nov 17, 2016

We can reproduce the escalation of open file descriptors - lsof -n | wc -l - and sockets left in CLOSE_WAIT state - sudo netstat -np | grep CLOSE_WAIT | wc -l - on our indigo robots at Savioke during regular navigation tasks. The sockets in question are all associated with python nodes, as expected from reading #107.
We tested this PR on indigo and it effectively fixes the problem for us, thanks @mgrrx . We will test it on kinetic.

@alain-m
Copy link

alain-m commented Jan 18, 2017

Any plans for releasing this fix soon @dirk-thomas? We've been using it on both indigo and kinetic robots since November and haven't found any regression. Let us know if we can help with specific tests.

@mgrrx mgrrx added the hitlist label Jan 18, 2017
@mgrrx
Copy link
Contributor Author

mgrrx commented Jan 18, 2017

@alain-m thanks for testing this.
@mikepurvis I added the hitlist label ;) Can you integrate this PR into your kinetic-edge branch and test it at Clearpath? We've tested this on indigo here at Magazino for quiet some time now without regressions. I can create a PR for kinetic if this helps

@dirk-thomas
Copy link
Member

Since it has been tested by two parties for a while already I would be fine with merging a PR against kinetic-devel now. The kinetic-edge branch can be rebased against the latest master afterwards to also cover this until the next release.

…nd is related to ros#610 as connections remained in CLOSE_WAIT.

Added a check for CLOSE_WAIT connections in the RegManager loop
Added a fallback to select.poll if select.epoll is not available in
Linux. OS X will still use kqueue and Windows the noop operator.
…rded but the lock is only acquired when connection list needs to be modified. This avoids unnecessary locking.
@jspricke jspricke changed the base branch from indigo-devel to kinetic-devel January 18, 2017 17:23
@jspricke jspricke changed the base branch from kinetic-devel to indigo-devel January 18, 2017 17:25
@jspricke jspricke changed the base branch from indigo-devel to kinetic-devel January 18, 2017 17:30
@jspricke
Copy link
Member

@dirk-thomas I've moved this to kinetic-devel. No idea why Jenkins has problems with it. Can you have a look and merge it if acceptable?

@dirk-thomas
Copy link
Member

@ros-pull-request-builder retest this please

# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# Revision $Id: gossipbot.py 1013 2008-05-21 01:08:56Z sfkwc $
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should probably be removed.

@dirk-thomas
Copy link
Member

I can't say why it didn't trigger the kinetic job after rebasing and "asking Jenkins nicely". I triggered the job manually (http://build.ros.org/view/Kpr/job/Kpr__ros_comm__ubuntu_xenial_amd64/272/) but it won't show up as a build status here when its finished.

@dirk-thomas
Copy link
Member

dirk-thomas commented Jan 18, 2017

Your latest push to the branch triggered a new build anyway. Waiting for the result on that...


import psutil

from std_msgs.msg import String
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just about to merge this. But on a last view through the diff I saw that std_msgs is currently neither a runtime nor test dependency yet. Can you please add it as a <test_depend>std_msgs</test_depend>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... but now it's upset because it was already listed as a build dependency. Maybe just switch this package to format 2?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry. I am only using format 2 nowadays. Can you please just remove the added test dependency again. No need to update to the new format in this PR.

@jspricke
Copy link
Member

jspricke commented Jan 22, 2017 via email

mikepurvis added a commit to clearpathrobotics/ros_comm that referenced this pull request Jan 31, 2017
mikepurvis added a commit that referenced this pull request Jan 31, 2017
mikepurvis added a commit that referenced this pull request Jan 31, 2017
mikepurvis added a commit that referenced this pull request Feb 7, 2017
mikepurvis added a commit that referenced this pull request Feb 7, 2017
mikepurvis added a commit that referenced this pull request Feb 7, 2017
@dirk-thomas dirk-thomas merged commit c57b173 into ros:kinetic-devel Feb 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants