Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subscribing & Publishing from ROS nodes does not work #1391

Closed
janbernloehr opened this issue Nov 19, 2016 · 38 comments
Closed

Subscribing & Publishing from ROS nodes does not work #1391

janbernloehr opened this issue Nov 19, 2016 · 38 comments

Comments

@janbernloehr
Copy link

janbernloehr commented Nov 19, 2016

The current state of WSL makes working with ROS nodes on WSL impossible.

What works

What doesn't work

  • Already something simple as
    rostopic echo /rosout fails with the message

[WARN] [1479567965.827686]: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.

It seems that the implementation of network sockets on WSL is still incomplete.

I tested this on builds 14965 and 14971.

More information

To help prioritize this issue, I created a post on User Voice

@janbernloehr
Copy link
Author

janbernloehr commented Nov 19, 2016

#69, #134, #610, and #720 seem to be related. Maybe @sunilmut can have a look into it?

@gavanderhoorn
Copy link

gavanderhoorn commented Nov 20, 2016

This report is a bit too vague: the error message you include is ROS-specific, so isn't going to be much help to WSL devs.

I would include at least some strace output that shows what is actually not working.

Already something simple as rostopic echo /rosout fails with the message

The rostopic echo .. binary exercises the entire networking stack of ROS, so that would be expected.

@janbernloehr
Copy link
Author

janbernloehr commented Nov 20, 2016

here are the straces of roscore and rostopic. unfortunately i am not experienced in interpreting them.
rostopic.txt
roscore.txt

i will get hold of logs where everything works tomorrow

@janbernloehr
Copy link
Author

janbernloehr commented Nov 20, 2016

Here are two similar logs from a working Ubuntu 16.04 LTS + ROS Kinetic installation

roscore.txt
rostopic.txt

When comparing them, the HTTP response received immediately before the error message is emitted does have a nonzero body. In fact the body length is identical on WSL and Ubuntu 16.04...

bildschirmfoto 2016-11-20 um 18 33 30

@janbernloehr
Copy link
Author

The issue persists in 14986.

@janbernloehr
Copy link
Author

The issue persists in 15002.

Starting a roscore works with the output

... logging to /home/jan/.ros/log/bbcd1c24-d78a-11e6-a37c-74d435eb8bb7/roslaunch-jan-pc-1192.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://jan-pc:3024/
ros_comm version 1.12.6


SUMMARY
========

PARAMETERS
 * /rosdistro: kinetic
 * /rosversion: 1.12.6

NODES

auto-starting new master
process[master]: started with pid [1238]
ROS_MASTER_URI=http://jan-pc:11311/

setting /run_id to bbcd1c24-d78a-11e6-a37c-74d435eb8bb7
process[rosout-1]: started with pid [1251]
started core service [/rosout]

roswtf yields

Loaded plugin tf.tfwtf
No package or stack in context
================================================================================
Static checks summary:

No errors or warnings
================================================================================
Beginning tests of your ROS graph. These may take awhile...
analyzing graph...
... done analyzing graph
running graph rules...
... done running graph rules

Online checks summary:

Found 1 warning(s).
Warnings are things that may be just fine, but are sometimes at fault

WARNING The following node subscriptions are unconnected:
 * /rosout:
   * /rosout

Again, roswtf complains about the unconnected subscription of /rosout.

Finally, rostopic echo /rosout yields

[WARN] [1484090178.231790]: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
header:
  seq: 1
  stamp:
    secs: 1484090178
    nsecs: 232148885
  frame_id: ''
level: 4
name: /rostopic_1302_1484090177803
msg: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
file: tcpros_base.py
function: _tcp_server_callback
line: 351
topics: ['/rosout']
---

@janbernloehr
Copy link
Author

Same behavior on 15014.

jan@jan-pc:~$ rostopic echo /rosout
[WARN] [1485120994.774538]: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
header:
  seq: 1
  stamp:
    secs: 1485120994
    nsecs: 775190114
  frame_id: ''
level: 4
name: /rostopic_159_1485120994289
msg: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
file: tcpros_base.py
function: _tcp_server_callback
line: 351
topics: ['/rosout']
---

@Andersw88
Copy link

While testing with roscpp and rospy talker_listener tutorial examples at https://github.com/ros/ros_tutorials on build 15025 it seems like rospy nodes works, but only sending works for roscpp (talker). I can run roscpp_tutorial talker together with rospy_tutorial listener.

I still get
[WARN] [1486081593.661951]: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
from rospy_tutorial listener, once per node, though which seem to originate from ros_comm/tools/rosgraph/src/rosgraph/network.py line: 364.

I ran strace on roscpp listener on wsl and my working ubuntu 16.04.
roslistener-ubuntu.txt
roslistener-wsl.txt

I would happy to provide any additional information need.

@janbernloehr
Copy link
Author

The issue persists in 15048

$ rostopic echo /rosout
[WARN] [1489011779.094188]: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
header:
  seq: 1
  stamp:
    secs: 1489011779
    nsecs:  95611095
  frame_id: ''
level: 4
name: /rostopic_113_1489011778600
msg: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
file: tcpros_base.py
function: _tcp_server_callback
line: 351
topics: ['/rosout']
---

@noldona
Copy link

noldona commented Apr 13, 2017

Issue still exists in Version 1703 (OS Build 15063.138)

$ rostopic echo /rosout
[WARN] [1492050385.834945]: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
header:
seq: 1
stamp:
secs: 1492050385
nsecs: 843270063
frame_id: ''
level: 4
name: /rostopic_175_1492050385406
msg: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
file: tcpros_base.py
function: _tcp_server_callback
line: 351
topics: ['/rosout']


@sunilmut sunilmut added the bug label Apr 21, 2017
@bxwllzz
Copy link

bxwllzz commented May 8, 2017

Issue still exists in Version 1703 (OS Build 16188.1000)
Part of strace -ttt -f -e trace=network roscore outputs:

...
[pid  7675] 1494309936.117607 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 12
[DEBUG] [1494309936.118895200]: Resolved publisher host [LZZ-PC] to [127.0.1.1] for socket [12]
[pid  7675] 1494309936.119084 connect(12, {sa_family=AF_INET, sin_port=htons(6134), sin_addr=inet_addr("127.0.1.1")}, 16) = 0
[DEBUG] [1494309936.119954500]: Connect to tcpros publisher [LZZ-PC:6134] failed with error [0, Resource temporarily unavailable]
[DEBUG] [1494309936.120224000]: PollSet: Tried to delete fd [12] which is not being tracked
[pid  7675] 1494309936.120402 shutdown(12, SHUT_RDWR) = 0
[DEBUG] [1494309936.122175800]: TCP socket [12] closed
[DEBUG] [1494309936.122466800]: Failed to connect to publisher of topic [/rosout] at [LZZ-PC:6134]
...

Part of strace -ttt -f -e trace=network rostopic pub /rosout std_msgs/String "hello there" outputs:

...
strace: Process 4514 attached
[pid  4509] 1494264439.408673 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
[pid  4509] 1494264439.409056 bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
[pid  4509] 1494264439.409926 getsockname(5, {sa_family=AF_INET, sin_port=htons(13277), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
[pid  4509] 1494264439.411255 listen(5, 5) = 0
strace: Process 4515 attached
[pid  4515] 1494264439.414580 accept(5,  <unfinished ...>
[pid  4509] 1494264439.416377 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
[pid  4509] 1494264439.417212 connect(6, {sa_family=AF_INET, sin_port=htons(11311), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid  4509] 1494264439.419391 sendto(6, "POST /RPC2 HTTP/1.1\r\nHost: local"..., 588, 0, NULL, 0) = 588
[pid  4509] 1494264439.422269 recvfrom(6, "HTTP/1.0 200 OK\r\nServer: BaseHTT"..., 8192, 0, NULL, NULL) = 463
[pid  4509] 1494264439.426131 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
[pid  4509] 1494264439.426445 connect(6, {sa_family=AF_INET, sin_port=htons(11311), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid  4509] 1494264439.427774 sendto(6, "POST /RPC2 HTTP/1.1\r\nHost: local"..., 593, 0, NULL, 0) = 593
[pid  4509] 1494264439.434634 recvfrom(6, "HTTP/1.0 200 OK\r\nServer: BaseHTT"..., 8192, 0, NULL, NULL) = 468
[pid  4509] 1494264439.438553 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
[pid  4509] 1494264439.439105 connect(7, {sa_family=AF_INET, sin_port=htons(11311), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid  4509] 1494264439.444378 sendto(7, "POST /RPC2 HTTP/1.1\r\nHost: local"..., 551, 0, NULL, 0) = 551
[pid  4509] 1494264439.445325 recvfrom(7, "HTTP/1.0 200 OK\r\nServer: BaseHTT"..., 8192, 0, NULL, NULL) = 501
publishing and latching message. Press ctrl-C to terminate
[pid  4513] 1494264440.484290 accept(4, {sa_family=AF_INET, sin_port=htons(13282), sin_addr=inet_addr("127.0.0.1")}, [16]) = 7
strace: Process 4519 attached
[pid  4519] 1494264440.489526 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid  4519] 1494264440.490211 recvfrom(7, "POST / HTTP/1.1\r\nUser-Agent: XML"..., 8192, 0, NULL, NULL) = 412
[pid  4519] 1494264440.494777 sendto(7, "HTTP/1.0 200 OK\r\nServer: BaseHTT"..., 512, 0, NULL, 0) = 512
[pid  4519] 1494264440.495407 shutdown(7, SHUT_WR) = 0
[pid  4519] 1494264440.498369 +++ exited with 0 +++
[pid  4515] 1494264440.614276 <... accept resumed> {sa_family=AF_INET, sin_port=htons(13283), sin_addr=inet_addr("127.0.0.1")}, [16]) = 7
[pid  4515] 1494264440.617155 recvfrom(7, "", 4096, 0, NULL, NULL) = 0
[WARN] [1494264440.618661]: Inbound TCP/IP connection failed(debug): connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
[pid  4515] 1494264440.628397 accept(5,
...

It looks like that the TCP connection from rosout(forked from roscore) to rostopic is closed before established, as shown below:

[rosout as TCP client][pid  4267] 1494264440.613661 connect(12, {sa_family=AF_INET, sin_port=htons(13277), sin_addr=inet_addr("127.0.1.1")}, 16) = 0
[rosout as TCP client][pid  4267] 1494264440.614272 shutdown(12, SHUT_RDWR) = 0
[rostopic as TCP server][pid  4515] 1494264440.614276 <... accept resumed> {sa_family=AF_INET, sin_port=htons(13283), sin_addr=inet_addr("127.0.0.1")}, [16]) = 7
[rostopic as TCP server][pid  4515] 1494264440.617155 recvfrom(7, "", 4096, 0, NULL, NULL) = 0
[rostopic as TCP server][WARN] [1494264440.618661]: Inbound TCP/IP connection failed(debug): connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.

Server python code which raise exception(/opt/ros/kinetic/lib/python2.7/dist-packages/rosgraph/network.py):

def read_ros_handshake_header(sock, b, buff_size):
    """
    Read in tcpros header off the socket \a sock using buffer \a b.
    
    :param sock: socket must be in blocking mode, ``socket``
    :param b: buffer to use, ``StringIO`` for Python2, ``BytesIO`` for Python 3
    :param buff_size: incoming buffer size to use, ``int``
    :returns: key value pairs encoded in handshake, ``{str: str}``
    :raises: :exc:`ROSHandshakeException` If header format does not match expected
    """
    header_str = None
    while not header_str:
        d = sock.recv(buff_size)
        if not d:
            raise ROSHandshakeException("connection from sender terminated before handshake header received. %s bytes were received. Please check sender for additional details."%b.tell())
        b.write(d)
        btell = b.tell()
        if btell > 4:
            # most likely we will get the full header in the first recv, so
            # not worth tiny optimizations possible here
            bval = b.getvalue()
            (size,) = struct.unpack('<I', bval[0:4])
            if btell - 4 >= size:
                header_str = bval
                    
                # memmove the remnants of the buffer back to the start
                leftovers = bval[size+4:]
                b.truncate(len(leftovers))
                b.seek(0)
                b.write(leftovers)
                header_recvd = True
                    
    # process the header
    return decode_ros_handshake_header(bval)

and /opt/ros/kinetic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py

    def _tcp_server_callback(self, sock, client_addr):
        """
        TCPServer callback: detects incoming topic or service connection and passes connection accordingly
    
        @param sock: socket connection
        @type  sock: socket.socket
        @param client_addr: client address
        @type  client_addr: (str, int)
        @raise TransportInitError: If transport cannot be succesfully initialized
        """
        #TODOXXX:rewrite this logic so it is possible to create TCPROSTransport object first, set its protocol,
        #and then use that to do the writing
        try:
            buff_size = 4096 # size of read buffer
            if python3 == 0:
                #initialize read_ros_handshake_header with BytesIO for Python 3 (instead of bytesarray())    
                header = read_ros_handshake_header(sock, StringIO(), buff_size)
            else:
                header = read_ros_handshake_header(sock, BytesIO(), buff_size)
            
            if 'topic' in header:
                err_msg = self.topic_connection_handler(sock, client_addr, header)
            elif 'service' in header:
                err_msg = self.service_connection_handler(sock, client_addr, header)
            else:
                err_msg = 'no topic or service name detected'
            if err_msg:
                # shutdown race condition: nodes that come up and down
                # quickly can receive connections during teardown.
                
                # We use is_shutdown_requested() because we can get
                # into bad connection states during client shutdown
                # hooks.
                if not rospy.core.is_shutdown_requested():
                    write_ros_handshake_header(sock, {'error' : err_msg})
                    raise TransportInitError("Could not process inbound connection: "+err_msg+str(header))
                else:
                    write_ros_handshake_header(sock, {'error' : 'node shutting down'})
                    return
        except rospy.exceptions.TransportInitError as e:
            logwarn(str(e))
            if sock is not None:
                sock.close()
        except Exception as e:
            # collect stack trace separately in local log file
            if not rospy.core.is_shutdown_requested():
                logwarn("Inbound TCP/IP connection failed %s", e)
                rospyerr("Inbound TCP/IP connection failed\n%s", traceback.format_exc())
            if sock is not None:
                sock.close()

_tcp_server_callback is called from below code

    def run(self):
        """
        Main TCP receive loop. Should be run in a separate thread -- use start()
        to do this automatically.
        """
        self.is_shutdown = False
        if not self.server_sock:
            raise ROSInternalException("%s did not connect"%self.__class__.__name__)
        while not self.is_shutdown:
            try:
                (client_sock, client_addr) = self.server_sock.accept()
            except socket.timeout:
                continue
            except IOError as e:
                (errno, msg) = e.args
                if errno == 4: #interrupted system call
                    continue
                raise
            if self.is_shutdown:
                break
            try:
                #leave threading decisions up to inbound_handler
                self.inbound_handler(client_sock, client_addr)
            except socket.error as e:
                if not self.is_shutdown:
                    traceback.print_exc()
                    logwarn("Failed to handle inbound connection due to socket error: %s"%e)
        logdebug("TCPServer[%s] shutting down", self.port)

I'm new to ROS. I have not found the code of rosout which close the connection and cause this error.
Maybe someone else can help.

@bxwllzz
Copy link

bxwllzz commented May 9, 2017

I have determined the problem. The problem lies in the ROS.
Line 316 in ros_comm/roscpp/src/libros/transport/transport_tcp.cpp

  211 
  212 bool TransportTCP::connect(const std::string& host, int port)
  213 {
  214   if (!isHostAllowed(host))
  215     return false; // adios amigo
  216 
  217   sock_ = socket(s_use_ipv6_ ? AF_INET6 : AF_INET, SOCK_STREAM, 0);
  218   connected_host_ = host;
  219   connected_port_ = port;
  220 
  221   if (sock_ == ROS_INVALID_SOCKET)
  222   {
  223     ROS_ERROR("socket() failed with error [%s]",  last_socket_error_string());
  224     return false;
  225   }
  226 
  227   setNonBlocking();
  228 
  229   sockaddr_storage sas;
  230   socklen_t sas_len;

         ... resolve host ...

  312   int ret = ::connect(sock_, (sockaddr*) &sas, sas_len);
  313   // windows might need some time to sleep (input from service robotics hack) add this if testing proves it is necessary.
  314   ROS_ASSERT((flags_ & SYNCHRONOUS) || ret != 0);
  315   if (((flags_ & SYNCHRONOUS) && ret != 0) || // synchronous, connect() should return 0
  316       (!(flags_ & SYNCHRONOUS) && last_socket_error() != ROS_SOCKETS_ASYNCHRONOUS_CONNECT_RETURN)) // asynchronous, connect() should return -1 and WSAGetLastError()=WSAEWOULDBLOCK/errno=EINPROGRESS
  317   {
  318     ROSCPP_CONN_LOG_DEBUG("Connect to tcpros publisher [%s:%d] failed with error [%d, %s]", host.c_str(), port, ret, last_socket_error_string());
  319     close();
  320 
  321     return false;
  322   }
  323 

It assumes that the connect method of non-blocking scoket should return -1 and last_socket_error() should return ROS_SOCKETS_ASYNCHRONOUS_CONNECT_RETURN(=EINPROGRESS).
But a non-blocking connect can return 0 when TCP connection to 127.0.0.1 (localhost).
http://stackoverflow.com/questions/14027326/can-connect-return-0-with-non-blocing-socket

@gavanderhoorn
Copy link

Interesting. Have you tested this already by patching ros_comm and running it on WSL? I'll try to do the same today.

@bxwllzz
Copy link

bxwllzz commented May 9, 2017

@gavanderhoorn I'm trying to do this.

@gavanderhoorn
Copy link

gavanderhoorn commented May 9, 2017

Easiest is probably to use an overlay workspace with just ros_comm checked out and built from sources.

@bxwllzz
Copy link

bxwllzz commented May 9, 2017

Replace

  312   int ret = ::connect(sock_, (sockaddr*) &sas, sas_len);
  313   // windows might need some time to sleep (input from service robotics hack) add this if testing proves it is necessary.
  314   ROS_ASSERT((flags_ & SYNCHRONOUS) || ret != 0);
  315   if (((flags_ & SYNCHRONOUS) && ret != 0) || // synchronous, connect() should return 0
  316       (!(flags_ & SYNCHRONOUS) && last_socket_error() != ROS_SOCKETS_ASYNCHRONOUS_CONNECT_RETURN)) // asynchronous, connect() should return -1 and WSAGetLastError()=WSAEWOULDBLOCK/errno=EINPROGRESS
  317   {
  318     ROSCPP_CONN_LOG_DEBUG("Connect to tcpros publisher [%s:%d] failed with error [%d, %s]", host.c_str(), port, ret, last_socket_error_string());
  319     close();
  320 
  321     return false;
  322   }

in ros_comm/roscpp/src/libros/transport/transport_tcp.cpp with

  int ret = ::connect(sock_, (sockaddr*) &sas, sas_len);
  // windows might need some time to sleep (input from service robotics hack) add this if testing proves it is necessary.
  // ROS_ASSERT((flags_ & SYNCHRONOUS) || ret != 0);
  if (((flags_ & SYNCHRONOUS) && ret != 0) || // synchronous, connect() should return 0
      (!(flags_ & SYNCHRONOUS) && // asynchronous, connect() may return 0 or -1. When return 0, WSAGetLastError()=WSAEWOULDBLOCK/errno=EINPROGRESS
      (ret != 0 && last_socket_error() != ROS_SOCKETS_ASYNCHRONOUS_CONNECT_RETURN))) 
  {
    ROSCPP_CONN_LOG_DEBUG("Connect to tcpros publisher [%s:%d] failed with error [%d, %s]", host.c_str(), port, ret, last_socket_error_string());
    close();

    return false;
  }

TOPIC of ROS works. Test success in ROS/Tutorials/WritingPublisherSubscriber(python), ROS/Tutorials/ExaminingPublisherSubscriber and turtlesim

gavanderhoorn added a commit to gavanderhoorn/ros_comm that referenced this issue May 9, 2017
@gavanderhoorn
Copy link

gavanderhoorn commented May 9, 2017

Just tested this and it seems to work. I'm not entirely sure about commenting that ROS_ASSERT(..) though.

See gavanderhoorn/ros_comm@c83508d for the diff.

Nice work @bxwllzz 👍.

Would be nice if you could submit a PR against ros_comm to get this merged into mainline.

@bxwllzz
Copy link

bxwllzz commented May 9, 2017

I have submit a PR to ros_comm ros/ros_comm#1050

@gavanderhoorn
Copy link

Thanks. I've commented there as well.

@janbernloehr
Copy link
Author

I wrote down a step by step guide to get ros running on wsl here. In my view, the issue can be closed as fixed in the Creators Update.

@svanimisetti
Copy link

Linking to #1450 with my input there.

@sunilmut
Copy link
Member

sunilmut commented Aug 2, 2017

Just an update that non-blocking connect is on it's way and will also be available on Fall Creators Update.

@k-okada
Copy link

k-okada commented Oct 28, 2017

FYI : As of today, ROS kinetic (ros_comm 1.12.7) work out of box on WSL 16299.19, without changing source tree.

https://answers.ros.org/question/238646/installing-ros-on-ubuntu-bash-in-windows-10/

@sunilmut
Copy link
Member

@k-okada - Thanks for the validation.

@gavanderhoorn
Copy link

Just making sure: @k-okada: did you test this with ROS pubs/subs running the a single machine only (ie: the WSL machine), or were there more involved? I ask because a critical fix for Kinetic hasn't been backported from Lunar yet (and another is being reviewed for Lunar) which -- in the past at least -- was absolutely required to get things working.

@bxwllzz
Copy link

bxwllzz commented Oct 29, 2017

@gavanderhoorn What's the critical fix you mentioned?

I tested ROS Kinetic about a week ago. It works in single machine but didn't work in multi machine bacause of the reason I mentioned in ros/ros_comm#1202.

I have open a new PR (ros/ros_comm#1202) to ros_comm (lunar-devel) several days ago to fix a bug about TransportTCP. I'm sure my PR completely solve this problem for both Lunar and Kinetic. I hava tested it in both single machine and multi machines.

@gavanderhoorn
Copy link

The critical fix is your previous PR (ros/ros_comm#1050), which won't be available on Kinetic until ros/ros_comm#1205 is merged.

@bxwllzz
Copy link

bxwllzz commented Oct 29, 2017

@gavanderhoorn I upgraded ROS about a week ago. That version of ros_comm works in single machine. But It didn't work in multi machines ( I tested this about a week ago). I will test the latest version of ros_comm to check if it work in multi machines or not tomorrow.
PR ros/ros_comm#1050 may works in single machine but it didn't solve the problem completely. It didn't work in multi machines. : (
So I open the new PR ros/ros_comm#1202.

Update 2017/10/30:
The version of ros_comm in my WSL is 1.12.7 (latest). It doesn't work in multi machines.

@k-okada
Copy link

k-okada commented Oct 29, 2017

did you test this with ROS pubs/subs running the a single machine only

yes, that's correct, and I haven't tested with multi machine environment.

@rowanG077
Copy link

ROS Lunar doesn't work for me.

See this issue: #2560

@sunilmut
Copy link
Member

@rowanG077 - Point noted. Let's continue the discussion on #2560.

@Hemofektik
Copy link

Hey everyone. The issue still sometimes appears when using the ROSBridge TCP Server. That happens with ROS Kinetic and Melodic.

[WARN] [1542980586.269629, 1542980567.055869]: Inbound TCP/IP connection failed: timed out [WARN] [1542980586.273813, 1542980567.055869]: Inbound TCP/IP connection failed: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.

I cannot exactly understand what the solution was to the initial problem, but it looks like it still exists to some degree in the rosbridge TCP server.

Any ideas where to look into to find the problem?

@janbernloehr
Copy link
Author

Can you share a little more detail on your environment? Windows version? Version of Ubuntu? Version of ros? Version of ros_comm?

@Hemofektik
Copy link

Hemofektik commented Nov 23, 2018

Thanks for your fast reaction.

Windows: 1803
Ubuntu 16.04 and 18.04
ROS: Kinetic and Melodic
ros_comm version that comes with the respective versions of ROS.
Everything runs on a single machine.
In general it works most of the times. but sometimes the socket connection stops receiving/sending data and eventually a few seconds or minutes later the output I posted above is printed to the console. In the end even roswtf is not able to communicate with the rosbridge anymore. Which indicates, that the connection from the ROSBridge to the roscore is broken somehow. Sometimes I need to restart my computer to get it to work again. Restarting the roscore+tcp bridge or the app that communicates does not help most of the times to workaround this issue.

When I run natively in ubuntu 16 or 18 this never happened.

@Hemofektik
Copy link

Any Ideas where I should look to find the culprit?

@bxwllzz
Copy link

bxwllzz commented Nov 26, 2018

@Hemofektik
As I mentioned in the PR ros/ros_comm#1202. If the socket connection is slow in WSL, roscore will try to recv() on a connecting socket. This will cause an error and the socket will be closed early.
Therefore, I suspect that under heavy load, the local socket connection may be slowed down, which may cause rosbridge to fail to connect to roscore.

You can run strace -ttt -f -e trace=network roscore to examine what happened in roscore or rosbridgeand check if it try to recv() on a connecting socket.

@Hemofektik
Copy link

Thanks for your input. I will give it a try.

@mqnc
Copy link

mqnc commented Jun 27, 2019

Hi there! I have just apt-installed ros kinetic on WSL Ubuntu 16.04. The python talker/listener example only works as long as I start both from the same bash. Is that still the issue here? Wasn't it fixed in October 2017?

A bit more on the symptoms:
I can run a roscore in one bash window, then in another I execute

rosrun rospy_tutorial talker.py & rosrun rospy_tutorial listener.py

or

rosrun rospy_tutorial talker.py & rostopic echo /chatter

and everything works as expected. When I run rostopic list from another bash window, it also shows the /chatter topic but neither listener.py nor rostopic echo receive anything.

Any help appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests