Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autobahn WebSocket server #426

Merged
merged 16 commits into from
Sep 15, 2019
Merged

Autobahn WebSocket server #426

merged 16 commits into from
Sep 15, 2019

Conversation

mvollrath
Copy link
Contributor

Instead of using obscure threading hackery to add flow control to Tornado, use a grown-up WebSocket server with its own flow control features.

@mvollrath mvollrath requested review from jihoonl and viktorku August 12, 2019 21:46
@mvollrath
Copy link
Contributor Author

CI pending ros/rosdistro#21999

@uahic
Copy link

uahic commented Aug 14, 2019

Since Autobahn seems also to have CBOR compression and all of that, do we need my pull request to add the support for compression in service calls then?

@mvollrath
Copy link
Contributor Author

Since Autobahn seems also to have CBOR compression and all of that, do we need my pull request to add the support for compression in service calls then?

This should not affect anything that happens in rosbridge_library, including protocol-level payload compression/encoding.

Copy link
Member

@viktorku viktorku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! Left mostly minor comments.

Is it too much work to add some tests that produce significant load such that we see how the autobahn server handles closer-to-real-world scenarios? Like publishing bigger point clouds and/or messages at faster rates. There might be some subtle differences in the shift from Tornado to this Twisted-based WS server so it would be good to make sure this doesn't introduce some regressions.

@@ -15,9 +15,7 @@
<maintainer email="jihoonlee.in@gmail.com">Jihoon Lee</maintainer>

<buildtool_depend>catkin</buildtool_depend>
<exec_depend>python-backports.ssl-match-hostname</exec_depend>
<exec_depend>python-tornado</exec_depend>
<exec_depend>python-twisted-core</exec_depend>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Twisted is still directly used, I think it should stay as a non-transitive dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed.

try:
reactor.stop()
except ReactorNotRunning:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps some informative log message could go here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

application.listen(port, address)
rospy.loginfo("Rosbridge WebSocket server started on port %d", port)
listenWS(factory, context_factory)
rospy.loginfo('listening at {}'.format(uri))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we revert the original log format (to be clear it's the rosbridge server listening)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done.

self._valve.set()

def stopProducing(self):
rospy.loginfo('stop')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More descriptive logging needed here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this was left here accidentally.

class RosbridgeWebSocket(WebSocketHandler):
@implementer(interfaces.IPushProducer)
class OutgoingValve:
"""Allows the Autobahn transport to pause outgoing messages from rosbridge."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this detect back-pressure and pauses publishing internally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, added some more description for this.

Copy link

@dhananjaysathe dhananjaysathe Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a good idea to block a publish because of back pressure by default ? If i have a misbehaving client i may optionally want to drop messages and not make all clients suffer. The caller of this code is blocking in nature afaik

Copy link
Contributor Author

@mvollrath mvollrath Sep 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point that there is potential for one slow client to block writing to others. I consider it a flaw in the rosbridge protocol that topics can be subscribed with (and by default) no protocol-level per-client queue, which would effectively decouple clients from the rospy.Subscriber thread. I do think connecting the backpressure by default is good, because it prevents one slow client from running the server out of memory, which in the context of a robotics framework is terrifying.

Until the protocol is fixed, there are two options:

  • Slow client runs server out of memory.
  • Slow client blocks other clients subscribed to the same topics.

A third option (really just a mitigation) is to, when subscribing to topics, always set queue_length > 0 to decouple clients from each other. In this case you have to trust the client to not use the default queue_length = 0, but may solve most of your problems as long as you control the application.

@mvollrath
Copy link
Contributor Author

CI now pending ros/rosdistro#22032 because I forgot to add xenial. Melodic tests are green.

I've done my own anecdotal testing on a couple of different use cases (point clouds and a very busy graph of small messages across many clients) which led to some fixes. I may add some smoke tests for the server as a first step in this PR, mostly to make sure ssl works as configured.

@jihoonl
Copy link
Member

jihoonl commented Aug 27, 2019

It seems like a huge ABI change which makes me concern about backward compatibility. Tornado has been too long to remove immediately.

Instead of replacing, how about making autobahn websocket server as addition to tornado? Then, I would add deprecation sign in tornado server that guides users to use autobahn instead.

This reverts commit 4224671.

This create_url method doesn't exist yet in Ubuntu kinetic.
Avoid network collisions and unnecessary configuration while testing.
@mvollrath
Copy link
Contributor Author

It seems like a huge ABI change which makes me concern about backward compatibility. Tornado has been too long to remove immediately.

Instead of replacing, how about making autobahn websocket server as addition to tornado? Then, I would add deprecation sign in tornado server that guides users to use autobahn instead.

I've made the Autobahn server the default and only script entrypoint. The old Tornado handler is still there if anybody was using that interface. The Autobahn server script supports the same features as the Tornado server (and a few more).

@mvollrath mvollrath requested a review from viktorku August 31, 2019 01:31
@dhananjaysathe
Copy link

+1 to this PR @jihoonl @mvollrath there are some more issues with the tornado based implementations that needed quite some to track down. Certain clients lik android phones with agressive power saving enabled leasve the websocket in a suspended state hat then chokes up all other clients for messages from the topic thread.
Using this branch and setting pings fixes this as it detects the issue
eg i set : websocket_ping_interval = 5 websocket_ping_timeout=2
i see the log on the server doing as expected:
2019-09-12 08:46:50+0000 [-] dropping connection to peer x.x.x.x with abort=True: WebSocket ping timeout (peer did not respond with pong in time)
this ping pong does not work reliablity in the tornado version either (or maybe the roslibjs missed something)
Great stuff. Do you folks need any help getting this PR in ?

@mvollrath
Copy link
Contributor Author

@dhananjaysathe thanks for your feedback, good to know this is already fixing things for people. 👍

Send 100 large messages in each direction to saturate buffers.
@mvollrath
Copy link
Contributor Author

@viktorku I've made the smoke test run 100 messages of 10,000 byte strings in each direction, which always causes backpressure when I test it here (a protocol queue_length of 1 always drops some messages).

@jihoonl I've preserved the Tornado ABI for compatibility.

@jihoonl jihoonl merged commit 5a0efdc into develop Sep 15, 2019
@jihoonl
Copy link
Member

jihoonl commented Sep 15, 2019

FYI @jspricke @mgrrx, A big change has been merged into develop branch. You may want to check if it breaks your usecase.

@jihoonl
Copy link
Member

jihoonl commented Sep 16, 2019

and thanks @mvollrath for nice updates.

@mgrrx
Copy link

mgrrx commented Sep 16, 2019

@jihoonl thanks for the heads-up. We'll test it in the coming days.

@dirk-thomas
Copy link
Contributor

I would be great of this patch would also be ported to the ros2 branch so that it can benefit from this. I took a look at the diff but am not familiar enough with this to provide the PR myself.

@wolfv
Copy link

wolfv commented Oct 22, 2019

thanks, this is great. the tornado used by rosbridge was getting old and uncompatible with newer tornado versions as used in Jupyter, and websockets were silently failing 👍

@moriarty
Copy link

It seems like a huge ABI change which makes me concern about backward compatibility. Tornado has been too long to remove immediately.
Instead of replacing, how about making autobahn websocket server as addition to tornado? Then, I would add deprecation sign in tornado server that guides users to use autobahn instead.

I've made the Autobahn server the default and only script entrypoint. The old Tornado handler is still there if anybody was using that interface. The Autobahn server script supports the same features as the Tornado server (and a few more).

It would have been great if this breaking change had waited to go into Noetic, or defaulted the other way around to not introduce the breaking change which was warned about above.

This breaks compatibility with https://github.com/rctoris/jrosbridge
Admittedly it's breaking things which were under specified because javax.websockets isn't setting the scheme in the header Origin.

https://github.com/crossbario/autobahn-python/blob/13949d383af593fb4a78ed21079742c8234d67b1/autobahn/websocket/protocol.py#L96

Adding another proxy between the rosbridge and code that was previously working which adds the protocol to the Origin in the header gets around the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants