Add asyncio.streams Comm #2165
Conversation
infos = [socket.getaddrinfo(host, port, family=family, flags=flags)
         for host in hosts]
infos = set(itertools.chain.from_iterable(infos))
infos = [info for info in infos if info[1] == socket.SocketKind.SOCK_STREAM]
I changed the first two of these lines to use the blocking socket API rather than asyncio's non-blocking API.
The third line is a hack; I added it to get past errors like the following:
    def _start_serving(self):
        if self._serving:
            return
        self._serving = True
        for sock in self._sockets:
>           sock.listen(self._backlog)
E           OSError: [Errno 95] Operation not supported
I suspect that there is a way to handle this by passing in the right arguments to start_server, but I haven't found it yet.
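One possibility, untested here and only a sketch: socket.getaddrinfo accepts a type argument, so restricting resolution to SOCK_STREAM up front should keep datagram/raw entries out of the results and make the explicit filter above unnecessary. The helper name and defaults below are illustrative, not from this branch.

import itertools
import socket


def resolve_stream_addresses(hosts, port, family=socket.AF_UNSPEC, flags=0):
    # Resolve every host with the blocking socket API, asking only for
    # TCP (SOCK_STREAM) results so that no later listen() call can fail
    # with "Operation not supported" on a non-stream socket.
    infos = [
        socket.getaddrinfo(
            host, port, family=family, type=socket.SOCK_STREAM, flags=flags
        )
        for host in hosts
    ]
    # Each entry is (family, type, proto, canonname, sockaddr); deduplicate
    # across hosts exactly as the diff above does.
    return set(itertools.chain.from_iterable(infos))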
This does get us down to about a 600us roundtrip time, down from 800us, which is a bit of a win.

In [1]: from dask.distributed import Client
   ...: client = Client()
   ...: client
Out[1]: <Client: scheduler='tcp://127.0.0.1:35289' processes=4 cores=4>

In [2]: async def f():
   ...:     for i in range(10000):
   ...:         await client.scheduler.identity()
   ...: %time client.sync(f)
CPU times: user 5.8 s, sys: 467 ms, total: 6.27 s
Wall time: 6.31 s
Oooh, I can get this down to 470us if
At this point about 70% of scheduler time is spent in socket.send (see https://stackoverflow.com/questions/51731690/why-does-asyncio-spend-time-in-socket-senddata)
OK, this could use review by someone who knows this stack better than I do. This currently implements a functional asyncio-based Comm. There are a few challenges:
@pitrou if you have time to look things over here, I would appreciate it.
Current tests on the CI systems aren't that meaningful. This only works in Python 3.7 due to copying over and modifying asyncio code.
Oh, and I'm also somewhat blocked on this problem: https://stackoverflow.com/questions/51731690/why-does-asyncio-spend-time-in-socket-senddata A very simple benchmark that sends a small message back and forth currently spends up to about 70% of its time in socket.send.
I also suggest you run a benchmark with large messages, because
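For what it's worth, a minimal sketch of such a large-message check, assuming a running Client; the ~100 MB payload size is an arbitrary illustration, and scatter/gather is just one convenient way to push a big payload across the comm in both directions.

import os

from dask.distributed import Client

client = Client()

payload = os.urandom(100_000_000)  # ~100 MB of random bytes

# scatter sends the payload client -> worker, gather pulls it back
# worker -> client, so both directions handle a large message.
future = client.scatter(payload)
result = client.gather(future)
assert result == payload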
Just for the record, which profiler are you using? And how much
Sampling profiler. The same one that we use for worker threads. See #2144
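As a rough illustration of the statistical sampling idea only (not the actual profiler from #2144), one can poll the main thread's stack from a background thread and tally which functions are active; every name below is made up for the sketch.

import collections
import sys
import threading
import time


def sample_main_thread(counts, interval=0.01, duration=5.0):
    # Runs in a background thread: every `interval` seconds, walk the main
    # thread's current stack and count each frame. Hot functions, such as
    # socket.send in the benchmark above, accumulate samples roughly in
    # proportion to the time spent in them.
    main_id = threading.main_thread().ident
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(main_id)
        while frame is not None:
            code = frame.f_code
            counts[(code.co_filename, code.co_name)] += 1
            frame = frame.f_back
        time.sleep(interval)


counts = collections.Counter()
threading.Thread(target=sample_main_thread, args=(counts,), daemon=True).start()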
I agree. My short-term plan is to include both, but have Tornado be the default. I'm dealing with some workloads now that need relatively low-latency communication of many small messages. Using this and many other tricks, I can get round-trip message latency down to about 450us (see #2156)
That's a good question. I'll find out.
This is pretty broken, but I thought I'd push it up in the spirit of showing work.
@@ -0,0 +1,186 @@
"""
This stuff is probably no longer necessary. Since doing this work we've made Listeners optionally awaitable, so we should be able to handle things appropriately on the Dask side.
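To illustrate what "optionally awaitable" can look like in practice (a generic sketch, not the actual distributed Listener code), an object can expose __await__ so callers may write listener = await SketchListener() under asyncio while other code paths call start() explicitly.

import asyncio


class SketchListener:
    # Illustrative only: a listener whose start() can either be awaited
    # directly or triggered by awaiting the listener object itself.
    async def start(self):
        self.server = await asyncio.start_server(self._handle, "127.0.0.1", 0)

    async def _handle(self, reader, writer):
        writer.close()

    def __await__(self):
        # `listener = await SketchListener()` starts the listener and then
        # hands back the listener itself.
        async def _start_and_return():
            await self.start()
            return self

        return _start_and_return().__await__()


async def main():
    listener = await SketchListener()   # awaitable form
    explicit = SketchListener()
    await explicit.start()              # explicit form
    listener.server.close()
    explicit.server.close()


asyncio.run(main())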
This only works with Python 3.7. I had to pull out and modify some asyncio code. It also makes no attempt at TLS.