Optimize transitions #4451
Conversation
```diff
-    def send(self, msg):
+    def send(self, *msgs):
         """Schedule a message for sending to the other side

         This completes quickly and synchronously
         """
         if self.comm is not None and self.comm.closed():
             raise CommClosedError

-        self.message_count += 1
-        self.buffer.append(msg)
+        self.message_count += len(msgs)
+        self.buffer.extend(msgs)
```
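To illustrate the change, here is a minimal standalone sketch of the variadic pattern (this is not the real `distributed.BatchedSend`, just a toy class mirroring the diff; `RuntimeError` stands in for `CommClosedError`):

```python
class BatchedSend:
    """Toy sketch of a batched-send buffer with a variadic send()."""

    def __init__(self):
        self.comm = None          # stand-in for an open comm object
        self.buffer = []          # messages waiting to be flushed
        self.message_count = 0

    def send(self, *msgs):
        """Schedule one or more messages for sending to the other side.

        This completes quickly and synchronously.
        """
        if self.comm is not None and self.comm.closed():
            raise RuntimeError("comm closed")
        self.message_count += len(msgs)
        self.buffer.extend(msgs)


bs = BatchedSend()
bs.send({"op": "task-finished"})
bs.send({"op": "release"}, {"op": "delete"})  # several messages, one call
print(bs.message_count)  # 3
```

Callers that previously looped over messages can now hand the whole batch to a single `send` call.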
Ah, this is cleaner than I expected :)
If you have time I encourage you to try out viztracer to see the results. I would be happy to help walk you through first use of it if you want to get together some time tomorrow.
Yeah, I did run with cProfile last night and looked at the call graph, though that was before the more recent changes. Trying to root out a remaining bug. Probably just a small typo I'm overlooking.
I think it's worth adding viztracer to your bag of tricks; you especially might really find value in it.
Neither of these statements should raise a `KeyError`. So just drop this `try...except...`.
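For context, the fallback-lookup pattern being suggested replaces exception handling with `dict.get`; a hypothetical sketch (the `tasks` dict and its values are made up for illustration):

```python
tasks = {"x": "memory"}

# Instead of catching the missing-key case:
try:
    state = tasks["y"]
except KeyError:
    state = None

# ...retrieve with a fallback, avoiding the exception machinery entirely:
state = tasks.get("y")               # None when the key is absent
state = tasks.get("y", "released")   # or an explicit default
```

Beyond readability, skipping the `try/except` matters for Cython, which can translate the plain `get` call more directly than exception-handling blocks.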
This avoids building a `list`, which makes it easier for Cython to optimize.
This should simplify the C code Cython generates to unpack the `tuple`: it no longer needs to check whether the value is a `list` or some other sequence that needs to be unpacked, and can simply use the `tuple` unpacking logic.
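In plain Python the two spellings below behave identically, but declaring the value as a `tuple` (via an annotation in pure-Python mode, or `cdef tuple` in Cython) lets Cython emit the fast tuple-item accessors instead of generic-sequence unpacking. A hypothetical illustration (function names are made up):

```python
# Generic unpacking: the argument could be a list, tuple, or any iterable,
# so compiled code must handle every case.
def unpack_any(pair):
    key, value = pair
    return key, value

# With a tuple annotation, Cython can assume the tuple layout and index
# into it directly; in plain CPython this runs the same either way.
def unpack_tuple(pair: tuple):
    key, value = pair
    return key, value

assert unpack_any(["x", 1]) == ("x", 1)
assert unpack_tuple(("x", 1)) == ("x", 1)
```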
This allows us to batch all worker and client sends into a single function.
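The batching idea might look like the following sketch: collect messages per destination while processing transitions, then flush each destination's buffer with one variadic `send` call. All names here (`FakeStream`, `flush_transitions`) are hypothetical stand-ins, not the scheduler's real API:

```python
from collections import defaultdict


class FakeStream:
    """Records send() calls; stands in for a per-worker batched stream."""

    def __init__(self):
        self.calls = []

    def send(self, *msgs):
        self.calls.append(msgs)


def flush_transitions(transitions, streams):
    """Collect messages per destination, then issue one send() per worker."""
    worker_msgs = defaultdict(list)
    for addr, msg in transitions:
        worker_msgs[addr].append(msg)
    for addr, msgs in worker_msgs.items():
        streams[addr].send(*msgs)  # one call, many messages


streams = {"w1": FakeStream(), "w2": FakeStream()}
flush_transitions(
    [("w1", {"op": "a"}), ("w2", {"op": "b"}), ("w1", {"op": "c"})],
    streams,
)
print(len(streams["w1"].calls))  # 1 call carrying both of w1's messages
```

Deferring the sends to the end of the transition avoids repeated per-message overhead on the hot path.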
I suspect that you've lost the history of the intense part of the shuffle. viztracer only captures the last N events, so if you don't shut things down relatively quickly you miss the fun part of the computation. I'm guessing this because of the focus on transitions to the forgotten state. Also, when selecting the region of interest, I recommend aiming low so that you don't pick up extraneous events.
Yeah, I'll admit I'm not that familiar with this tool. Also, I was only able to view the results in Chrome (no other browser worked), and it was pretty slow to navigate through. I would encourage others to play with this if they have something in mind that they would like to see.
If you're interested I'd be very happy to give you a brief tour. I think that pairing briefly here would be worth the time.
Ok, I gave this one last try, though honestly I don't plan to spend more time with this tool (I'm just finding it to be too slow; it was causing my computer to freeze). Am seeing:
Here's the call graph I get from this change; it can be compared to the recent nightly benchmark (quasiben/dask-scheduler-performance#98). This seems to cut a decent chunk of time.
Ok, one last time with viztracer (still struggling with the sluggish UI). Just killed the scheduler roughly 20-30s into working. Guessing that is in the middle-ish, somewhat near the end, which also lines up with the logging messages seen there and some of the transitions popping up here.

On the reading point, since that came up here, there may be changes that can be made in Tornado that would help; in particular, some code changes to rely more on asyncio.

Also, time is being spent deserializing messages, though this goes hand-in-hand with reading and is not something addressed in this PR (maybe we can look at that after we merge this PR).
@jakirkham asked me to run this PR and compare with the latest in master under py-spy, using the following code:

```python
ddf_h = timeseries(start='2000-01-01', end='2000-02-01', partition_freq='5min')
ddf_h = ddf_h.map_partitions(lambda df: df.head(0))
ddf_h = ddf_h.persist()
print(ddf_h)
_ = wait(ddf_h)
result = shuffle(ddf_h, "id", shuffle="tasks")
ddf = client.persist(result)
_ = wait(ddf)
```

Total tasks: 648270

Latest (2021-01-29):

This PR:
To summarize the above results: we have moved a good chunk of time out of the functions profiled above.
Planning on merging tomorrow if there are no comments.
@jakirkham I happened to see this thread about sluggish viztracer :) I'm aware that with the default size of the circular buffer, if you fill the whole buffer you can experience performance issues when your RAM is not large enough. One way to address this is to reduce the buffer size.

However, the latest viztracer now supports Perfetto, which performs much better than the Chrome Trace Viewer (which was developed years ago and is already deprecated). Using viztracer with Perfetto will automatically open your browser for the report, or you can log to a JSON file and open it separately.

I'm sorry that viztracer did not perform as you expected, but if you want to give it another try, let me know how the latest UI works :)
Closes #4454
Requires #4452 (merged! 🎉)

This more thoroughly optimizes the higher-level `transition` and `transitions` functions. It does this by going through and annotating the variables used. It also avoids contains checks when it is possible to retrieve with a fallback (as with `dict.get(...)`), and tries to remove any unneeded copies where possible.

Additionally, this collects all messages to send to workers and clients from transitions, and waits to send them until the end of the transition, where it lumps multiple messages together.

Note: still need to move communication calls from `transition` to `transitions`.