Replies: 2 comments 1 reply
-
Hello Jeremy, at Google we have some other schedulers for our internal targets that do something similar: they include heuristics to expose more overlap opportunities, plus a memory-pressure tracking system to keep memory pressure from getting out of hand (reordering instructions can increase memory pressure in unexpected ways). Since you need something very similar here, we could eventually open-source this pass so it also serves the open-source targets. It will require a bit of refactoring to make sure it's ready for open source. That way you would have another option to evaluate on your workloads as well, though (as I mentioned) it might take a bit more time to get it open-sourced. WDYT? Does this plan sound good to you?
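To make the idea above concrete, here is a minimal, hypothetical sketch of a pressure-aware greedy list scheduler. All names and the memory model are invented for illustration (this is not the internal pass described above): the scheduler prefers to issue async collective starts early to expose overlap, but skips a candidate when issuing it would push tracked memory pressure past a limit.

```python
# Hypothetical sketch of a pressure-aware list scheduler.
# The Instr fields and the memory model are illustrative only;
# real schedulers also model when buffers are freed.
from dataclasses import dataclass, field

@dataclass
class Instr:
    name: str
    bytes_out: int                      # bytes the result keeps live
    deps: list = field(default_factory=list)
    is_async_start: bool = False        # e.g. AllReduceStart
    is_async_done: bool = False         # e.g. AllReduceDone

def schedule(instrs, mem_limit):
    """Greedily pick ready instructions, preferring async starts (to
    expose overlap) and deferring async dones, unless issuing a pick
    would push live memory past mem_limit."""
    scheduled, live, done = [], 0, set()
    remaining = list(instrs)
    while remaining:
        ready = [i for i in remaining if all(d in done for d in i.deps)]
        # Starts first, dones last, smaller allocations first among ties.
        ready.sort(key=lambda i: (i.is_async_done,
                                  not i.is_async_start,
                                  i.bytes_out))
        # Take the best pick that fits under the pressure limit;
        # fall back to the best pick overall if nothing fits.
        pick = next((i for i in ready
                     if live + i.bytes_out <= mem_limit), ready[0])
        live += pick.bytes_out
        done.add(pick.name)
        scheduled.append(pick.name)
        remaining.remove(pick)
    return scheduled
```

With a tight `mem_limit`, the fallback path delays an async start past independent compute instead of exceeding the limit; with a loose limit, starts hoist early and dones sink late, leaving compute in between to overlap the collective.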
-
PR is here:
-
We have noticed that, in some cases, NCCL kernels launched by HLO asynchronous collective operations (e.g. `AllReduce`, which is lowered to `AllReduceStart` and `AllReduceDone`) are "exposed": no compute kernels execute concurrently, even though the HLO instruction dependencies and the available GPU resources indicate that they could.

We have an alternative implementation of the memory-schedule postprocessor that is currently used to "move" `AllReduceStart` and `AllReduceDone` instructions. Some notes on the new implementation and some preliminary (anecdotal) improvement results are attached.

Since changes to the GPU scheduler will potentially have a broad impact, would it be best to open a pull request with this scheduler enabled via an opt-in flag, or should the PR simply replace the existing postprocessor?
XLA_sched_notes.pdf
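As a toy illustration of what "moving" the `Done` side of an async pair can look like (a hypothetical sketch, not the actual postprocessor described in the attached notes): sink each `-done` op just before its first user, so that any independent compute scheduled in between runs while the collective is still in flight.

```python
# Illustrative sketch: given a linear schedule (a list of op names) and a
# users map (op -> set of ops that consume its result), sink each "-done"
# op to just before its first user. Op naming is invented for the example.
def sink_dones(schedule, users):
    out = list(schedule)
    for op in schedule:
        if not op.endswith("-done"):
            continue
        i = out.index(op)
        # Index of the first later instruction that consumes this done op;
        # if it has no user, sink it to the end of the schedule.
        first_use = next((j for j in range(i + 1, len(out))
                          if out[j] in users.get(op, set())), len(out))
        # Move the done op to just before its first user, letting the
        # ops in between overlap with the in-flight collective.
        out.insert(first_use - 1, out.pop(i))
    return out
```

For example, `["ar-start", "ar-done", "mul", "add"]` with `users = {"ar-done": {"add"}}` becomes `["ar-start", "mul", "ar-done", "add"]`, so `mul` overlaps the collective. A real postprocessor would additionally hoist the `-start` ops as early as their operands allow and respect memory-pressure constraints.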