I have a working context parallel implementation, forked from this repo, for the forward and backward passes. It required two modifications:

1. Padding each GPU's conv layer input chunk with the last `N_padding` tokens from the previous GPU, then discarding the output indices corresponding to the padding tokens (sketch below).
2. Transferring the final states in state passing point-to-point between GPUs, sequentially (see the second sketch further down).
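For concreteness, here is roughly what the halo padding for the conv layer looks like. This is a minimal sketch, not the fork's actual code: it assumes `(batch, seqlen, dim)` chunks, an initialized `torch.distributed` process group, and illustrative names like `conv_halo_exchange` and `n_padding`.

```python
import torch
import torch.distributed as dist

def conv_halo_exchange(x, n_padding, group=None):
    """Prepend the last `n_padding` tokens of the previous rank's chunk
    to this rank's chunk, so the causal conv sees its left context.

    x: (batch, seqlen, dim) local chunk (the layout is an assumption).
    """
    rank = dist.get_rank(group)
    world_size = dist.get_world_size(group)

    ops, recv_buf = [], None
    if rank < world_size - 1:
        # Send my trailing tokens to the next rank as its left halo.
        send_buf = x[:, -n_padding:].contiguous()
        ops.append(dist.P2POp(dist.isend, send_buf, rank + 1, group=group))
    if rank > 0:
        # Receive the previous rank's trailing tokens as my left halo.
        recv_buf = torch.empty(
            x.shape[0], n_padding, x.shape[2],
            dtype=x.dtype, device=x.device,
        )
        ops.append(dist.P2POp(dist.irecv, recv_buf, rank - 1, group=group))
    if ops:
        for req in dist.batch_isend_irecv(ops):
            req.wait()

    # Rank 0 has no left neighbor, so its chunk is unchanged.
    return x if recv_buf is None else torch.cat([recv_buf, x], dim=1)
```

After the conv, each rank with a left neighbor drops the outputs at the padded positions, e.g. `y = conv(x_padded)[:, n_padding:]`, so only outputs for its own tokens remain. Since this is a neighbor exchange rather than a chain, all ranks can do it concurrently.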
And then vice versa for the backward pass. I believe I've also worked out a way to do this without sequential point-to-point transfers.
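The sequential state passing I mean is the straightforward chain below. Again a minimal sketch, assuming `torch.distributed` and a placeholder `local_fn(x_local, initial_state) -> (y_local, final_state)` for the per-chunk scan; that signature is hypothetical, not the repo's API.

```python
import torch
import torch.distributed as dist

def sequential_state_passing(local_fn, x_local, state_shape, dtype, device,
                             group=None):
    """Chain the SSM state across ranks: wait for the previous rank's
    final state, scan the local chunk from it, forward the result."""
    rank = dist.get_rank(group)
    world_size = dist.get_world_size(group)

    if rank == 0:
        # The first chunk starts from the zero state.
        initial_state = torch.zeros(state_shape, dtype=dtype, device=device)
    else:
        # Block until the previous rank hands over its final state.
        initial_state = torch.empty(state_shape, dtype=dtype, device=device)
        dist.recv(initial_state, src=rank - 1, group=group)

    y_local, final_state = local_fn(x_local, initial_state)

    if rank < world_size - 1:
        dist.send(final_state.contiguous(), dst=rank + 1, group=group)
    return y_local, final_state
```

The backward pass runs the same chain in reverse: each rank receives the state gradient from rank + 1, backpropagates through its chunk, and sends the resulting gradient on to rank - 1.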
Would this be useful to contribute? If so, I'd like to know the best way to do so, since it requires modifying the core wrapper around the Mamba 2 Triton code.