fix: fast-dllm dream batch decoding #57
Open
Problem

It seems that when batch size > 1 (see dream_generate.py), fast-dllm Dream raises an exception:

This is caused by incorrect kv-cache replacement in modeling_dream.py (LLaDA's version is correct). Even after handling the batched replace_position correctly, it still fails, because the shape of the attention mask does not match the shapes of Q and K.
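For illustration only (these shapes are hypothetical, not taken from the actual traceback): a mask built by a batch-size-1 code path cannot broadcast against batched attention scores.

```python
import torch

# Hypothetical shapes for illustration: attention scores have shape
# [batch, heads, q_len, kv_len], but a mask built by a batch-size-1 code
# path covers the wrong kv length and cannot broadcast against them.
batch, heads, q_len, kv_len = 2, 8, 32, 96
scores = torch.randn(batch, heads, q_len, kv_len)
stale_mask = torch.zeros(1, 1, q_len, kv_len - 32)  # wrong last dimension

try:
    scores + stale_mask
except RuntimeError as err:
    print(err)  # size of tensor a (96) must match size of tensor b (64) at dim 3
```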
After modifying generation_utils_block.py, it now works well.

Solution
See the changes in modeling_dream.py:
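The diff itself is not reproduced above. As a minimal sketch of the idea only, assuming a [batch, heads, cache_len, head_dim] cache layout and a per-sample (start, end) form for replace_position (the helper name replace_kv_cache is hypothetical, not Fast-dLLM's API): the fix is to index the cache per batch element instead of applying one position to the whole batch.

```python
import torch

def replace_kv_cache(cache: torch.Tensor,            # [batch, heads, cache_len, head_dim]
                     new_kv: torch.Tensor,           # [batch, heads, block_len, head_dim]
                     replace_position: torch.Tensor  # [batch, 2], per-sample (start, end)
                     ) -> torch.Tensor:
    """Write the block's freshly computed keys/values back into the cache.

    The buggy path applied a single (start, end) pair to every sample,
    which is only correct for batch size 1; here each batch element gets
    its own slice of the cache.
    """
    for b in range(cache.size(0)):
        start, end = replace_position[b].tolist()
        cache[b, :, start:end, :] = new_kv[b]
    return cache
```

A vectorized scatter would avoid the Python loop, but the per-sample slices make the batched-indexing fix explicit.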
and the changes in generation_utils_block.py:
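Likewise only a sketch of the shape fix, assuming a 2-D padding mask as input and an additive-mask convention (expand_attention_mask is a hypothetical name): expand the mask to four dimensions so its last two dims match Q's and K's sequence lengths.

```python
import torch

def expand_attention_mask(pad_mask: torch.Tensor,  # [batch, kv_len], 1 = attend, 0 = pad
                          q_len: int,
                          dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Turn a 2-D padding mask into the [batch, 1, q_len, kv_len] additive
    mask expected by scaled-dot-product attention, so it broadcasts cleanly
    against batched Q @ K^T scores."""
    batch, kv_len = pad_mask.shape
    mask = pad_mask[:, None, None, :].expand(batch, 1, q_len, kv_len)
    additive = torch.zeros(mask.shape, dtype=dtype)
    return additive.masked_fill(mask == 0, torch.finfo(dtype).min)
```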