
Problem

It seems that when the batch size is greater than 1 (see dream_generate.py), fast-dllm's Dream model raises an exception:

(screenshot: exception traceback)

This is caused by incorrect KV-cache replacement in modeling_dream.py (LLaDA's version handles this correctly).
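For illustration only, here is a minimal sketch of what batch-aware replacement of a cached block can look like. The function name, tensor layouts, and the assumption that replace_position holds per-sample [start, end) indices are mine for exposition, not the repository's actual code:

```python
import torch

def replace_kv_block(cache_k: torch.Tensor,
                     cache_v: torch.Tensor,
                     new_k: torch.Tensor,
                     new_v: torch.Tensor,
                     replace_position: torch.Tensor) -> None:
    """Sketch of batched KV-cache block replacement (illustrative, not the repo's code).

    cache_k, cache_v : (batch, n_heads, max_len, head_dim)   cached keys / values
    new_k,   new_v   : (batch, n_heads, block_len, head_dim) freshly computed block
    replace_position : (batch, 2) per-sample [start, end) indices, end - start == block_len
    """
    # With batch size 1 a single slice assignment suffices; with batch > 1 each
    # sample may place its block at a different offset, so replace per sample.
    for b in range(cache_k.size(0)):
        start, end = replace_position[b].tolist()
        cache_k[b, :, start:end] = new_k[b]
        cache_v[b, :, start:end] = new_v[b]
```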

After handling the batched replace_position correctly, generation still fails, because the shape of the attention mask does not match the shapes of Q and K.

(screenshot: attention-mask shape-mismatch error)

After also modifying generation_utils_block.py, batched generation now works well.
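For context, the shape constraint that the mask fix has to satisfy is roughly this: the additive attention bias must broadcast against attention scores of shape (batch, n_heads, q_len, k_len), where q_len covers only the current block and k_len includes the cached prefix. The helper below is just a sketch with made-up names (build_block_attention_mask, pad_mask), not the actual change in generation_utils_block.py:

```python
import torch

def build_block_attention_mask(pad_mask: torch.Tensor, q_len: int, k_len: int) -> torch.Tensor:
    """Sketch: build an additive bias whose shape matches Q (q_len) and K (k_len).

    pad_mask : (batch, k_len) bool, True where a key position may be attended to
    returns  : (batch, 1, q_len, k_len) bias, broadcastable over attention heads
    """
    bias = torch.zeros(pad_mask.size(0), 1, q_len, k_len, dtype=torch.float32)
    # Masked-out key positions get -inf so softmax assigns them zero weight
    # for every query row in the current block.
    bias.masked_fill_(~pad_mask[:, None, None, :], float("-inf"))
    return bias
```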

Solution

See changes in modeling_dream.py:

(screenshot: diff in modeling_dream.py)

and changes in generation_utils_block.py:

(screenshot: diff in generation_utils_block.py)
