Thank you for your excellent work. I would like to confirm my understanding of the current design:
As I understand it, the current library seems to be suitable only for dLLMs that natively support block attention. For dLLMs that use full attention (e.g., LLaDA), if block_length < gen_length, then while generating the current block the model cannot attend to the subsequent (still-masked) blocks in the input sequence. In that case, does the attention mechanism no longer satisfy the requirements of full attention?
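
To make my concern concrete, here is a toy sketch of the two attention patterns I have in mind (this is not the library's actual code; all names, lengths, and the mask construction are illustrative only):

```python
import torch

# Illustrative sizes: a prompt plus a response of gen_length split into blocks.
prompt_len, gen_length, block_length = 4, 8, 4
seq_len = prompt_len + gen_length
cur_block = 0  # index of the block currently being generated

# Full attention (LLaDA-style): every position attends to every position,
# including the still-masked future blocks.
full_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

# Block attention: the current block only sees the prompt, previously
# generated blocks, and itself; subsequent blocks are invisible.
visible_len = prompt_len + (cur_block + 1) * block_length
block_mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
block_mask[:, :visible_len] = True

# The columns where the two masks differ are exactly the subsequent
# (still-masked) blocks my question is about.
print((full_mask != block_mask).any(dim=0).nonzero().flatten())
# tensor([ 8,  9, 10, 11])
```

If this sketch matches the library's behavior when block_length < gen_length, then my question is whether that divergence from full attention is expected and acceptable for models like LLaDA.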