Fix pp convergence to be bitwise #4

3outeille · 2025-11-06T21:40:56Z

PP convergence matches bitwise with Qwen2 but not with Llama.

Hypotheses:

Is `Identity()` not working with DCP?
- One easy way to know for sure would be to manually add an `if tok_embeddings is None` check in the Transformers modeling code and see whether the issue persists.
Maybe it is due to tied embeddings?
- Both models have tied embeddings, but when printing the model structure, `lm_head` shows up for HF Llama 3 and not for HF Qwen2.
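Both hypotheses can be probed in isolation with a toy model. This is a minimal sketch, assuming PyTorch; `ToyModel` and its attribute names (`tok_embeddings`, `norm`, `lm_head`) are hypothetical stand-ins, not the Transformers modeling code. It shows that replacing the embedding with `Identity()` or with `None` (plus an explicit `None` check in `forward`) gives the same `state_dict()` keys and the same outputs, so `Identity()` by itself presumably does not change what a `state_dict`-based checkpoint loader sees. It also shows that a tied `lm_head` still appears in `print(model)` and in `state_dict()`, while `named_parameters()` reports the shared weight only once.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for a decoder, using the attribute names discussed above."""

    def __init__(self, vocab: int = 16, dim: int = 8, tie: bool = False):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab, dim)
        self.norm = nn.LayerNorm(dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        if tie:
            # Tied embeddings: lm_head reuses the embedding's Parameter object.
            self.lm_head.weight = self.tok_embeddings.weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Explicit None check: a non-first PP stage receives hidden states, not token ids.
        if self.tok_embeddings is not None:
            x = self.tok_embeddings(x)
        return self.lm_head(self.norm(x))


# Check 1: Identity() vs None on a stage that does not own the embeddings.
torch.manual_seed(0)
with_identity = ToyModel()
with_identity.tok_embeddings = nn.Identity()  # parameter-free, contributes no state_dict keys
torch.manual_seed(0)
with_none = ToyModel()
with_none.tok_embeddings = None  # child modules set to None are skipped by state_dict()

h = torch.randn(2, 4, 8)  # hidden states arriving from the previous stage
print(list(with_identity.state_dict()) == list(with_none.state_dict()))  # True
print(torch.equal(with_identity(h), with_none(h)))  # True

# Check 2: tied embeddings.
tied = ToyModel(tie=True)
print(tied)  # lm_head is listed because it is a registered submodule, tied or not
print("lm_head.weight" in tied.state_dict())  # True: state_dict keeps both names
print("lm_head.weight" in dict(tied.named_parameters()))  # False: shared weight deduplicated
```

If the real models diverge while this toy pair agrees, the difference is more likely in how each stage's weights are named or loaded than in `Identity()` itself.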


3outeille added 2 commits November 5, 2025 16:07

- e52c28a: seems like the bug comes from loading weights in PP, which differs if you use ModuleList vs ModuleDict
- b0e6efb: issue when loading weights due to use of ModuleList. Now uses ModuleDict
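One plausible mechanism behind these two commits (an assumption, not stated in the commit messages): when a pipeline stage drops the layers it does not own, `nn.ModuleList` renumbers the surviving layers from 0, so their checkpoint keys no longer match the global layer indices, whereas an `nn.ModuleDict` keyed by the global index keeps them. The sketch below assumes PyTorch and uses plain `nn.Linear` modules as stand-ins for decoder blocks; `N_LAYERS`, `STAGE_LAYERS`, and `layer_fqns` are hypothetical names.

```python
import torch.nn as nn

N_LAYERS, DIM = 4, 8
STAGE_LAYERS = {2, 3}  # hypothetical: this PP stage owns layers 2 and 3


def layer_fqns(layers: nn.Module) -> list[str]:
    # Weight keys as they would appear under a parent attribute named `layers`.
    return [f"layers.{k}" for k in layers.state_dict() if k.endswith(".weight")]


# ModuleList: deleting the other stages' layers renumbers the survivors from 0.
as_list = nn.ModuleList(nn.Linear(DIM, DIM) for _ in range(N_LAYERS))
for i in sorted(set(range(N_LAYERS)) - STAGE_LAYERS, reverse=True):
    del as_list[i]

# ModuleDict: keys are the global layer indices, so the survivors keep their names.
as_dict = nn.ModuleDict({str(i): nn.Linear(DIM, DIM) for i in range(N_LAYERS)})
for i in set(range(N_LAYERS)) - STAGE_LAYERS:
    del as_dict[str(i)]

print(layer_fqns(as_list))  # ['layers.0.weight', 'layers.1.weight']
print(layer_fqns(as_dict))  # ['layers.2.weight', 'layers.3.weight']
```

Under this assumption, a `ModuleList` stage owning layers 2 and 3 would request `layers.0.*` and `layers.1.*` from a full checkpoint and receive the first stage's weights without any key error.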

3outeille changed the base branch from main to 3outeille/transformers_backend November 6, 2025 21:41

This was referenced Nov 17, 2025:

- 3outeille/transformers backend (Dense model only) pytorch/torchtitan#2048 (Open)
- Add transformers backend (Dense model only) #1 (Open)
