Conversation

@3outeille 3outeille commented Nov 6, 2025

Bitwise matching with Qwen2 but not with Llama.

Hypothesis:

  • Identity() not working with DCP?
    • One easy way to know for sure: manually check if tok_embeddings is None in the Transformers modeling code and see if the issue persists.
  • Maybe due to tie_embedding?
    • Both models have tied embeddings, but when printing the model structure, lm_head shows up for HF Llama3 while it does not for HF Qwen2.

@3outeille 3outeille changed the base branch from main to 3outeille/transformers_backend November 6, 2025 21:41
