
Assume tied weights if lm_head/output weights is missing. #5824

Merged 1 commit into ggerganov:master on Mar 8, 2024

Conversation

@dmahurin (Contributor) commented on Mar 1, 2024:

This supports model configurations with "tie_word_embeddings" by using the embd_tokens weights when the output/lm_head weights are missing (as they will be when the weights are tied).

With this change, a tied model such as the following can be converted to GGUF:
https://huggingface.co/BEE-spoke-data/smol_llama-81M-tied
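For context: with "tie_word_embeddings" set to true, the model reuses its token-embedding matrix as the output (lm_head) projection, so the checkpoint ships no separate lm_head weight. A minimal, self-contained sketch of the idea with toy dimensions; this is illustrative only, not llama.cpp code:

```cpp
#include <cstdio>
#include <vector>

// Toy illustration of tied word embeddings: one matrix W (n_vocab x n_embd)
// serves both as the embedding lookup and as the output (lm_head) projection,
// which is why a tied checkpoint contains no separate lm_head.weight tensor.
int main() {
    const int n_vocab = 4, n_embd = 3;
    std::vector<float> W = { // n_vocab x n_embd, one row per token
        0.1f, 0.2f, 0.3f,
        0.4f, 0.5f, 0.6f,
        0.7f, 0.8f, 0.9f,
        1.0f, 1.1f, 1.2f,
    };

    int token = 2;
    // Embedding lookup: hidden state h = W[token]
    std::vector<float> h(W.begin() + token * n_embd, W.begin() + (token + 1) * n_embd);

    // Output projection with the *same* matrix: logits = W * h
    for (int v = 0; v < n_vocab; ++v) {
        float logit = 0.0f;
        for (int e = 0; e < n_embd; ++e) logit += W[v * n_embd + e] * h[e];
        printf("logit[%d] = %.3f\n", v, logit);
    }
    return 0;
}
```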

@cebtenzzre (Collaborator) commented:

This change conflicts with the move toward duplicating the tensors in memory only at GGUF load time. See #4978, #5631, #5650, and #5670. I would prefer that we do something similar for Llama.
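In that scheme the converter writes no output tensor at all, and the loader falls back to the token-embedding tensor when the output tensor is absent, duplicating data in memory rather than on disk. A standalone sketch of that fallback, assuming the GGUF tensor names "output.weight" and "token_embd.weight"; the struct and function here are illustrative stand-ins, not the llama_model_loader API:

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Stand-in for a model tensor; the real loader deals in ggml tensors.
struct Tensor { std::vector<float> data; };

// Illustrative fallback: if the file has no "output.weight" (tied weights),
// reuse "token_embd.weight" instead of storing a duplicate copy on disk.
const Tensor * find_output_tensor(const std::map<std::string, Tensor> & tensors) {
    auto it = tensors.find("output.weight");
    if (it == tensors.end()) {
        // tied weights: fall back to the token-embedding matrix
        it = tensors.find("token_embd.weight");
    }
    return it == tensors.end() ? nullptr : &it->second;
}

int main() {
    std::map<std::string, Tensor> tensors;
    tensors["token_embd.weight"] = Tensor{{0.1f, 0.2f, 0.3f}};
    // note: no "output.weight" entry, as with tie_word_embeddings=true

    const Tensor * out = find_output_tensor(tensors);
    printf("output tensor %s (%zu values)\n",
           out ? "resolved via fallback" : "missing",
           out ? out->data.size() : 0);
    return 0;
}
```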

@dmahurin (Contributor, Author) commented on Mar 2, 2024:

@cebtenzzre: Great, I had not seen those changes. This change was intended as a quick workaround until tied weights are supported more properly, which it sounds like is happening. I will look at those changes.

Commit pushed with the message: This is to support model configurations with "tie_word_embeddings" set to true.
@dmahurin (Contributor, Author) commented on Mar 2, 2024:

@cebtenzzre, the change has been updated in llama.cpp; LLAMA tied weights are now handled like the other tied-weight architectures.

@ggerganov merged commit e457fb3 into ggerganov:master on Mar 8, 2024 (60 checks passed).
@dmahurin deleted the tied-weights branch on Mar 8, 2024.
Commits referencing this pull request were later pushed downstream, each carrying the message "This is to support model configurations with "tie_word_embeddings" set to true." and "Co-authored-by: Don Mahurin <2797413+dmahurin@users.noreply.github.com>":

- hazelnutcloud pushed a commit to hazelnutcloud/llama.cpp on Mar 10, 2024
- NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp on Mar 12, 2024
- jordankanter pushed a commit to jordankanter/llama.cpp on Mar 13, 2024
- hodlen pushed a commit to hodlen/llama.cpp on Apr 1, 2024