Fix ChatGLMModel for glm-4-9b cannot find tokenizer merges in model file #13058
Conversation
The code now supports GLM variant models, including LLaMA-style and GPT-2-style vocabularies. Inference compatibility has been tested for several base models.

Current Status
Inference works with the above base models. llama.cpp now keeps track of all EOG tokens in the vocab (#9609), so generation stops whenever any of them is encountered.

Function Call Token Support
Special token handling for function call-style generation will be submitted in a separate PR.
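As a minimal sketch (not from this PR) of what "keeping track of all EOG tokens" looks like from the caller's side, assuming the current public llama_vocab_* API used in the snippets below:

```cpp
#include "llama.h"
#include <vector>

// Collect every token id the vocab flags as end-of-generation (EOS, EOT, etc.).
static std::vector<llama_token> collect_eog_tokens(const llama_vocab * vocab) {
    std::vector<llama_token> eog;
    const int32_t n_tokens = llama_vocab_n_tokens(vocab);
    for (llama_token id = 0; id < n_tokens; ++id) {
        if (llama_vocab_is_eog(vocab, id)) {
            eog.push_back(id);
        }
    }
    return eog; // generation can stop when any of these ids is sampled
}
```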
Excellent! I've been using GLM-4-32B, and the tool-calling format is non-standard (the GLM-4 sample code maps to and from their tool-calling format, which is newline-delimited rather than native JSON). Are you saying that in another PR you'll add the mapping from their custom format to the standard JSON format?
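For illustration only, here is a minimal sketch of such a mapping. It assumes GLM-4 emits the function name on the first line and a JSON argument object on the remaining lines, and that nlohmann/json (vendored by llama.cpp) is available; both the format and the helper name are assumptions, not part of this PR:

```cpp
#include <nlohmann/json.hpp>
#include <iostream>
#include <string>

using json = nlohmann::json;

// Convert a newline-delimited GLM-4 tool call into a standard JSON tool-call object.
static json glm_tool_call_to_json(const std::string & raw) {
    const size_t nl = raw.find('\n');
    json out;
    out["name"]      = raw.substr(0, nl);
    out["arguments"] = (nl == std::string::npos) ? json::object()
                                                 : json::parse(raw.substr(nl + 1));
    return out;
}

int main() {
    // Illustrative model output, not a captured GLM-4 sample.
    const std::string raw = "get_weather\n{\"location\": \"Beijing\"}";
    std::cout << glm_tool_call_to_json(raw).dump(2) << std::endl;
}
```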
Function Call Compatibility for GLM
The main modification I made to support function call capabilities in GLM is ensuring that the relevant special tokens are handled correctly; an example usage pattern is shown in the snippets below.
Loading Chain
Once the model is loaded, the following call chain is involved:
Sampling Chain
During sampling, the function llama_vocab_is_eog decides whether generation should stop. Refer to the implementation in the main (llama-cli) example:

```cpp
if (need_insert_eot && format_chat) {
    // chat mode: close the user turn with EOT, falling back to EOS if the model has no EOT token
    llama_token eot = llama_vocab_eot(vocab);
    embd_inp.push_back(eot == LLAMA_TOKEN_NULL ? llama_vocab_eos(vocab) : eot);
    need_insert_eot = false;
}

if (!embd.empty() && llama_vocab_is_eog(vocab, embd.back()) && !(params.interactive)) {
    // stop as soon as the last generated token is any EOG token
    LOG(" [end of text]\n");
    break;
}
```

Regarding Function Call Token Implementation
To support function-call-related special tokens, the EOT detection in the vocab loading code is relevant:

```cpp
for (const auto & t : token_to_id) {
// find EOT token: "<|eot_id|>", "<|im_end|>", "<end_of_turn>", etc.
if (special_eot_id == LLAMA_TOKEN_NULL) {
if (false
|| t.first == "<|eot_id|>"
|| t.first == "<|im_end|>"
|| t.first == "<|end|>"
|| t.first == "<end_of_turn>"
|| t.first == "<|endoftext|>"
|| t.first == "<EOT>"
|| t.first == "_<EOT>"
|| t.first == "<|end▁of▁sentence|>" // DeepSeek
) {
special_eot_id = t.second;
if ((id_to_token[t.second].attr & LLAMA_TOKEN_ATTR_CONTROL) == 0) {
LLAMA_LOG_WARN("%s: control-looking token: %6d '%s' was not control-type; this is probably a bug in the model. its type will be overridden\n",
__func__, t.second, t.first.c_str());
id_to_token[t.second].attr = LLAMA_TOKEN_ATTR_CONTROL;
}
}
}
}
```
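If a GLM-specific function-call stop token also needs to end generation, one plausible approach is to register it in this same vocab-loading pass so it lands in the EOG set. This is only a sketch; the token name "<|observation|>" and the exact insertion point are assumptions, not something this PR confirms:

```cpp
for (const auto & t : token_to_id) {
    // hypothetical: treat GLM's function-call/observation token as end-of-generation
    if (t.first == "<|observation|>") {
        special_eog_ids.insert(t.second);
        // force control-type so it is never rendered as literal text
        id_to_token[t.second].attr = LLAMA_TOKEN_ATTR_CONTROL;
    }
}
```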
Fix: Resolved "Cannot find tokenizer merges in model file" Issue
This PR addresses the tokenizer merge issue ("cannot find tokenizer merges in model file") when loading certain models, especially those converted from HuggingFace. The solution is based on insights from related discussions and PRs.

Verification Steps
1. Build
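The exact build command is not shown in the PR; a standard CMake build of llama.cpp would look like this:

```bash
cmake -B build
cmake --build build --config Release
```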
2. Convert HF Weights
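The conversion command is likewise not shown; a typical invocation of the repository's convert_hf_to_gguf.py script, with a placeholder path for the HF checkpoint, would be:

```bash
python convert_hf_to_gguf.py /path/to/glm-4-9b --outfile glm-4-9b.gguf
```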
3. Run Inference
```bash
./llama-cli -m /mnt/ceph/develop/jiawei/model_checkpoint/glm-4-9b.gguf -ngl 200000 -p "你好啊"
```
Known Issue
Refer to: #7441
In llama.cpp, special tokens (e.g., eos_token_id) are currently mapped one-to-one (token → ID). However, in actual transformer models, these tokens might correspond to multiple token IDs or require multi-token representations. This mismatch can cause issues where the model doesn't terminate generation correctly.
The exact handling logic and call chain for special_token in llama.cpp remain unclear and might require further investigation. You can see a temporary workaround in #9606.