
Conversation

@pwilkin (Collaborator) commented Jul 17, 2025

Fix bug per discussion in #14658

@github-actions bot added the python (python script changes) label on Jul 17, 2025

@pwilkin (Collaborator, Author) commented Jul 17, 2025

@CISC I believe this is what you had in mind :)

@pwilkin force-pushed the fix-big-ernie-moe branch from 88611c1 to f6e4931 on July 17, 2025 at 22:22
@pwilkin force-pushed the fix-big-ernie-moe branch from f6e4931 to 19eb88c on July 17, 2025 at 22:24

@CISC (Collaborator) commented Jul 17, 2025

Yes, however you can remove add_expert_shared_feed_forward_length and change the tensor loading in llama-model.cpp; see this similar code:

llama.cpp/src/llama-model.cpp, lines 4784 to 4786 in 760b448:

layer.ffn_gate_shexp = create_tensor(tn(LLM_TENSOR_FFN_GATE_SHEXP, "weight", i), {n_embd, n_ff_exp * n_expert_shared}, 0);
layer.ffn_down_shexp = create_tensor(tn(LLM_TENSOR_FFN_DOWN_SHEXP, "weight", i), { n_ff_exp * n_expert_shared, n_embd}, 0);
layer.ffn_up_shexp = create_tensor(tn(LLM_TENSOR_FFN_UP_SHEXP, "weight", i), {n_embd, n_ff_exp * n_expert_shared}, 0);
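
The snippet above sizes the shared-expert tensors as n_ff_exp * n_expert_shared instead of reading a dedicated shared feed-forward length. A minimal sketch of the same idea applied to the ERNIE4_5_MOE branch (hypothetical, not the code that was ultimately merged; i, n_embd, n_ff_exp and n_expert_shared are assumed to be in scope as in the surrounding loader):

// Hypothetical ERNIE4_5_MOE variant: derive the shared-expert width from the
// per-expert FFN size and the shared-expert count instead of hparams.n_ff_shexp.
const int64_t n_ff_shexp = n_ff_exp * n_expert_shared;

layer.ffn_gate_shexp = create_tensor(tn(LLM_TENSOR_FFN_GATE_SHEXP, "weight", i), {n_embd, n_ff_shexp}, 0);
layer.ffn_down_shexp = create_tensor(tn(LLM_TENSOR_FFN_DOWN_SHEXP, "weight", i), {n_ff_shexp, n_embd}, 0);
layer.ffn_up_shexp   = create_tensor(tn(LLM_TENSOR_FFN_UP_SHEXP,   "weight", i), {n_embd, n_ff_shexp}, 0);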

@CISC (Collaborator) commented Jul 17, 2025

Use n_expert_shared as the condition for loading them, and remember to init that value here:

llama.cpp/src/llama-model.cpp, lines 1657 to 1662 in 760b448:

if (arch == LLM_ARCH_ERNIE4_5_MOE) {
    ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);
    ml.get_key(LLM_KV_EXPERT_SHARED_FEED_FORWARD_LENGTH, hparams.n_ff_shexp, false);
    ml.get_key(LLM_KV_INTERLEAVE_MOE_LAYER_STEP, hparams.n_moe_layer_step);
    ml.get_key(LLM_KV_LEADING_DENSE_BLOCK_COUNT, hparams.n_layer_dense_lead);
}
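
A minimal sketch of that suggestion (hypothetical, and ultimately not adopted, see the next comment; LLM_KV_EXPERT_SHARED_COUNT and hparams.n_expert_shared are assumed to be the same key and field used by the other MoE loaders in llama-model.cpp):

if (arch == LLM_ARCH_ERNIE4_5_MOE) {
    ml.get_key(LLM_KV_EXPERT_FEED_FORWARD_LENGTH, hparams.n_ff_exp);
    // hypothetical: read the shared-expert count instead of a dedicated
    // shared feed-forward length key
    ml.get_key(LLM_KV_EXPERT_SHARED_COUNT, hparams.n_expert_shared, false);
    ml.get_key(LLM_KV_INTERLEAVE_MOE_LAYER_STEP, hparams.n_moe_layer_step);
    ml.get_key(LLM_KV_LEADING_DENSE_BLOCK_COUNT, hparams.n_layer_dense_lead);
}

The tensor loader would then create the ffn_*_shexp tensors only when hparams.n_expert_shared > 0, along the lines of the sketch after the previous snippet.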

@CISC (Collaborator) commented Jul 17, 2025

Actually, let's not, as there are already GGUFs out there. The old calculation is fine as well.

@CISC (Collaborator) commented Jul 17, 2025

@nicoboss You will have to reconvert (or delete the ernie4_5-moe.expert_shared_feed_forward_length key).

@CISC merged commit 670e136 into ggml-org:master on Jul 17, 2025
5 checks passed