Could you please guide me on how to run the MLX model Qwen2.5 Coder 32B Instruct with a large 128k context window? According to the YaRN (RoPE scaling, factor 4.0) method, the model should support a 128k context window with the following configuration:

```json
"rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
}
```

I would like to clarify whether it is possible to run this model (Hugging Face link) with 128k token support using the mlx-lm framework, or does mlx-lm only support a 32k token context window?

Thank you for your help!
Hmm, I would recommend using one of the newer 1M models instead. Those don't require you to fiddle with the config or add the additional RoPE scaling for the model. Otherwise you will need to manually edit the Qwen 2.5 model file as well as the Hugging Face config to get it to support the YaRN scaling. For details on running with long context, you can do something like the following:
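A minimal sketch, assuming a local copy of the model whose config.json already carries the `rope_scaling` block from the question and whose mlx-lm Qwen2 implementation has been patched for YaRN; the model path and prompt file below are placeholders:

```python
# Minimal sketch: long-context generation with mlx-lm's Python API.
# Assumes a locally patched model; the path and prompt file are placeholders.
from mlx_lm import load, generate

# Local copy of Qwen2.5 Coder 32B whose config.json includes the
# "rope_scaling" block shown in the question (hypothetical path).
model, tokenizer = load("/path/to/Qwen2.5-Coder-32B-Instruct-128k")

# A prompt longer than the stock 32k window, e.g. a large code dump.
with open("long_prompt.txt") as f:
    long_input = f.read()

# Wrap the raw text in the model's chat template before generating.
messages = [{"role": "user", "content": long_input}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```

Recent mlx-lm versions also let the `mlx_lm.generate` CLI read the prompt from stdin via `--prompt -`, which is convenient for very long inputs.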
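As for the model-file side: the Qwen2 implementation in mlx-lm would need its RoPE frequencies adjusted the way YaRN prescribes. A rough sketch of that core computation, following the standard YaRN formulation from the paper (the helper name and defaults here are illustrative, not actual mlx-lm API):

```python
# Illustrative sketch of the YaRN inverse-frequency adjustment a patched
# Qwen2 RoPE would need; not actual mlx-lm code.
import math

def yarn_inv_freqs(dim, base=1000000.0, factor=4.0,
                   original_max_position_embeddings=32768,
                   beta_fast=32.0, beta_slow=1.0):
    # Standard RoPE inverse frequencies over the rotary half of the head dim.
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]

    # Dimension index at which a frequency completes num_rot rotations over
    # the original context window (the "correction dim" from the YaRN paper).
    def correction_dim(num_rot):
        return (dim * math.log(original_max_position_embeddings
                               / (num_rot * 2 * math.pi))) / (2 * math.log(base))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim // 2 - 1)

    scaled = []
    for i, f in enumerate(inv_freq):
        # ramp is 0 below `low` (keep f: pure extrapolation) and rises to 1
        # above `high` (divide by `factor`: pure interpolation).
        ramp = min(max((i - low) / max(high - low, 1), 0.0), 1.0)
        scaled.append(f * (1.0 - ramp) + (f / factor) * ramp)
    return scaled

# For Qwen2.5's 128-dimensional attention heads:
freqs_128k = yarn_inv_freqs(128)
```

Note that YaRN also scales attention scores by roughly 0.1 * ln(factor) + 1, so a complete patch would touch the attention computation as well; that is why editing config.json alone is not enough.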