Distributed mlx_lm.evaluate #1174

Open · wants to merge 3 commits into main
Conversation

@barronalex (Collaborator) commented Dec 19, 2024

Add a distributed version of mlx_lm.evaluate that runs on multiple nodes and produces identical outputs.
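For readers skimming the diff, here is a rough sketch of the idea (not the code in this PR; it assumes MLX's mx.distributed API and a hypothetical score_fn): each rank scores an equal-sized slice of the requests and the per-request scores are gathered back in request order, so the merged result is identical to a single-node run. The multi-node numbers below would come from launching the script under MPI (e.g. mpirun with a hostfile), which is how MLX's distributed backend is typically started.

```python
# Rough sketch only (not this PR's implementation): shard the eval requests
# across ranks with MLX's distributed API and gather the per-request scores
# back in request order, so the merged output matches a single-node run.
import mlx.core as mx

def distributed_scores(requests, score_fn):
    # score_fn is a hypothetical callable returning one float per request.
    group = mx.distributed.init()
    rank, size = group.rank(), group.size()

    # Pad so every rank gets the same number of requests; padded entries are
    # dropped again after gathering, keeping all_gather shapes identical.
    per_rank = (len(requests) + size - 1) // size
    padded = requests + [requests[-1]] * (per_rank * size - len(requests))
    local = padded[rank * per_rank : (rank + 1) * per_rank]

    local_scores = mx.array([score_fn(r) for r in local])   # (per_rank,)
    all_scores = mx.distributed.all_gather(local_scores)    # (size * per_rank,) in rank order
    return all_scores[: len(requests)]                      # original order, padding removed
```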

Also fix a few bugs:

  • Add masking so that changing the batch_size no longer affects the output (see the sketch after this list)
  • Fix a bug in loglikelihood_rolling tasks, e.g. WikiText
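
As a side note on the first fix, here is a minimal sketch of the masking idea (illustrative only, not the PR's code; masked_logprob_sum and its arguments are made up for the example): padded positions are excluded from the per-token log-probability sum, so a sequence's score no longer depends on how it was batched.

```python
# Minimal sketch of length masking for batched loglikelihood scoring
# (illustrative; not the code in this PR).
import mlx.core as mx

def masked_logprob_sum(logits, targets, lengths):
    """logits: (B, T, V), targets: (B, T), lengths: (B,) valid tokens per row."""
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)         # log-softmax
    tok = mx.take_along_axis(logprobs, targets[..., None], axis=-1)[..., 0]  # (B, T)
    mask = mx.arange(tok.shape[1])[None, :] < lengths[:, None]               # True on real tokens
    return mx.where(mask, tok, 0.0).sum(axis=-1)  # per-sequence score; padding contributes nothing
```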
Benchmark command:

mlx_lm.evaluate --model mlx-community/Qwen2.5-7B-Instruct-bf16 --tasks winogrande

On 1 M2 Ultra:

  Acc: 0.6992896606156275
  Time (post init): 64 sec

On 4 M2 Ultra:

  Acc: 0.6985003946329913
  Time (post init): 16 sec

@ivanfioravanti (Contributor) commented

This is great! I'm testing it with M2 Ultra + 2 M4 Max. WOW! Great job @barronalex
When will this be reviewed and merged?

llms/mlx_lm/evaluate.py — several review comments, now outdated and resolved
@ivanfioravanti (Contributor) commented Jan 22, 2025

Any news on this PR? It would be great to speed up some distributed evals on DeepSeek R1 😜
I will give it a try.

@barronalex (Collaborator, Author) commented

That’s awesome! I’ll get it in later today.

@barronalex (Collaborator, Author) commented

@awni thanks for the comments! I think this is good to merge now.

@@ -346,11 +361,8 @@ def main():
)
parser.add_argument(
"--apply-chat-template",
@barronalex (Collaborator, Author) commented on this hunk:

It was impossible to disable this before, so I've changed it to be off by default (which mirrors the lm_eval behavior)
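
Concretely, the change makes the flag opt-in; a sketch of what that looks like (illustrative, the exact help text and wiring in the diff may differ):

```python
# Sketch: chat templating becomes opt-in rather than always applied.
parser.add_argument(
    "--apply-chat-template",
    action="store_true",
    default=False,
    help="Apply the model's chat template to each prompt (off by default).",
)
```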

@awni (Member) replied:
I'm not sure about defaulting it to off for instruct models.. it seems like you would always want this on for most models that are used regularly? Does it make sense to change this to --ignore-chat-template instead to be able to shut it off if needed?

@awni (Member) left a review:

Looks great! Just one comment. Let me know what you think. Otherwise LGTM!
