[FT] Enable lazy model initialization #496

Open
JoelNiklaus opened this issue Jan 11, 2025 · 0 comments
Labels: feature request


@JoelNiklaus (Contributor)

Issue encountered

Evaluating large models (> 30B parameters) is hard, especially with limited hardware. When many metrics need to be computed, the time the powerful machine must stay running grows significantly. For example, when I want to evaluate a 70B model on a large dataset and then compute many LLM-judge metrics, it can occupy a 4xA100 machine for days, incurring significant cost. The GPUs are only actually active during the first few hours for inference; afterwards they just sit idle.

Solution/Feature

Therefore, ideally, we would like to run inference with a single metric and save the results to the details files. In a second step, we would load the responses from the details files and run only the metrics, which can be done on a significantly smaller machine. Loading from the details files is being added in PR #488. However, to evaluate the metrics we currently still need to load the entire model into memory, defeating the purpose. Loading the model only right before it is actually run would alleviate this issue.
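
A minimal sketch of what lazy initialization could look like. The names (`LazyModel`, `loader`, `generate`) are purely illustrative assumptions and do not reflect the library's actual model API:

```python
# Hypothetical sketch of lazy model initialization; class and method names
# are illustrative only, not the library's real API.
from typing import Callable, Optional


class LazyModel:
    """Defers loading the underlying model until it is actually needed.

    If all responses are read from the details files, the model is never
    loaded and no GPU memory is consumed.
    """

    def __init__(self, loader: Callable[[], object]):
        # e.g. loader = lambda: AutoModelForCausalLM.from_pretrained(...)
        self._loader = loader
        self._model: Optional[object] = None

    @property
    def model(self):
        # Load the real model only on first access.
        if self._model is None:
            self._model = self._loader()
        return self._model

    def generate(self, prompt: str, **kwargs):
        # Any call that requires actual inference triggers the lazy load.
        return self.model.generate(prompt, **kwargs)
```

With such a wrapper, a metric-only run that reads every response from the details files would never trigger the expensive load.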

Possible alternatives

Alternatively, we could mock the model.
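
A minimal sketch of the mocking alternative, again with illustrative names only:

```python
# Hypothetical sketch; MockModel is an assumed name, not an existing class.
class MockModel:
    """Stand-in that satisfies the model interface without loading any weights."""

    def generate(self, prompt, **kwargs):
        # All responses are expected to come from the details files,
        # so actually calling inference is an error.
        raise RuntimeError(
            "MockModel cannot run inference; load responses from the details files instead."
        )
```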
