Issue encountered
Evaluating large models (> 30B parameters) is hard with limited hardware, and computing many metrics makes it worse by extending how long the expensive machine has to stay up. For example, evaluating a 70B model on a large dataset and then computing many LLM-judge metrics can occupy a 4xA100 machine for days, incurring significant cost, even though the GPUs are only actually busy during the first few hours of inference. Afterwards they just sit idle.
Solution/Feature
Ideally, we would run inference with just one metric in a first step and save the results to the details files, then in a second step load the responses from the details files and compute the remaining metrics on a significantly smaller machine. Loading from the details files is being added in PR #488. However, to compute the metrics we currently still need to load the entire model into memory, which defeats the purpose. Loading the model lazily, only right before it is actually run, would alleviate this issue.
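A minimal sketch of what such lazy loading could look like, assuming a transformers-style model (the `LazyModel` wrapper and its interface are hypothetical, not an existing API in this repo): weights are only pulled into memory on the first call that needs them, so a metrics-only run that never calls `generate()` never pays the loading cost.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


class LazyModel:
    """Wraps a model name and defers from_pretrained() until first use.

    Config parsing, dataset loading, and metrics that only read cached
    responses all work without ever materializing the weights.
    """

    def __init__(self, model_name: str):
        self.model_name = model_name
        self._model = None
        self._tokenizer = None

    @property
    def model(self):
        if self._model is None:
            # The expensive step: runs only right before inference is needed.
            self._model = AutoModelForCausalLM.from_pretrained(
                self.model_name, device_map="auto"
            )
        return self._model

    @property
    def tokenizer(self):
        if self._tokenizer is None:
            self._tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        return self._tokenizer

    def generate(self, prompt: str, **gen_kwargs) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, **gen_kwargs)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```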
Possible alternatives
Alternatively, we could mock the model.
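A rough sketch of the mocking alternative (the `MockModel` class and the details-file layout, one JSON record per line with "prompt" and "response" fields, are assumptions for illustration): the mock replays recorded responses from a details file, so the metrics step needs no GPU at all.

```python
import json


class MockModel:
    """Replays responses recorded in a details file instead of running
    inference, so metric computation never touches GPU memory."""

    def __init__(self, details_path: str):
        # Assumed layout: one JSON object per line, e.g.
        # {"prompt": "...", "response": "..."}
        with open(details_path) as f:
            self._responses = {
                record["prompt"]: record["response"]
                for record in map(json.loads, f)
            }

    def generate(self, prompt: str, **gen_kwargs) -> str:
        # A missing prompt is a hard error: a mock should never silently
        # fall back to real inference.
        return self._responses[prompt]
```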