Issue encountered
Evaluating large models (> 30B parameters) is hard with limited hardware, and computing many metrics makes it worse by extending how long the expensive machine has to stay up. For example, evaluating a 70B model on a large dataset and then computing many LLM-judge metrics can occupy a 4xA100 machine for days, incurring significant cost, even though the GPUs are only actually busy during the first few hours of inference. Afterwards they just sit idle.
Solution/Feature
Ideally, we would run inference with just one metric in a first step and save the results to the details files, then in a second step load the responses from the details files and compute the remaining metrics on a significantly smaller machine. Loading from the details files is being added in PR #488. However, to compute the metrics we currently still need to load the entire model into memory, which defeats the purpose. Loading the model lazily, only right before it is actually run, would alleviate this issue.
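A minimal sketch of what such lazy loading could look like, assuming a transformers-style model (the `LazyModel` wrapper and its interface are hypothetical, not an existing API in this repo): weights are only pulled into memory on the first call that needs them, so a metrics-only run that never calls `generate()` never pays the loading cost.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


class LazyModel:
    """Wraps a model name and defers from_pretrained() until first use.

    Config parsing, dataset loading, and metrics that only read cached
    responses all work without ever materializing the weights.
    """

    def __init__(self, model_name: str):
        self.model_name = model_name
        self._model = None
        self._tokenizer = None

    @property
    def model(self):
        if self._model is None:
            # The expensive step: runs only right before inference is needed.
            self._model = AutoModelForCausalLM.from_pretrained(
                self.model_name, device_map="auto"
            )
        return self._model

    @property
    def tokenizer(self):
        if self._tokenizer is None:
            self._tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        return self._tokenizer

    def generate(self, prompt: str, **gen_kwargs) -> str:
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, **gen_kwargs)
        return self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
```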
Possible alternatives
Alternatively, we could mock the model.
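A rough sketch of the mocking alternative (the `MockModel` class and the details-file layout, one JSON record per line with "prompt" and "response" fields, are assumptions for illustration): the mock replays recorded responses from a details file, so the metrics step needs no GPU at all.

```python
import json


class MockModel:
    """Replays responses recorded in a details file instead of running
    inference, so metric computation never touches GPU memory."""

    def __init__(self, details_path: str):
        # Assumed layout: one JSON object per line, e.g.
        # {"prompt": "...", "response": "..."}
        with open(details_path) as f:
            self._responses = {
                record["prompt"]: record["response"]
                for record in map(json.loads, f)
            }

    def generate(self, prompt: str, **gen_kwargs) -> str:
        # A missing prompt is a hard error: a mock should never silently
        # fall back to real inference.
        return self._responses[prompt]
```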