Hi, thanks for the great work. We are the OpenCompass team (https://github.com/internLM/OpenCompass/), focused on LLM evaluation.
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark. Its main features include:
Comprehensive support for models and datasets: out-of-the-box support for 20+ HuggingFace and API models, plus an evaluation scheme covering 50+ datasets and about 300,000 questions, assessing model capabilities across five dimensions.
Efficient distributed evaluation: a single command handles task partitioning and distributed evaluation, completing a full evaluation of billion-parameter models in just a few hours (see the config sketch after this list).
Diversified evaluation paradigms: support for zero-shot, few-shot, and chain-of-thought evaluation, combined with standard or dialogue-style prompt templates, to easily elicit the best performance from various models.
Modular design with high extensibility: want to add new models or datasets, customize an advanced task-partitioning strategy, or even support a new cluster management system? Everything in OpenCompass can be easily extended!
Experiment management and reporting: config files fully record each experiment, with support for real-time reporting of results.
We would like to support the evaluation of open_llama in OpenCompass. If you have any ideas or suggestions, feel free to raise an issue or contact us at opencompass@pjlab.org.cn.
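For reference, here is a minimal sketch of what an open_llama evaluation config could look like. The file name `eval_open_llama.py`, the two demo datasets, and the exact model fields are illustrative; please check the OpenCompass docs for the authoritative options:

```python
# configs/eval_open_llama.py -- a minimal sketch, not an official config
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    # pull in two small demo dataset configs shipped with OpenCompass
    from .datasets.siqa.siqa_gen import siqa_datasets
    from .datasets.winograd.winograd_ppl import winograd_datasets

datasets = [*siqa_datasets, *winograd_datasets]

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='open-llama-7b',
        path='openlm-research/open_llama_7b',
        tokenizer_path='openlm-research/open_llama_7b',
        tokenizer_kwargs=dict(use_fast=False),  # open_llama recommends the slow tokenizer
        max_seq_len=2048,
        max_out_len=100,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```

The whole run is then a single command, e.g. `python run.py configs/eval_open_llama.py`; OpenCompass partitions the tasks and schedules them across the available workers.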
Feel free to evaluate the model and publish the results! The PyTorch checkpoint is fully compatible with HuggingFace Transformers, so you should be able to run it directly.
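For example, loading and sampling with Transformers should look roughly like this (using the released openlm-research/open_llama_7b checkpoint; the prompt is just an illustration):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = 'openlm-research/open_llama_7b'

# open_llama recommends the slow LlamaTokenizer; the fast tokenizer
# had known issues with this model at release time.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map='auto',  # requires the `accelerate` package
)

prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```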