# Omni-Math

[Omni-Math](https://huggingface.co/datasets/KbsdJames/Omni-MATH) contains 4428 competition-level problems. The problems are categorized into 33 (and potentially more) sub-domains and span 10 distinct difficulty levels, which allows model performance to be analyzed separately by mathematical discipline and by level of complexity.

* Project Page: https://omni-math.github.io/
* GitHub Repo: https://github.com/KbsdJames/Omni-MATH
* Omni-Judge (open-source evaluator for this dataset): https://huggingface.co/KbsdJames/Omni-Judge

## Omni-Judge

> Omni-Judge is an open-source mathematical evaluation model designed to assess whether a solution generated by a model is correct given a problem and a standard answer.

Deploy the Omni-Judge server first, for example with:
```bash
set -x

lmdeploy serve api_server KbsdJames/Omni-Judge --server-port 8000 \
    --tp 1 \
    --cache-max-entry-count 0.9 \
    --log-level INFO
```
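
Before pointing OpenCompass at the server, it can help to confirm that it is reachable. The following is a minimal sketch, not part of OpenCompass: it assumes the `lmdeploy` API server exposes the OpenAI-compatible `/v1/models` route, and the helper names `models_endpoint` and `list_served_models` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL (assumed OpenAI-compatible route)."""
    return base_url.rstrip('/') + '/v1/models'


def list_served_models(base_url: str, timeout: float = 5.0) -> list:
    """Return the ids of the served models, or [] if the server is unreachable."""
    try:
        with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not ready".
        return []
    return [item.get('id') for item in payload.get('data', [])]
```

Calling `list_served_models('http://localhost:8000')` on the machine that runs the server (the host name is an assumption; the port matches `--server-port` above) should return a non-empty list once the model has finished loading.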

Then set the server URL in the OpenCompass config file:

```python
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.omni_math.omni_math_gen import omni_math_datasets

omni_math_dataset = omni_math_datasets[0]
# Point the evaluator at the deployed Omni-Judge server(s).
# The addresses below are examples; replace them with your own hosts.
omni_math_dataset['eval_cfg']['evaluator'].update(
    url=['http://172.30.8.45:8000',
         'http://172.30.16.113:8000'],
)
```
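
When several judge servers are listed, a malformed entry is easy to miss. The sketch below shows one way to tidy the list before passing it to `update(url=...)`. The helper `normalize_judge_urls` is hypothetical and makes no assumption about how the evaluator uses multiple URLs; it only strips whitespace and trailing slashes, rejects non-HTTP entries, and drops duplicates.

```python
from urllib.parse import urlparse


def normalize_judge_urls(urls):
    """Return cleaned, de-duplicated HTTP(S) base URLs in their original order."""
    cleaned = []
    for raw in urls:
        url = raw.strip().rstrip('/')
        parsed = urlparse(url)
        if parsed.scheme not in ('http', 'https') or not parsed.hostname:
            raise ValueError(f'not a valid HTTP URL: {raw!r}')
        if url not in cleaned:
            cleaned.append(url)
    return cleaned
```

The result can then be used as `url=normalize_judge_urls([...])` in the config above.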

## Performance

| llama-3_1-8b-instruct | qwen-2_5-7b-instruct | InternLM3-8b-Instruct |
| --- | --- | --- |
| 15.18 | 29.97 | 32.75 |