upload README
jnanliu committed Jan 20, 2025
1 parent b1552f7 commit 4ad5d74
Showing 2 changed files with 44 additions and 2 deletions.
43 changes: 43 additions & 0 deletions opencompass/configs/datasets/omni_math/README.md
@@ -0,0 +1,43 @@
# Omni-Math

[Omni-Math](https://huggingface.co/datasets/KbsdJames/Omni-MATH) contains 4428 competition-level problems. The problems are categorized into 33 (and potentially more) sub-domains and span 10 difficulty levels, which allows model performance to be analyzed by mathematical discipline and by level of complexity.

* Project Page: https://omni-math.github.io/
* GitHub Repo: https://github.com/KbsdJames/Omni-MATH
* Omni-Judge (open-source evaluator for this dataset): https://huggingface.co/KbsdJames/Omni-Judge

## Omni-Judge

> Omni-Judge is an open-source mathematical evaluation model designed to assess whether a solution generated by a model is correct given a problem and a standard answer.

Deploy the Omni-Judge server with LMDeploy, for example:

```bash
set -x

lmdeploy serve api_server KbsdJames/Omni-Judge --server-port 8000 \
--tp 1 \
--cache-max-entry-count 0.9 \
--log-level INFO
```
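
Before running an evaluation, it can help to confirm that the server is reachable. The following is a minimal sketch, not part of OpenCompass: it assumes the LMDeploy `api_server` exposes the OpenAI-compatible `/v1/models` route, and the helper names `models_endpoint` and `list_served_models` are hypothetical.

```python
import json
import urllib.request
from typing import List


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL (assumes an OpenAI-compatible server)."""
    return base_url.rstrip('/') + '/v1/models'


def list_served_models(base_url: str, timeout: float = 5.0) -> List[str]:
    """Return the model ids reported by the server, or [] if unreachable."""
    try:
        with urllib.request.urlopen(models_endpoint(base_url),
                                    timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection errors, HTTP errors and malformed JSON all end up here.
        return []
    if not isinstance(payload, dict):
        return []
    return [m.get('id', '') for m in payload.get('data', [])]
```

Calling `list_served_models('http://localhost:8000')` on the machine that runs the server should return a non-empty list once the model has finished loading; an empty list suggests the server is not up yet or the port is wrong.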

Then set the server URL(s) in the OpenCompass config file:

```python
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.omni_math.omni_math_gen import omni_math_datasets


omni_math_dataset = omni_math_datasets[0]
# Replace the example addresses below with the URLs of your own Omni-Judge servers.
omni_math_dataset['eval_cfg']['evaluator'].update(
    url=['http://172.30.8.45:8000',
         'http://172.30.16.113:8000'],
)
```
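
When more than one URL is given, `batch_infer` in `opencompass/datasets/omni_math.py` computes a per-server batch size by ceiling division over the number of servers. The sketch below illustrates only that arithmetic. The function name `split_across_servers` is hypothetical, and the slicing step is an assumption, since the rest of the method is not shown in this commit.

```python
from typing import List


def split_across_servers(inputs: List[str],
                         num_servers: int) -> List[List[str]]:
    """Split inputs into one contiguous chunk per server."""
    if num_servers < 1:
        raise ValueError('num_servers must be at least 1')
    # Same ceiling division as in batch_infer:
    # (len(inputs) + batch_num - 1) // batch_num
    batch_size = (len(inputs) + num_servers - 1) // num_servers
    # Assumed slicing: chunk i holds inputs[i*batch_size:(i+1)*batch_size].
    return [
        inputs[i * batch_size:(i + 1) * batch_size]
        for i in range(num_servers)
    ]
```

With five inputs and two servers the batch size is 3, so the first server receives three items and the second receives two. Trailing chunks can be empty when there are fewer inputs than servers.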

## Performance

| llama-3_1-8b-instruct | qwen-2_5-7b-instruct | InternLM3-8b-Instruct |
| -- | -- | -- |
| 15.18 | 29.97 | 32.75 |
3 changes: 1 addition & 2 deletions opencompass/datasets/omni_math.py
@@ -5,7 +5,6 @@
 from datasets import load_dataset
 from transformers import AutoTokenizer

-from opencompass.models import OpenAISDK
 from opencompass.models.turbomind_api import TurboMindAPIModel
 from opencompass.openicl.icl_evaluator import BaseEvaluator
 from opencompass.registry import ICL_EVALUATORS, LOAD_DATASET, MODELS
@@ -47,7 +46,7 @@ def __init__(self, url):
         self.tokenizer = AutoTokenizer.from_pretrained('KbsdJames/Omni-Judge',
                                                        trust_remote_code=True)

-    def batch_infer(self, models: List[OpenAISDK],
+    def batch_infer(self, models: List[TurboMindAPIModel],
                     inputs: List[str]) -> List[str]:
         batch_num = len(models)
         batch_size = (len(inputs) + batch_num - 1) // batch_num