Add dataset collection #253

Yunnglin · 2024-12-18T10:04:39Z

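Registered datasets:
- race
- trivia_qa (service evaluation not supported)
- truthful_qa
- mmlu
- humaneval
- general_qa
- cmmlu
- arc
- hellaswag (service evaluation not supported)
- bbh
- ceval
- gsm8k
- competition_math
Supports evaluating a model service at a specified URL.
Supports mixed-dataset evaluation with a custom collection schema.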

schema = CollectionSchema(name='math&reasoning', datasets=[
    CollectionSchema(name='math', datasets=[
        DatasetInfo(name='gsm8k', weight=1, task_type='math', tags=['en', 'math']),
        DatasetInfo(name='competition_math', weight=1, task_type='math', tags=['en', 'math']),
        DatasetInfo(name='cmmlu', weight=2, task_type='math', tags=['zh', 'math'],
                    args={'subset_list': ['college_mathematics', 'high_school_mathematics']}),
        DatasetInfo(name='ceval', weight=3, task_type='math', tags=['zh', 'math'],
                    args={'subset_list': ['advanced_mathematics', 'high_school_mathematics',
                                          'discrete_mathematics', 'middle_school_mathematics']}),
    ]),
    CollectionSchema(name='reasoning', datasets=[
        DatasetInfo(name='arc', weight=1, task_type='reasoning', tags=['en', 'reasoning']),
        DatasetInfo(name='ceval', weight=1, task_type='reasoning', tags=['zh', 'reasoning'],
                    args={'subset_list': ['logic']}),
        DatasetInfo(name='race', weight=1, task_type='reasoning', tags=['en', 'reasoning']),
    ]),
])
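To make the effect of the nested weights concrete, here is a minimal sketch of how they could translate into per-dataset sampling shares. It assumes weights are normalised among siblings at each level and that a nested collection without an explicit weight counts as 1; that rule is an assumption, not confirmed by this PR. `Leaf`, `Group`, and `flatten` are hypothetical stand-ins and are not the `DatasetInfo` / `CollectionSchema` classes themselves.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Leaf:
    """Hypothetical stand-in for a DatasetInfo entry (name and weight only)."""
    name: str
    weight: float = 1.0


@dataclass
class Group:
    """Hypothetical stand-in for a nested CollectionSchema."""
    name: str
    children: List[Union["Group", Leaf]] = field(default_factory=list)
    weight: float = 1.0  # assumed default when no weight is given


def flatten(node, share=1.0, path=()):
    """Map each leaf path to its overall share, normalising weights per level."""
    if isinstance(node, Leaf):
        return {path + (node.name,): share}
    total = sum(child.weight for child in node.children)
    out = {}
    for child in node.children:
        out.update(flatten(child, share * child.weight / total, path + (node.name,)))
    return out


tree = Group("math&reasoning", [
    Group("math", [Leaf("gsm8k", 1), Leaf("competition_math", 1),
                   Leaf("cmmlu", 2), Leaf("ceval", 3)]),
    Group("reasoning", [Leaf("arc", 1), Leaf("ceval", 1), Leaf("race", 1)]),
])

shares = flatten(tree)
for leaf_path, value in shares.items():
    print("/".join(leaf_path), round(value, 4))
```

Under this assumption, each of the two groups receives half of the samples, `ceval` gets 3/7 of the math half, and each reasoning dataset gets a third of the reasoning half. Keys are full paths, so the two `ceval` entries stay distinct.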



task_cfg = TaskConfig(
    model='qwen2.5',
    api_url='http://127.0.0.1:8801/v1/chat/completions',
    api_key='EMPTY',
    eval_type=EvalType.SERVICE,
    datasets=['data_collection'],
    dataset_args={'data_collection': {
        'local_path': 'outputs/mixed_data_test.jsonl'
    }},
)
run_task(task_cfg=task_cfg)

Output:

task_type	dataset_name	subset_name	average_score	count
math	ceval	advanced_mathematics	0.25	12
math	ceval	discrete_mathematics	0.333333	3
math	ceval	high_school_mathematics	0	3
math	ceval	middle_school_mathematics	0	3
math	cmmlu	college_mathematics	0.2	5
math	cmmlu	high_school_mathematics	0.555556	9
math	competition_math	default	0	7
math	gsm8k	main	0.428571	7
reasoning	arc	ARC-Challenge	0.166667	6
reasoning	arc	ARC-Easy	0.5	10
reasoning	ceval	logic	0.25	16
reasoning	race	high	0.285714	14
reasoning	race	middle	0.8	5
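The per-subset rows above can be rolled up into one score per `task_type`. The sketch below computes a count-weighted average, so every sample contributes equally. This aggregation is an illustrative assumption and not necessarily how the evaluator itself summarises results. The `rows` list copies the table, and `weighted_by_task` is a hypothetical helper.

```python
from collections import defaultdict

# (task_type, dataset_name, subset_name, average_score, count), copied from the table
rows = [
    ("math", "ceval", "advanced_mathematics", 0.25, 12),
    ("math", "ceval", "discrete_mathematics", 0.333333, 3),
    ("math", "ceval", "high_school_mathematics", 0.0, 3),
    ("math", "ceval", "middle_school_mathematics", 0.0, 3),
    ("math", "cmmlu", "college_mathematics", 0.2, 5),
    ("math", "cmmlu", "high_school_mathematics", 0.555556, 9),
    ("math", "competition_math", "default", 0.0, 7),
    ("math", "gsm8k", "main", 0.428571, 7),
    ("reasoning", "arc", "ARC-Challenge", 0.166667, 6),
    ("reasoning", "arc", "ARC-Easy", 0.5, 10),
    ("reasoning", "ceval", "logic", 0.25, 16),
    ("reasoning", "race", "high", 0.285714, 14),
    ("reasoning", "race", "middle", 0.8, 5),
]


def weighted_by_task(table):
    """Return {task_type: (count-weighted mean score, total count)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for task_type, _dataset, _subset, score, count in table:
        totals[task_type][0] += score * count  # approximate number of correct answers
        totals[task_type][1] += count
    return {task: (hits / n, n) for task, (hits, n) in totals.items()}


summary = weighted_by_task(rows)
for task, (score, n) in summary.items():
    print(f"{task}\t{score:.3f}\t{n}")
```

On these rows the math subsets cover 49 samples and the reasoning subsets 51, with count-weighted averages of roughly 0.265 and 0.353 respectively.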

Yunnglin added 15 commits December 18, 2024 18:03

add dataset register

f3c09da

fix circular import

db7f37c

fix lint

1168796

Merge branch 'main' into feat/collection

376afc8

update data adapter

655d49c

update model adapter

2f941a6

split model adapter

a3b9b9f

add server

85b6577

update seed and ceval

78ce442

init collection

4b07449

add collection and sampler

f67322a

remove output

ebcc800

add mix evaluator

0c4e87d

add evaluator

b957f83

register all data

95aa741

Yunnglin changed the title ~~[WIP] Add dataset collection~~ Add dataset collection Dec 24, 2024

update test

1ea478c
