Adds Global MMLU #426

Merged
merged 6 commits into from
Dec 9, 2024


hynky1999
Collaborator

@hynky1999 hynky1999 commented Dec 6, 2024

Supports 3 versions: ca (culturally agnostic), cs (culturally sensitive), and all (everything).
Example of how to run:
lighteval vllm pretrained="Qwen/Qwen2.5-7B,dtype=bfloat16,max_model_length=4096,pairwise_tokenization=True,tensor_parallel_size=1,gpu_memory_utilisation=0.6" --custom-tasks lighteval.tasks.multilingual.tasks --max-samples 100 --output-dir ./results "lighteval|global_mmlu_all_ces_cf|3|1"
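
For readers unfamiliar with the task specifier at the end of the command, here is a minimal sketch of how its four pipe-separated fields could be read. The field meanings (suite, task name, few-shot count, and a 0/1 flag allowing the few-shot count to be reduced when the prompt is too long) are my understanding of the lighteval convention and should be treated as an assumption. `TaskSpec` and `parse_task_spec` are hypothetical names used for illustration, not lighteval APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical container for the four fields of a task specifier."""
    suite: str
    task: str
    num_fewshot: int
    truncate_fewshot: bool


def parse_task_spec(spec: str) -> TaskSpec:
    """Split 'suite|task|num_fewshot|flag' into typed fields.

    The interpretation of the last field as a 0/1 truncation flag is an
    assumption about the convention, not something stated in this PR.
    """
    parts = spec.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}: {spec!r}")
    suite, task, num_fewshot, flag = parts
    if flag not in ("0", "1"):
        raise ValueError(f"last field must be '0' or '1', got {flag!r}")
    return TaskSpec(suite, task, int(num_fewshot), flag == "1")


spec = parse_task_spec("lighteval|global_mmlu_all_ces_cf|3|1")
print(spec)
```

Under that reading, the command above asks for the `all` subset in Czech (`ces`) with 3 few-shot examples.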

Results for Qwen2.5-7B:

|                                 Task                                 |Version|    Metric    |Value |   |Stderr|
|----------------------------------------------------------------------|------:|--------------|-----:|---|-----:|
|all                                                                   |       |acc_norm_token|0.3775|±  |0.0479|
|                                                                      |       |acc_norm      |0.3811|±  |0.0480|
|                                                                      |       |acc_norm_pmi  |0.3905|±  |0.0483|

Examples of prompts (ces):

-- sample 7 --
{ 'choices': [' 0', ' 10 %', ' 25 %', ' 50 %'],
  'ctx': '',
  'fewshot_sorting_class': None,
  'gold_index': [2],
  'instruction': '',
  'num_asked_few_shots': -1,
  'num_effective_few_shots': -1,
  'original_query': '',
  'query': 'Otázka: Uvažujme segment o délce 10. Body A a B jsou vybrány '
           'náhodně tak, že A a B rozdělují segment na tři menší segmenty. '
           'Jaká je pravděpodobnost, že tři menší segmenty mohou tvořit strany '
           'trojúhelníku?\n'
           'Odpověď:',
  'specific': None,
  'task_name': 'lighteval|global_mmlu_all_ces_cf:college_mathematics',
  'unconditioned_query': 'Odpověď:'}
-- sample 8 --
{ 'choices': [' Žádný', ' Jen já', ' Pouze II', ' Pouze III'],
  'ctx': '',
  'fewshot_sorting_class': None,
  'gold_index': [2],
  'instruction': '',
  'num_asked_few_shots': -1,
  'num_effective_few_shots': -1,
  'original_query': '',
  'query': 'Otázka: Nechť V je konečnorozměrný reálný vektorový prostor a '
           'nechť P je lineární transformace V tak, že P^2 = P. Která z '
           'následujících musí platit?\n'
           'I. P je invertibilní.\n'
           'II. P je diagonální.\n'
           'III. P je buď transformace identity nebo nulová transformace.\n'
           'Odpověď:',
  'specific': None,
  'task_name': 'lighteval|global_mmlu_all_ces_cf:college_mathematics',
  'unconditioned_query': 'Odpověď:'}
-- sample 9 --
{ 'choices': [ ' Každý kompaktní prostor je kompletní',
               ' Každý kompletní prostor je kompaktní',
               ' Ani (a) ani (b).',
               ' Oba (a) i (b).'],
  'ctx': '',
  'fewshot_sorting_class': None,
  'gold_index': [0],
  'instruction': '',
  'num_asked_few_shots': -1,
  'num_effective_few_shots': -1,
  'original_query': '',
  'query': 'Otázka: Která z následujících skutečností je pravdivá?\nOdpověď:',
  'specific': None,
  'task_name': 'lighteval|global_mmlu_all_ces_cf:college_mathematics',
  'unconditioned_query': 'Odpověď:'}
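
To make the three metrics in the results table easier to relate to the fields in these samples, here is a rough sketch of how a choice could be selected under each normalization. It assumes the usual definitions: `acc_norm` divides each choice's log-probability by its character length, `acc_norm_token` divides by its token count, and `acc_norm_pmi` subtracts the log-probability of the choice given only `unconditioned_query`. The function name `predict`, the stripping of the leading space, and all numbers below are invented for illustration; none of it is taken from the lighteval implementation or from real model output.

```python
from typing import Optional, Sequence


def predict(
    choices: Sequence[str],
    cond_logprobs: Sequence[float],
    norm: str,
    token_counts: Optional[Sequence[int]] = None,
    uncond_logprobs: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the best-scoring choice under one normalization."""
    if norm == "char":
        # Length in characters; ignoring the leading space is an assumption.
        scores = [lp / max(1, len(c.strip())) for lp, c in zip(cond_logprobs, choices)]
    elif norm == "token":
        if token_counts is None:
            raise ValueError("token normalization needs token_counts")
        scores = [lp / max(1, n) for lp, n in zip(cond_logprobs, token_counts)]
    elif norm == "pmi":
        if uncond_logprobs is None:
            raise ValueError("PMI normalization needs uncond_logprobs")
        # log P(choice | query) - log P(choice | unconditioned_query)
        scores = [lp - ulp for lp, ulp in zip(cond_logprobs, uncond_logprobs)]
    else:
        raise ValueError(f"unknown normalization: {norm!r}")
    return max(range(len(scores)), key=scores.__getitem__)


# Choices from sample 7; every number below is made up.
choices = [" 0", " 10 %", " 25 %", " 50 %"]
cond = [-2.0, -6.0, -5.0, -5.5]      # log P(choice | query)
uncond = [-1.0, -7.0, -8.5, -7.5]    # log P(choice | "Odpověď:")
tokens = [2, 4, 4, 4]                # hypothetical token counts

by_char = predict(choices, cond, "char")
by_token = predict(choices, cond, "token", token_counts=tokens)
by_pmi = predict(choices, cond, "pmi", uncond_logprobs=uncond)
print(by_char, by_token, by_pmi)
```

A prediction counts as correct when the returned index appears in the sample's `gold_index`; with these invented numbers the three normalizations do not all agree, which is why the table reports them separately.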

cc @clefourrier, as you had an opinion on the subsets

@hynky1999 hynky1999 requested a review from clefourrier December 6, 2024 13:29
@clefourrier
Member

Hi! It could also make sense to support "Unannotated" (which is all - CS - CA).

@hynky1999
Collaborator Author

I don't think that's particularly interesting for anyone.
If you don't care about labels: use all
If you only want CA: use CA
If you only want CS: use CS

I don't see a reason for supporting unannotated.

@clefourrier
Member

Comparative analysis, to compare un-annotated to CS or CA and therefore get a better idea of which of the subsets it's closer to

@clefourrier
Member

It's a very minor change to add, no?

@hynky1999
Collaborator Author

hynky1999 commented Dec 9, 2024

> Comparative analysis, to compare un-annotated to CS or CA and therefore get a better idea of which of the subsets it's closer to

Fair enough, added "UNK" subset.
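
The relationship between the four subsets discussed in this thread (UNK being all minus CS minus CA) can be sketched as a simple row filter. This is an illustration only: the field name `sensitivity_label`, the use of `None` for un-annotated rows, and the helper `select_subset` are assumptions, and the actual dataset column and the filter merged in this PR may differ.

```python
from typing import Any, Dict, Iterable, List

SUBSETS = ("ALL", "CA", "CS", "UNK")


def select_subset(
    rows: Iterable[Dict[str, Any]],
    subset: str,
    label_field: str = "sensitivity_label",  # hypothetical column name
) -> List[Dict[str, Any]]:
    """Keep only the rows belonging to the requested subset."""
    if subset not in SUBSETS:
        raise ValueError(f"subset must be one of {SUBSETS}, got {subset!r}")
    rows = list(rows)
    if subset == "ALL":
        return rows
    if subset == "UNK":
        # Un-annotated: everything that is neither CA nor CS.
        return [r for r in rows if r.get(label_field) not in ("CA", "CS")]
    return [r for r in rows if r.get(label_field) == subset]


# Toy rows, invented for the example.
rows = [
    {"id": 1, "sensitivity_label": "CA"},
    {"id": 2, "sensitivity_label": "CS"},
    {"id": 3, "sensitivity_label": None},
    {"id": 4, "sensitivity_label": "CA"},
]
counts = {name: len(select_subset(rows, name)) for name in SUBSETS}
print(counts)
```

With a filter of this shape, scores on UNK can be compared directly against CA and CS, which is the comparative analysis requested above.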

@clefourrier
Member

Just one nit in the doc; if it works for you, merge it and then you can squash and merge.

Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
@hynky1999 hynky1999 merged commit 412ccfc into main Dec 9, 2024
3 checks passed