forked from EleutherAI/lm-evaluation-harness
main to based #3
Open
sedrick-keh-tri wants to merge 282 commits into based-fork-2 from main
Conversation
* use `@ray.remote` with distributed vLLM
* update versions
* bugfix
* unpin vllm
* fix pre-commit
* added version assertion error
* Revert "added version assertion error" (this reverts commit 8041e9b)
* added version assertion for DP
* expand DP note
* add warning
* nit
* pin vllm
* fix typos
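The `@ray.remote` change in the commit above refers to running vLLM workers as Ray actors for data-parallel evaluation. A minimal sketch of that pattern (the worker class and its contents are illustrative, not the harness's actual code):

```python
import ray

ray.init(ignore_reinit_error=True)

@ray.remote  # in a real setup this would reserve a GPU, e.g. @ray.remote(num_gpus=1)
class VLLMWorker:
    """Hypothetical data-parallel worker; a real one would build a vllm.LLM engine."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    def generate(self, prompts: list[str]) -> list[str]:
        # placeholder: echo prompts instead of calling an engine
        return [f"{self.model_name}: {p}" for p in prompts]

# fan prompts out across two actors and gather the results
workers = [VLLMWorker.remote("my-model") for _ in range(2)]
futures = [w.generate.remote(["prompt A", "prompt B"]) for w in workers]
print(ray.get(futures))
```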
…ity (EleutherAI#1487)
* setting trust_remote_code
* dataset list no notebooks
* respect trust remote code
* Address changes, move cli options and change datasets
* fix task for tests
* headqa
* remove kobest
* pin datasets and address comments
* clean up space
* add french-bench
* rename arc easy
* linting
* update datasets for no remote code exec
* fix string delimiter
* add info to readme
* trim trailing whitespace
* add detailed groups
* add info to readme
* remove orangesum title from fbench main
* Force PPL tasks to be 0-shot

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Fix padding * Fix elif in model loading * format
* Add new tasks of GPQA
* Add README
* Remove unused functions
* Remove unused functions
* Linters
* Add flexible match
* update
* Remove duplicate function
* Linter
* update
* Update lm_eval/filters/extraction.py (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* register multi_choice_regex
* Update
* run precommit

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
* Start adding eq-bench * Start adding to yaml and utils * Get metric working * Add README * Handle cases where answer is not parseable * Deal with unparseable answers and add percent_parseable metric * Update README
* init wmdp yaml file * Add WMDP Multiple-choice * fix linter issues * Delete lm_eval/tasks/wmdp/_wmdp.yaml --------- Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
…used by cot which hardcodes fewshot prompt (EleutherAI#1502)
…eutherAI#1533)
* Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided)
* Fix improper import of LM and usage of evaluator in one of scripts
* update type hints in instance and task api
* raising errors in task.py instead of asserts
* Fix warnings from ruff
* raising errors in __main__.py instead of asserts
* raising errors in tasks/__init__.py instead of asserts
* raising errors in evaluator.py instead of asserts
* evaluator: update type hints and remove unused variables in code
* Update lm_eval/__main__.py (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* Update lm_eval/__main__.py (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* Update lm_eval/api/task.py (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* Update lm_eval/api/task.py (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* Update lm_eval/api/task.py (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* Update lm_eval/evaluator.py (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* pre-commit induced fixes

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
…g document and update wandb_args description (EleutherAI#1536) * Update openai completions and docs/CONTRIBUTING.md * Update wandb args description * Update docs/interface.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Add compatibility for vLLM's new Logprob object * Fix * Update lm_eval/models/vllm_causallms.py * fix format? * trailing whitespace --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
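The Logprob compatibility commit above deals with vLLM versions where per-token logprobs are returned as objects rather than plain floats. A hedged sketch of the usual defensive pattern (not the harness's exact code):

```python
def extract_logprob(value):
    """Return a float whether `value` is a raw float (older vLLM)
    or an object exposing a `.logprob` attribute (newer vLLM)."""
    return getattr(value, "logprob", value)
```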
…leutherAI#1551) * update gen_kwargs in code2-text-go.yaml * update gen_kwargs in rest code2-text
* Support jinja templating for "description" * Update task_guide.md * Update lm_eval/api/task.py * fix format? * whitespace errors * fix whitespace * fix bad variable reference --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
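The jinja templating commit above lets a task's `description` field be rendered as a template before it is prepended to prompts. A minimal illustration with `jinja2` (the template text and context variable here are made up for the example):

```python
from jinja2 import Template

# hypothetical description string containing a Jinja placeholder
description = "The following are questions about {{ subject }}.\n\n"
print(Template(description).render(subject="college chemistry"))
```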
* add Arabic EXAMS benchmark * fixed the linter issue, and add more information on the readme * Update README.md --------- Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
* add agieval * fix typo * add cloze / math exactmatch agieval tasks, rename * update exact-match agieval tasks, allow for multiple-correct answers * add more detail to readme * don't parse_math_answer twice --------- Co-authored-by: Alex Bäuerle <alex@a13x.io>
…ng the checkpoint.
swde and fda
* Update IFEval dataset to official one

  This PR updates the IFEval dataset to the one hosted under the Google org: https://huggingface.co/datasets/google/IFEval. Note the main change is an updated prompt from this commit in the GitHub repo: google-research/google-research@26d8ccd
* Update ifeval.yaml

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* multiple chat template support
* help doc update
* add transformers link to docstring
* model args update
* comment update
* statement simplification
* simplified chat_template property
* docs update
* removed template arg from HFLM class
* interface doc update
* model guide update
* interface doc update
* reuse apply_chat_template variable
* model guide refactor
* interface doc update
* removed old definition
* last nits
* last nits
* last nits
* better wording
* last nits
* Remove unnecessary Optional
* Apply suggestions from code review (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* return variable rename

---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
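The chat-template support above builds on the tokenizer's own chat template in `transformers`; the underlying call looks roughly like this (the model name and messages are only examples):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # example model
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
]
# render the conversation into the model's expected prompt format
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```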
* fix: arguments data * fix based on comment * Update zeno_visualize.py updated all output types --------- Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
* mela * Update mela_en.yaml * Create _mela.yaml --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
* fix the regex string in yaml file * Update samplers.py --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
ACLUE BibTeX typo: reported to the ACL Anthology and fixed here, as the title in the PDF is correct.
* Created DUP eval code for gsm8k
* asdiv
* Fixed fewshot=8 issue
* added results to .gitignore
* reverted unnecessary changes and moved results + gsm8k_dup out of repo to prepare for pull req
* fixed whitespace and unintentional hardcoded version change information
* created mbpp task
* Reverted changes re. mbpp to save for a future pull req
* reverted metrics.py to previous commit
* updated asdiv readme to include information about new asdiv_cot_llama task
* Apply suggestions from code review

---------
Co-authored-by: Alexander Detkov <alexander.d.detkov@gmail.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* chat template hotfix * pre-commit
…#2258) * Update evaluator.py * update error msg
* max_length - 1 (generation always >= 1) * vllm: fix rolling prefix_token * nit: add comment * fixup! max_length should be handled for loglikelihoods
* max_length - 1 (generation always >= 1) * vllm: fix rolling prefix_token * nit: add comment * fixup! max_length should be handled for loglikelihoods * Revert "fixup! max_length should be handled for loglikelihoods" This reverts commit 432d1a3.
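The `max_length - 1` commits above reserve one position of the context window for the first generated token when truncating prompts. The idea, as a hedged sketch with illustrative names:

```python
def truncate_prompt(prompt_tokens: list[int], max_length: int) -> list[int]:
    # generation always produces at least one token, so at most
    # max_length - 1 prompt tokens can fit in the context window
    return prompt_tokens[-(max_length - 1):]
```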
* default chat template method fix * move chat_template to TemplateLM * remove hotfix * handle openai `chat_template` * Update lm_eval/api/model.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * add 'max_tokens' to gen_kwargs * pre-commit --------- Co-authored-by: KonradSzafer <szafer.konrad@gmail.com> Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
…leutherAI#2232)
* arabic leaderboard yaml file is added
* arabic toxigen is implemented
* Dataset library is imported
* arabic sciq is added
* util file of arabic toxigen is updated
* arabic race is added
* arabic piqa is implemented
* arabic open qa is added
* arabic copa is implemented
* arabic boolq is added
* arabic arc easy is added
* arabic arc challenge is added
* arabic exams benchmark is implemented
* arabic hellaswag is added
* arabic leaderboard yaml file metrics are updated
* arabic mmlu benchmarks are added
* arabic mmlu group yaml file is updated
* alghafa benchmarks are added
* acva benchmarks are added
* acva utils.py is updated
* light version of arabic leaderboard benchmarks are added
* bugs fixed
* bug fixed
* bug fixed
* bug fixed
* bug fixed
* bug fixed
* library import bug is fixed
* doc to target updated
* bash file is deleted
* results folder is deleted
* leaderboard groups are added
* full arabic leaderboard groups are added, plus some bug fixes to the light version
* Create README.md (README.md for arabic_leaderboard_complete)
* Create README.md (README.md for arabic_leaderboard_light)
* Delete lm_eval/tasks/arabic_leaderboard directory
* Update README.md
* Update README.md (adding the Arabic leaderboards to the library)
* Update README.md (10% of the training set)
* Update README.md (10% of the training set)
* revert .gitignore to prev version
* Update lm_eval/tasks/README.md (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* updated main README.md
* Update lm_eval/tasks/README.md
* specify machine translated benchmarks (complete)
* specify machine translated benchmarks (light version)
* add alghafa to the related task names (complete and light)
* add 'acva' to the related task names (complete and light)
* add 'arabic_leaderboard' to all the groups (complete and light)
* all dataset - not a random sample
* added more accurate details to the readme file
* added mt_mmlu from okapi
* Update lm_eval/tasks/README.md (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* Update lm_eval/tasks/README.md (Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>)
* updated mt_mmlu readme
* renaming 'alghafa' full and light
* renaming 'arabic_mmlu' light and full
* renaming 'acva' full and light
* update readme and standardize dir/file names
* running pre-commit

---------
Co-authored-by: shahrzads <sayehban@ualberta.ca>
Co-authored-by: shahrzads <56282669+shahrzads@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* add WIP hf vlm class
* add doc_to_image
* add mmmu tasks
* fix merge conflicts
* add lintang's changes to hf_vlms.py
* fix doc_to_image
* added yaml_path for config-loading
* revert
* add line to process str type v
* update
* modeling cleanup
* add aggregation for mmmu
* rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)
* implemented doc_to_image
* update doc_to_image to accept list of features
* update functions
* readd image processed
* update args process
* bugfix for repeated images fed to model
* push WIP loglikelihood code
* commit most recent code (generative ; qwen2-vl testing)
* preliminary image_token_id handling
* small mmmu update: some qs have >4 mcqa options
* push updated modeling code
* use processor.apply_chat_template
* add mathvista draft
* nit
* nit
* ensure no footguns in text<>multimodal LM<>task incompatibility
* add notification to readme regarding launch of prototype!
* fix compatibility check
* reorganize mmmu configs
* chat_template=None
* add interleave chat_template
* add condition
* add max_images; interleave=true
* nit
* testmini_mcq
* nit
* pass image string; convert img
* add vllm
* add init
* vlm add multi attr
* fixup
* pass max images to vllm model init
* nit
* encoding to device
* fix HFMultimodalLM.chat_template ?
* add mmmu readme
* remove erroneous prints
* use HFMultimodalLM.chat_template ; restore tasks/__init__.py
* add docstring for replace_placeholders in utils
* fix `replace_placeholders`; set image_string=None
* fix typo
* cleanup + fix merge conflicts
* update MMMU readme
* del mathvista
* add some sample scores
* Update README.md
* add log msg for image_string value

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Baber Abbasi <baber@eleuther.ai>
Co-authored-by: Baber <baber@hey.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
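The `replace_placeholders` utility mentioned above swaps a generic image placeholder in the prompt text for the model's own image token, capped at `max_images`. A hypothetical sketch of that behaviour (names and logic are illustrative, not the actual implementation):

```python
def replace_placeholders(text: str, placeholder: str, image_token: str, max_images: int) -> str:
    """Replace up to `max_images` occurrences of `placeholder` with `image_token`;
    placeholders beyond the limit are dropped."""
    parts = text.split(placeholder)
    out = parts[0]
    for i, part in enumerate(parts[1:]):
        out += (image_token if i < max_images else "") + part
    return out

print(replace_placeholders("<image> A cat. <image> A dog.", "<image>", "<|image_pad|>", 1))
```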
* Update README.md

  I encountered Git buffer size limits when trying to download the full commit history of the repository, such as:

  ```
  error: RPC failed; curl 18 transfer closed with outstanding read data remaining
  error: 5815 bytes of body are still expected
  fetch-pack: unexpected disconnect while reading sideband packet
  fatal: early EOF
  ```

  Downloading only the latest version of the repository makes installation faster and avoids these errors.
* Fix linting issue
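The README change above is about downloading only the latest version of the repository rather than its full history; one common way to do that is a shallow clone. A sketch of that workaround (this exact command is an assumption, shown via Python's `subprocess` for consistency with the other snippets; the underlying command is just `git clone --depth 1 ...`):

```python
import subprocess

# fetch only the most recent commit to avoid transferring the full history
subprocess.run(
    ["git", "clone", "--depth", "1",
     "https://github.com/EleutherAI/lm-evaluation-harness.git"],
    check=True,
)
```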
* feat(neuron): align with latest optimum-neuron
* feat(neuron): support pre-exported neuron models
* fix(neuron): correctly use max_length
* fix(neuron): adapt loglikelihood

  The evaluation of log likelihood was not working for neuron models using continuous batching, such as all cached neuron Llama models.
* refactor(neuron): remove dead code
No description provided.