main to based #3

Open: wants to merge 282 commits into base: based-fork-2
282 commits
e5e35fc
Vllm update DP+TP (#1508)
baberabb Mar 3, 2024
9516792
Setting trust_remote_code to True for HuggingFace datasets compatibil…
veekaybee Mar 3, 2024
4eba9cf
Cleaning up unused unit tests (#1516)
veekaybee Mar 4, 2024
48476c4
French Bench (#1500)
ManuelFay Mar 4, 2024
4582391
Hotfix: fix TypeError in `--trust_remote_code` (#1517)
haileyschoelkopf Mar 4, 2024
292e581
Fix minor edge cases (#951 #1503) (#1520)
haileyschoelkopf Mar 4, 2024
8a875e9
Openllm benchmark (#1526)
baberabb Mar 5, 2024
01108ac
Add a new task GPQA (the part CoT and generative) (#1482)
uanu2002 Mar 5, 2024
c5acce0
Add EQ-Bench as per #1459 (#1511)
pbevan1 Mar 6, 2024
29b2b01
Add WMDP Multiple-choice (#1534)
justinphan3110 Mar 6, 2024
faee1ad
Adding new task : KorMedMCQA (#1530)
sean0042 Mar 6, 2024
525b8f5
Update docs on LM.loglikelihood_rolling abstract method (#1532)
haileyschoelkopf Mar 6, 2024
0270505
update printed num-fewshot ; prevent fewshots from erroneously being …
haileyschoelkopf Mar 6, 2024
4ee1b38
Cleanup and fixes (Task, Instance, and a little bit of *evaluate) (#1…
LSinev Mar 6, 2024
9e6e240
Update installation commands in openai_completions.py and contributin…
naem1023 Mar 6, 2024
8051d95
Add compatibility for vLLM's new Logprob object (#1549)
Yard1 Mar 9, 2024
f518228
Fix incorrect `max_gen_toks` generation kwarg default in code2_text. …
cosmo3769 Mar 9, 2024
3bdf25e
Support jinja templating for task descriptions (#1553)
HishamAlyahya Mar 10, 2024
a79a7c3
Update generate_until_template_yaml (#1546)
haileyschoelkopf Mar 11, 2024
282b9e7
Update ifeval.yaml (#1506)
haileyschoelkopf Mar 11, 2024
4ab0759
add Arabic EXAMS benchmark (#1498)
khalil-Hennara Mar 11, 2024
a3e56af
AGIEval (#1359)
haileyschoelkopf Mar 11, 2024
026af93
added open_lm to lm-evaluation-harness
kushalarora Mar 12, 2024
d00e914
added readme.md and a bug fix.
kushalarora Mar 13, 2024
9ac9f4d
Now, using the loaded config for both generating the config and loadi…
kushalarora Mar 13, 2024
264fc71
bypassing load_model call from openlm.main. Doing similar stuff to ht…
kushalarora Mar 13, 2024
341d42b
reverting back to using load_model.
kushalarora Mar 13, 2024
09f20f7
Update README.md
kushal-tri Mar 13, 2024
3255caf
swde and fda
sedrick-keh-tri Mar 15, 2024
c400b35
Merge pull request #1 from TRI-ML/based3
sedrick-keh-tri Mar 18, 2024
dd58917
added mamba_open_lm model and some cleanup for model open_lm
kushalarora Mar 18, 2024
d6df578
Merge branch 'main' of github.com:TRI-ML/lm-evaluation-harness into main
kushalarora Mar 18, 2024
eb916e8
mamba does not support passing stopping criteria
kushalarora Mar 18, 2024
1f0642e
always load strict with mamba to avoid silent failures.
kushalarora Mar 21, 2024
dbe1247
revert eos
sedrick-keh-tri Mar 24, 2024
3ac2ea1
qasper use full text
sedrick-keh-tri Mar 25, 2024
948e284
Support evaluating llms of prismatic using lm-evaluation-harness
kushalarora Sep 25, 2024
15281cd
cli_evaluate calls simple_evaluate with the same verbosity. (#1563)
Wongboo Mar 12, 2024
fb3f6f2
add manual tqdm disabling management (#1569)
artemorloff Mar 13, 2024
29ad0cf
Fix README section on vllm integration (#1579)
eitanturok Mar 15, 2024
1e87196
Fix Jinja template for Advanced AI Risk (#1587)
RylanSchaeffer Mar 15, 2024
c07809c
Proposed approach for testing CLI arg parsing (#1566)
veekaybee Mar 17, 2024
271fefe
Patch for Seq2Seq Model predictions (#1584)
lintangsutawika Mar 17, 2024
6005fba
Add start date in results.json (#1592)
djstrong Mar 17, 2024
39a209c
Cleanup for v0.4.2 release (#1573)
haileyschoelkopf Mar 18, 2024
728b1e2
Fix eval_logger import for mmlu/_generate_configs.py (#1593)
noufmitla Mar 18, 2024
f5c6716
use BOS token in loglikelihood (#1588)
djstrong Mar 18, 2024
23a9c6c
Revert "Patch for Seq2Seq Model predictions (#1584)" (#1601)
haileyschoelkopf Mar 19, 2024
b03101d
fix gen_kwargs arg reading (#1607)
artemorloff Mar 19, 2024
90383ed
fix until arg processing (#1608)
artemorloff Mar 19, 2024
8506c86
Fixes to Loglikelihood prefix token / VLLM (#1611)
haileyschoelkopf Mar 20, 2024
f70915f
Add ACLUE task (#1614)
haonan-li Mar 21, 2024
77645c4
OpenAI Completions -- fix passing of unexpected 'until' arg (#1612)
haileyschoelkopf Mar 21, 2024
70294ce
add logging of model args (#1619)
baberabb Mar 22, 2024
0855de2
Add vLLM FAQs to README (#1625) (#1633)
haileyschoelkopf Mar 25, 2024
bf716df
peft Version Assertion (#1635)
LameloBally Mar 25, 2024
26f51e6
Seq2seq fix (#1604)
lintangsutawika Mar 25, 2024
4c8f7ef
Integration of NeMo models into LM Evaluation Harness library (#1598)
sergiopperez Mar 26, 2024
908b2e9
Fix conditional import for Nemo LM class (#1641)
haileyschoelkopf Mar 27, 2024
915c3bb
Fix SuperGlue's ReCoRD task following regression in v0.4 refactoring …
orsharir Mar 28, 2024
4fcdfa3
Add Latxa paper evaluation tasks for Basque (#1654)
juletx Apr 1, 2024
230f401
Fix CLI --batch_size arg for openai-completions/local-completions (#1…
mgoin Apr 1, 2024
43bd629
Patch QQP prompt (#1661)
haileyschoelkopf Apr 4, 2024
e944723
TMMLU+ implementation (#1394)
ZoneTwelve Apr 5, 2024
cf609da
Anthropic Chat API (#1594)
tryumanshow Apr 5, 2024
74a0fa1
correction bug EleutherAI#1664 (#1670)
nicho2 Apr 7, 2024
180a8d0
Update README.md (#1680)
haileyschoelkopf Apr 8, 2024
d6330fa
Add delta weights model loading (#1712)
KonradSzafer Apr 16, 2024
101651f
Add `neuralmagic` models for `sparseml` and `deepsparse` (#1674)
mgoin Apr 16, 2024
4e0af6d
fix error when appending eot_token_id for generate_until tasks (#1699)
sergiopperez Apr 18, 2024
2307356
Adding retries and rate limit to toxicity tasks (#1620)
sator-labs Apr 18, 2024
b0da760
reference `--tasks list` in README (#1726)
nairbv Apr 25, 2024
d3ac5f7
Add XNLIeu: a dataset for cross-lingual NLI in Basque (#1694)
juletx Apr 25, 2024
5d6ac02
Fix Parameter Propagation for Tasks that have `include` (#1749)
lintangsutawika Apr 25, 2024
bc8188c
Support individual scrolls datasets (#1740)
giorgossideris Apr 26, 2024
4c4cc06
Add filter registry decorator (#1750)
lozhn Apr 26, 2024
e2119b7
remove duplicated `num_fewshot: 0` (#1769)
chujiezheng May 1, 2024
5e2a90c
Pile 10k new task (#1758)
mukobi May 1, 2024
2645858
Fix m_arc choices (#1760)
jordane95 May 1, 2024
b6a3d2d
upload new tasks (#1728)
simran-arora May 1, 2024
be51527
vllm lora support (#1756)
bcicc May 2, 2024
b145b86
Add option to set OpenVINO config (#1730)
helena-intel May 2, 2024
4b69f2d
evaluation tracker implementation (#1766)
KonradSzafer May 3, 2024
6d3d0fd
eval tracker args fix (#1777)
KonradSzafer May 3, 2024
42aa051
limit fix (#1785)
KonradSzafer May 5, 2024
a2d89d6
remove echo parameter in OpenAI completions API (#1779)
djstrong May 5, 2024
d91f70a
Fix README: change`----hf_hub_log_args` to `--hf_hub_log_args` (#1776)
MuhammadBinUsman03 May 5, 2024
a063756
Fix bug in setting until kwarg in openai completions (#1784)
ciaranby May 5, 2024
75b6eb6
Provide ability for custom sampler for ConfigurableTask (#1616)
LSinev May 6, 2024
5f3a646
Update `--tasks list` option in interface documentation (#1792)
sepiatone May 6, 2024
e41a239
Fix Caching Tests ; Remove `pretrained=gpt2` default (#1775)
haileyschoelkopf May 7, 2024
d338ef7
link to the example output on the hub (#1798)
KonradSzafer May 7, 2024
fbef196
Re-add Hendrycks MATH (no sympy checking, no Minerva hardcoded prompt…
haileyschoelkopf May 7, 2024
b22e749
Logging Updates (Alphabetize table printouts, fix eval tracker bug) (…
haileyschoelkopf May 7, 2024
f350605
Initial integration of the Unitxt to LM eval harness (#1615)
yoavkatz May 7, 2024
93eeaed
add task for mmlu evaluation in arc multiple choice format (#1745)
jonabur May 8, 2024
4b737c7
Update flag `--hf_hub_log_args` in interface documentation (#1806)
sepiatone May 8, 2024
f2e4811
Copal task (#1803)
Erland366 May 9, 2024
0012446
Adding tinyBenchmarks datasets (#1545)
LucWeber May 13, 2024
91f0d64
interface doc update (#1807)
KonradSzafer May 13, 2024
ed83a98
Fix links in README guiding to another branch (#1838)
LSinev May 14, 2024
4ac9131
Fix: support PEFT/LoRA with added tokens (#1828)
mapmeld May 19, 2024
8f1cee8
fixed incorrect check for task type (replace `~` with `not`) (#1865)
zafstojano May 21, 2024
9b04f7f
fixed docs typos (#1863)
zafstojano May 21, 2024
6b03438
Update polemo2_out.yaml (#1871)
zhabuye May 22, 2024
4dbccfc
Unpin vllm in dependencies (#1874)
edgan8 May 23, 2024
45568d3
Fix outdated links to the latest links in `docs` (#1876)
oneonlee May 24, 2024
96b8b75
[HFLM]Use Accelerate's API to reduce hard-coded CUDA code (#1880)
statelesshz May 24, 2024
9ad8cb6
Fix `batch_size=auto` for HF Seq2Seq models (#1765) (#1790)
haileyschoelkopf May 24, 2024
2caac90
Fix Brier Score (#1847)
lintangsutawika May 24, 2024
dd52f0d
Fix for bootstrap_iters = 0 case (#1715) (#1789)
haileyschoelkopf May 24, 2024
f7eaa39
add mmlu tasks from pile-t5 (#1710)
lintangsutawika May 24, 2024
e528a92
Bigbench fix (#1686)
lintangsutawika May 24, 2024
fb99844
Rename `lm_eval.logging -> lm_eval.loggers` (#1858)
haileyschoelkopf May 26, 2024
f4ee0de
Updated vllm imports in vllm_causallms.py (#1890)
mgoin May 28, 2024
081d43c
[HFLM]Add support for Ascend NPU (#1886)
statelesshz May 30, 2024
144df21
`higher_is_better` tickers in output table (#1893)
zafstojano May 30, 2024
2309812
Add dataset card when pushing to HF hub (#1898)
KonradSzafer May 31, 2024
63a301c
Making hardcoded few shots compatible with the chat template mechanis…
clefourrier May 31, 2024
380773f
Try to make existing tests run little bit faster (#1905)
LSinev May 31, 2024
022afbf
Fix fewshot seed only set when overriding num_fewshot (#1914)
LSinev Jun 3, 2024
e4d2c68
Complete task list from pr 1727 (#1901)
anthony-dipofi Jun 3, 2024
96563f5
Add chat template (#1873)
KonradSzafer Jun 3, 2024
756b278
Multiple Choice Questions and Large Languages Models: A Case Study wi…
maximegmd Jun 5, 2024
1e65308
Modify pre-commit hook to check merge conflicts accidentally committe…
LSinev Jun 5, 2024
6e0f5e4
[add] fld logical formula task (#1931)
MorishT Jun 6, 2024
a33f941
Add new Lambada translations (#1897)
zafstojano Jun 6, 2024
e04ed0c
Implement NoticIA (#1912)
ikergarcia1996 Jun 6, 2024
7e7125b
Add The Arabic version of the PICA benchmark (#1917)
khalil-Hennara Jun 7, 2024
7c87afb
Update siqa.yaml (#1909)
haileyschoelkopf Jun 7, 2024
f1b3bdd
Update basque-glue (#1913)
zhabuye Jun 7, 2024
515cc47
Test output table layout consistency (#1916)
zafstojano Jun 7, 2024
fa4268d
Update __main__.py (#1939)
sadra-barikbin Jun 9, 2024
a12f8be
Add the Arabic version with refactor to Arabic pica to be in alghafa …
khalil-Hennara Jun 10, 2024
868667e
Results filenames handling fix (#1926)
KonradSzafer Jun 11, 2024
403e533
Remove AMMLU Due to Translation (#1948)
haileyschoelkopf Jun 11, 2024
89bff66
add include_defaults kwarg to taskmanager, add tests for include_path…
haileyschoelkopf Jun 11, 2024
cf8ef90
add hacky add_bos_token forcing for Gemma to VLLM too (#1857)
haileyschoelkopf Jun 11, 2024
5be7d30
Update interface.md (#1955)
sadra-barikbin Jun 12, 2024
a8f6d0c
Fix self.max_tokens in anthropic_llms.py (#1848)
lozhn Jun 12, 2024
051eadd
`samples` is newline delimited (#1930)
baberabb Jun 13, 2024
70644eb
Fix `--gen_kwargs` and VLLM (`temperature` not respected) (#1800)
haileyschoelkopf Jun 13, 2024
e81e133
make write_out.py explicitly error if no splits match (#1796)
haileyschoelkopf Jun 13, 2024
efe38fb
fix: add directory filter to os.walk to ignore 'ipynb_checkpoints' (#…
johnwee1 Jun 13, 2024
4b8151a
add trust_remote_code for piqa (#1983)
changwangss Jun 18, 2024
b2c5d8a
Fix self assignment in neuron_optimum.py (#1990)
LSinev Jun 18, 2024
f6a4ecf
[New Task] Add Paloma benchmark (#1928)
zafstojano Jun 19, 2024
7ea9eb8
Fix Paloma Template yaml (#1993)
haileyschoelkopf Jun 19, 2024
7a4f6bc
Log `fewshot_as_multiturn` in results files (#1995)
haileyschoelkopf Jun 19, 2024
ead43ee
Added ArabicMMLU (#1987)
Yazeed7 Jun 19, 2024
de7a9f8
Fix Datasets `--trust_remote_code` (#1998)
haileyschoelkopf Jun 19, 2024
5602d64
Add BertaQA dataset tasks (#1964)
juletx Jun 20, 2024
5fe9d7a
add tokenizer logs info (#1731)
artemorloff Jun 24, 2024
9d97084
Hotfix breaking import (#2015)
StellaAthena Jun 24, 2024
db4d177
add arc_challenge_mt (#1900)
jonabur Jun 25, 2024
d3da290
Remove `LM` dependency from `build_all_requests` (#2011)
baberabb Jun 25, 2024
40807fe
Added CommonsenseQA task (#1721)
murphybrendan Jun 25, 2024
f093045
Factor out LM-specific tests (#1859)
haileyschoelkopf Jun 25, 2024
0a0b75c
Update interface.md (#1982)
johnwee1 Jun 25, 2024
4ff90ad
Fix `trust_remote_code`-related test failures (#2024)
haileyschoelkopf Jun 26, 2024
40ee6b6
Fixes scrolls task bug with few_shot examples (#2003)
xksteven Jun 28, 2024
da7dd7b
fix cache (#2037)
baberabb Jun 28, 2024
14595b8
Add chat template to `vllm` (#2034)
baberabb Jun 28, 2024
8a9237e
fail gracefully upon tokenizer logging failure (#2038)
haileyschoelkopf Jun 29, 2024
5585835
ship with exact_match function already used ; don't call evaluate.loa…
haileyschoelkopf Jul 1, 2024
a90a524
update to v0.4.3 (#2046)
haileyschoelkopf Jul 1, 2024
79f20ea
fix wandb logger module import in example (#2041)
ToluClassics Jul 1, 2024
2d05127
Fix strip whitespace filter (#2048)
NathanHB Jul 1, 2024
78a957d
update gemma-2 default BOS behavior (#2049)
haileyschoelkopf Jul 2, 2024
e7ab930
Update hellaswag.yaml (#2029)
haileyschoelkopf Jul 3, 2024
42c22cb
Adds Open LLM Leaderboard Taks (#2047)
NathanHB Jul 3, 2024
9aab09b
#1442 inverse scaling tasks implementation (#1589)
h-albert-lee Jul 3, 2024
a697ef3
Fix TypeError in samplers.py by converting int to str (#2074)
uni2237 Jul 8, 2024
97c6e90
Group agg rework (#1741)
lintangsutawika Jul 8, 2024
36b3a1d
we run with bootstrap_iters=0 for printing tests (#2080)
haileyschoelkopf Jul 8, 2024
7f61555
Easier unitxt tasks loading and removal of unitxt library dependancy …
elronbandel Jul 8, 2024
7edb22b
Allow gating EvaluationTracker HF Hub results; customizability (#2051)
NathanHB Jul 8, 2024
2c95a72
Minor doc fix: leaderboard README.md missing mmlu-pro group and task …
pankajarm Jul 8, 2024
8da1bdc
fix: utf-8 encoding for logged sample files was missing (#2082)
haileyschoelkopf Jul 9, 2024
a5a9fda
Update utils.py (#2085)
lintangsutawika Jul 10, 2024
a309b60
batch_size may be str if 'auto' is specified (#2084)
meg-huggingface Jul 10, 2024
e062d22
Prettify lm_eval --tasks list (#1929)
anthony-dipofi Jul 11, 2024
9054801
make RougeScorer only initialized once (#2090)
haileyschoelkopf Jul 12, 2024
6b7d7b5
Update default.yaml (#2092)
waneon Jul 12, 2024
d6cd3f2
Add new dataset MMLU-SR tasks (#2032)
SkySuperCat Jul 12, 2024
6e185a5
Irokobench: Benchmark Dataset for African languages (#2042)
JessicaOjo Jul 12, 2024
f0e69dc
docs: remove trailing sentence from contribution doc (#2098)
nathan-weinberg Jul 13, 2024
eb7710a
Added MedConceptsQA Benchmark (#2010)
Ofir408 Jul 14, 2024
40321ce
make recurrent_gemma model types included in the force-BOS case (#2105)
haileyschoelkopf Jul 15, 2024
8a8aa5f
formatting (#2104)
lintangsutawika Jul 15, 2024
97b4e6a
docs: align local test command to match CI (#2100)
nathan-weinberg Jul 15, 2024
c31f114
Fixed colon in Belebele _default_template_yaml (#2111)
jab13x Jul 17, 2024
43ddf79
[python] fix haerae tasks (#2112)
jungwhank Jul 18, 2024
b94018d
fix: broken discord link in CONTRIBUTING.md (#2114)
nathan-weinberg Jul 18, 2024
1e1f4f3
docs: update truthfulqa tasks (#2119)
CandiedCode Jul 20, 2024
e1e17b0
fix caching module (hotfix for now) (#2124)
haileyschoelkopf Jul 21, 2024
9dceef5
Refactor API models (#2008)
baberabb Jul 22, 2024
10b1c5a
bugfix and docs for API (#2139)
baberabb Jul 29, 2024
d73033c
[Bugfix] add temperature=0 to logprobs and seed args to API models (#…
baberabb Aug 1, 2024
10e88dd
refactor: limit usage of `scipy` and `skilearn` dependencies (#2097)
nathan-weinberg Aug 1, 2024
9fef017
Update lm-eval-overview.ipynb (#2118)
haileyschoelkopf Aug 1, 2024
5390dae
fix typo. (#2169)
kargaranamir Aug 4, 2024
8e83360
Update README.md (#2125)
zhabuye Aug 4, 2024
0788887
Dp and mp support (#2056)
NathanHB Aug 5, 2024
fe4cfc4
[hotfix] API: messages were created twice (#2174)
baberabb Aug 5, 2024
ca0307d
add okapi machine translated notice. (#2168)
kargaranamir Aug 5, 2024
a19ad06
remove incorrectly inherited group names (#2181)
haileyschoelkopf Aug 5, 2024
945a718
Mmlu Pro (#1961)
ysjprojects Aug 5, 2024
d23527d
added gsm_plus (#2103)
ysjprojects Aug 5, 2024
4142d04
fix revision type (#2184)
haileyschoelkopf Aug 5, 2024
6071ff2
Update README.md (#2186)
haileyschoelkopf Aug 5, 2024
8945397
gsm_plus minor fix (#2191)
ysjprojects Aug 7, 2024
e20c5b5
keep new line for task description (#2116)
jungwhank Aug 9, 2024
4c81184
Update README.md (#2206)
ysjprojects Aug 10, 2024
fd2df5d
Update citation in README.md (#2083)
antonpolishko Aug 15, 2024
124c1c5
New task: Lingoly (#2198)
am-bean Aug 15, 2024
1c29d12
Created a new task for gsm8k which corresponds to the Llama cot setti…
Cameron7195 Aug 16, 2024
35c2fb9
Lingoly README update (#2228)
am-bean Aug 19, 2024
c7a0bc3
Update yaml to adapt to belebele dataset changes (#2216)
Uminosachi Aug 19, 2024
33dc922
Add TMLU Benchmark Dataset (#2093)
adamlin120 Aug 19, 2024
944ca7e
Update IFEval dataset to official one (#2218)
lewtun Aug 20, 2024
e9d6010
fix the leaderboard doc to reflect the tasks (#2219)
NathanHB Aug 20, 2024
8a719ba
Add multiple chat template (#2129)
KonradSzafer Aug 20, 2024
04b5026
Update CODEOWNERS (#2229)
haileyschoelkopf Aug 20, 2024
ba58a70
Fix Zeno Visualizer (#2227)
namtranase Aug 20, 2024
6d4c0a8
mela (#1970)
Geralt-Targaryen Aug 20, 2024
e4aaaf9
fix the regex string in mmlu_pro template (#2238)
lxning Aug 22, 2024
7bfdb51
Fix logging when resizing embedding layer in peft mode (#2239)
WPoelman Aug 22, 2024
d35ccf1
computer_science --> "computer science" (#2241)
baberabb Aug 22, 2024
e47557e
Fix typos in multiple places (#2244)
LSinev Aug 23, 2024
d452d58
fix group args of mmlu and mmlu_pro (#2245)
eyuansu62 Aug 23, 2024
f366f48
Created new task for testing Llama on Asdiv (#2236)
Cameron7195 Aug 23, 2024
2c15e42
chat template hotfix (#2250)
baberabb Aug 25, 2024
c2ec6ba
[Draft] More descriptive `simple_evaluate()` LM TypeError (#2258)
haileyschoelkopf Aug 28, 2024
3462b6d
update nltk version to require 3.9.1 (#2259)
haileyschoelkopf Aug 28, 2024
a402afa
Fix `loglikelihood_rolling` caching ( #1821 ) (#2187)
haileyschoelkopf Aug 28, 2024
3c4fd26
API: fix maxlen; vllm: prefix_token_id bug (#2262)
baberabb Aug 30, 2024
0e7daed
hotfix #2262 (#2264)
baberabb Aug 30, 2024
a13c93d
Chat Template fix (cont. #2235) (#2269)
baberabb Sep 4, 2024
c360ad4
Bump version to v0.4.4 ; Fixes to TMMLUplus (#2280)
haileyschoelkopf Sep 5, 2024
87a3d54
Add Open Arabic LLM Leaderboard Benchmarks (Full and Light Version) (…
Malikeh97 Sep 10, 2024
57659ed
Multimodal prototyping (#2243)
lintangsutawika Sep 13, 2024
f50e465
Update README.md (#2297)
SYusupov Sep 17, 2024
fd69de5
repr bug (#2315)
baberabb Sep 17, 2024
56cb071
Update neuron backend (#2314)
dacorvo Sep 18, 2024
82c81e3
Fixed dummy model (#2339)
Am1n3e Sep 24, 2024
5085bdb
add a note for missing dependencies (#2336)
eldarkurtic Sep 24, 2024
d3ea541
fix problem with equal sign in file name and add support for mbm chec…
jmercat Sep 25, 2024
98ac6a1
fix weight loading
jmercat Sep 25, 2024
4401bc4
fix error introduced in prismatic.py
jmercat Sep 25, 2024
6 changes: 3 additions & 3 deletions .github/workflows/new_tasks.yml
@@ -20,13 +20,13 @@ jobs:
with:
fetch-depth: 2 # OR "2" -> To retrieve the preceding commit.

-# Uses the tj-actions/changed-files@v37 action to check for changes.
+# Uses the tj-actions/changed-files action to check for changes.
# Outputs provided here: https://github.com/tj-actions/changed-files#outputs
# The `files_yaml` input optionally takes a yaml string to specify filters,
# and prepends the filter name to the standard output names.
- name: Check task folders
id: changed-tasks
-uses: tj-actions/changed-files@v37.1.2
+uses: tj-actions/changed-files@v44.5.2
with:
# tasks checks the tasks folder and api checks the api folder for changes
files_yaml: |
@@ -56,7 +56,7 @@ jobs:
if: steps.changed-tasks.outputs.tasks_any_modified == 'true' || steps.changed-tasks.outputs.api_any_modified == 'true'
run: |
python -m pip install --upgrade pip
-pip install -e '.[dev]' --extra-index-url https://download.pytorch.org/whl/cpu
+pip install -e '.[dev,ifeval]' --extra-index-url https://download.pytorch.org/whl/cpu
# Install optional git dependencies
# pip install bleurt@https://github.com/google-research/bleurt/archive/b610120347ef22b494b6d69b4316e303f5932516.zip#egg=bleurt
# if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
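The `files_yaml` comment in this workflow says each filter name is prepended to the action's standard output names, which is why the later step tests `steps.changed-tasks.outputs.tasks_any_modified` and `api_any_modified`. The Python sketch below only mimics that naming scheme locally; it is not the tj-actions implementation, and the glob patterns in `FILTERS` are hypothetical placeholders because the real `files_yaml` filters are collapsed in this diff.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical filters: the actual files_yaml patterns are not visible in this diff.
FILTERS: Dict[str, List[str]] = {
    "tasks": ["lm_eval/tasks/*"],
    "api": ["lm_eval/api/*"],
}


def changed_outputs(changed_files: List[str], filters: Dict[str, List[str]]) -> Dict[str, str]:
    """Mimic '<filter>_any_modified' outputs for a list of changed paths.

    Values are the strings 'true'/'false', matching how the workflow's `if:`
    condition compares them.
    """
    outputs: Dict[str, str] = {}
    for name, patterns in filters.items():
        hit = any(fnmatchcase(path, pat) for path in changed_files for pat in patterns)
        outputs[f"{name}_any_modified"] = "true" if hit else "false"
    return outputs


if __name__ == "__main__":
    print(changed_outputs(["lm_eval/tasks/arc/arc_easy.yaml", "README.md"], FILTERS))
```

`fnmatch` lets `*` match across `/`, which may be looser than the action's own glob handling, so treat the result as an approximation of when the task-test job would be triggered.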
31 changes: 28 additions & 3 deletions .github/workflows/unit_tests.yml
@@ -32,7 +32,7 @@ jobs:
env:
SKIP: "no-commit-to-branch,mypy"

-uses: pre-commit/action@v3.0.0
+uses: pre-commit/action@v3.0.1
# # mypy turned off for now
# - name: Lint with mypy
# run: mypy . --ignore-missing-imports --check-untyped-defs --explicit-package-bases --warn-unreachable
@@ -56,12 +56,37 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
-pip install -e '.[dev,anthropic,sentencepiece,optimum]' --extra-index-url https://download.pytorch.org/whl/cpu
+pip install -e '.[dev,sentencepiece,api]' --extra-index-url https://download.pytorch.org/whl/cpu
# Install optional git dependencies
# pip install bleurt@https://github.com/google-research/bleurt/archive/b610120347ef22b494b6d69b4316e303f5932516.zip#egg=bleurt
# if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Test with pytest
-run: python -m pytest --showlocals -s -vv -n=auto
+run: python -m pytest --showlocals -s -vv -n=auto --ignore=tests/models/test_neuralmagic.py --ignore=tests/models/test_openvino.py
+- name: Archive artifacts
+uses: actions/upload-artifact@v3
+with:
+name: output_results
+path: |
+test_logs/*
+testmodels:
+name: External LM Tests
+runs-on: ubuntu-latest
+timeout-minutes: 30
+steps:
+- name: Checkout Code
+uses: actions/checkout@v4
+- name: Set up Python 3.8
+uses: actions/setup-python@v5
+with:
+python-version: 3.8
+cache: pip
+cache-dependency-path: pyproject.toml
+- name: Install dependencies
+run: |
+python -m pip install --upgrade pip
+pip install -e '.[dev,optimum,deepsparse,sparseml,api]' --extra-index-url https://download.pytorch.org/whl/cpu
+- name: Test with pytest
+run: python -m pytest tests/models --showlocals -s -vv
- name: Archive artifacts
uses: actions/upload-artifact@v3
with:
6 changes: 6 additions & 0 deletions .gitignore
@@ -13,6 +13,12 @@ temp
__pycache__
.ipynb_checkpoints
temp
test_logs/
# IPython
profile_default/
ipython_config.py
# don't track (the default location of) the cached requests
lm_eval/caching/.cache
# don't track files created by wandb
wandb
examples/wandb
20 changes: 10 additions & 10 deletions .pre-commit-config.yaml
@@ -2,14 +2,15 @@
exclude: ^tests/testdata/
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
-rev: v4.1.0
+rev: v4.5.0
hooks:
- id: check-added-large-files
- id: check-ast
- id: check-byte-order-marker
- id: check-case-conflict
- id: check-json
- id: check-merge-conflict
+args: [--assume-in-merge]
- id: check-symlinks
- id: check-yaml
args: ["--unsafe"]
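The `--assume-in-merge` argument added to `check-merge-conflict` makes the hook scan files even when no merge is in progress, so stray conflict markers are also caught on ordinary commits. The sketch below is a simplified stand-in for that check, not the hook's source: it flags lines that begin with the marker prefixes git writes into conflicted files.

```python
from typing import List


def find_conflict_markers(text: str) -> List[int]:
    """Return 1-based line numbers that look like leftover merge-conflict markers.

    Simplified sketch: flags lines starting with '<<<<<<< ' or '>>>>>>> ',
    and lines that are '=======' on their own or followed by a space.
    """
    hits: List[int] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(("<<<<<<< ", ">>>>>>> ", "======= ")) or line == "=======":
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    sample = "a\n<<<<<<< HEAD\nb\n=======\nc\n>>>>>>> feature\n"
    print(find_conflict_markers(sample))
```

Matching only at the start of a line keeps ordinary text such as `# ======= note` or a longer `========` underline from being flagged.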
@@ -28,8 +29,7 @@ repos:
- id: mixed-line-ending
args: [--fix=lf]
- repo: https://github.com/astral-sh/ruff-pre-commit
-# Ruff version.
-rev: v0.1.8
+rev: v0.4.8
hooks:
# Run the linter.
- id: ruff
@@ -38,17 +38,17 @@ repos:
# Run the formatter.
- id: ruff-format
- repo: https://github.com/codespell-project/codespell
-rev: v2.1.0
+rev: v2.3.0
hooks:
- id: codespell
exclude: >
(?x)^(
.*\.json|ignore.txt|lm_eval/tasks/.*|.*yaml|.*\.ipynb
)$
args: [--check-filenames, --check-hidden, --ignore-words=ignore.txt]
-- repo: https://github.com/pre-commit/mirrors-mypy
-rev: v1.5.1
-hooks:
-- id: mypy
-additional_dependencies: [".[sentencepiece,multilingual,promptsource,gptq]", "types-PyYAML", "types-requests"]
-exclude: ^tests/.*$
+# - repo: https://github.com/pre-commit/mirrors-mypy
+# rev: v1.5.1
+# hooks:
+# - id: mypy
+# additional_dependencies: [".[sentencepiece,multilingual,promptsource,gptq]", "types-PyYAML", "types-requests"]
+# exclude: ^tests/.*$
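pre-commit treats a hook's `exclude` value as a Python regular expression searched against each file path, and a match skips the file for that hook. The sketch below checks a few illustrative paths (made up for this example) against the codespell pattern from this config; the leading `(?x)` turns on verbose mode, so the line breaks and indentation inside the group are not part of the pattern.

```python
import re

# The codespell `exclude` value from .pre-commit-config.yaml.
# (?x) enables verbose mode, so whitespace and newlines in the pattern are ignored.
CODESPELL_EXCLUDE = r"""(?x)^(
    .*\.json|ignore.txt|lm_eval/tasks/.*|.*yaml|.*\.ipynb
)$"""

EXCLUDE_RE = re.compile(CODESPELL_EXCLUDE)


def skipped_by_codespell(path: str) -> bool:
    """Return True if `path` matches the exclude pattern, i.e. codespell skips it."""
    return EXCLUDE_RE.search(path) is not None


if __name__ == "__main__":
    for p in [
        "lm_eval/tasks/mmlu/default.yaml",
        "lm_eval/evaluator.py",
        "examples/lm-eval-overview.ipynb",
        ".github/workflows/new_tasks.yml",
    ]:
        print(p, skipped_by_codespell(p))
```

Note that `.*yaml` does not match the `.yml` extension, so under this pattern the workflow files are still spell-checked.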
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1 +1 @@
-* @haileyschoelkopf @lintangsutawika
+* @haileyschoelkopf @lintangsutawika @baberabb
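Each CODEOWNERS line pairs a path pattern with owners, and when several lines match a file the last matching line takes precedence. With the single `*` rule in this file, every path resolves to all three listed owners. The resolver below is a simplified sketch: it uses `fnmatch` rather than GitHub's gitignore-style pattern rules, and `parse_codeowners` and `owners_for` are illustrative helpers, not part of any GitHub API.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, List[str]]  # (path pattern, owners)


def parse_codeowners(text: str) -> List[Rule]:
    """Split each non-comment line into (pattern, [owners])."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def owners_for(path: str, rules: List[Rule]) -> List[str]:
    """Owners of `path`: the last matching rule wins (simplified fnmatch matching)."""
    owners: List[str] = []
    for pattern, rule_owners in rules:
        if fnmatchcase(path, pattern):
            owners = rule_owners  # a later match overrides earlier ones
    return owners


if __name__ == "__main__":
    rules = parse_codeowners("* @haileyschoelkopf @lintangsutawika @baberabb")
    print(owners_for("lm_eval/models/vllm_causallms.py", rules))
```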