Apply min_new_tokens=2 to mixtral-8x7b, address #1777 (PR #1884)

Merged: 1 commit, Oct 22, 2024
compliance/nvidia/TEST06/README.md (1 addition, 1 deletion):

```diff
@@ -10,7 +10,7 @@ This repository provides the config files and scripts to run and verify TEST 06

 The purpose of this test is to ensure the consistency of the output of the LLM (Llama2 and Mixtral) model and avoid a potential EOS exploit. This test will make a performance run, with a limit of 100 samples and logging them into `mlperf_log_accuracy.json`. To achieve a passing result in this test, three criteria must be met:
 - In the case the first token is reported independently (not applicable for Offline scenario), it should match for every query with the first token of the model output.
-- For each query, the model output should only end with zero or one EOS token. The only exception for 2 EOS tokens is when the entire output sequences are EOS tokens (i.e. output is [eos_token_id, eos_token_id])
+- For each query, the model output should only end with zero or one EOS token.
 - The number of reported tokens should match with the length of output sequence.

 ## Requisites
```
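For orientation, the first and third criteria reduce to direct comparisons between what the SUT reports and what the model actually produced (the second, EOS criterion is implemented in `run_verification.py` below). The sketch that follows is illustrative only, not the repository's verification code; the record layout (`first_token`, `output_tokens`, `reported_len`) is a hypothetical simplification of what the script extracts from the MLPerf logs.

```python
from typing import Any, Dict, List

def first_token_check(records: List[Dict[str, Any]]) -> bool:
    # Criterion 1: the independently reported first token must equal
    # the first token of the full model output for every query.
    return all(r["first_token"] == r["output_tokens"][0] for r in records)

def token_count_check(records: List[Dict[str, Any]]) -> bool:
    # Criterion 3: the reported token count must equal the length
    # of the output sequence.
    return all(r["reported_len"] == len(r["output_tokens"]) for r in records)

# Hypothetical records, not real log entries:
records = [
    {"first_token": 5, "output_tokens": [5, 9, 2], "reported_len": 3},
    {"first_token": 7, "output_tokens": [7, 4, 4, 2], "reported_len": 4},
]
print(first_token_check(records), token_count_check(records))  # True True
```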
compliance/nvidia/TEST06/run_verification.py (1 addition, 2 deletions):

```diff
@@ -51,8 +51,7 @@ def eos_check(acc_data, dtype, eos_token_id=2):
         if data[i] == eos_token_id:
             n_eos_tokens += 1
             if n_eos_tokens >= 2:
-                # Allow output to be [eos_token_id, eos_token_id]
-                return len(data) == 2
+                return False
         if data[i] != eos_token_id:
             break
         i-=1
```
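With the carve-out removed, any output carrying two or more trailing EOS tokens now fails outright. Below is a self-contained restatement of the check for experimentation; the loop framing (`while i >= 0`) is reconstructed from context, since the diff only shows the loop body, and the real script reads the token IDs out of `mlperf_log_accuracy.json` rather than taking a plain list.

```python
def eos_check(data, eos_token_id=2):
    # Scan backwards from the end of the output; reject as soon as a
    # second trailing EOS token is seen. The old [eos, eos] exception
    # is gone because min_new_tokens=2 makes that output impossible.
    i = len(data) - 1
    n_eos_tokens = 0
    while i >= 0:
        if data[i] == eos_token_id:
            n_eos_tokens += 1
            if n_eos_tokens >= 2:
                return False
        if data[i] != eos_token_id:
            break
        i -= 1
    return True

print(eos_check([5, 9, 2]))  # True: one trailing EOS is fine
print(eos_check([5, 9]))     # True: no EOS at all is fine
print(eos_check([5, 2, 2]))  # False: two trailing EOS tokens
print(eos_check([2, 2]))     # False: allowed before this PR, rejected now
```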
language/mixtral-8x7b/README.md (8 additions, 5 deletions):

````diff
@@ -109,6 +109,9 @@ rclone copyurl https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06
 #### Using wget

 Alternatively, you can simply cd into the folder where you want to place the dataset and run
+
+TBD: The dataset is being replaced in v5.0 due to https://github.com/mlcommons/inference/issues/1777
+
 ```bash
 wget https://inference.mlcommons-storage.org/mixtral_8x7b%2F2024.06.06_mixtral_15k_v4.pkl
 ```
````
````diff
@@ -261,17 +264,17 @@ python -u evaluate-accuracy.py --checkpoint-path [path_to_model_checkpoint] \
 Reference scores:
 Open Orca:
 ```json
-{'rouge1': 45.4911, 'rouge2': 23.2829, 'rougeL': 30.3615}
+{'rouge1': 45.5989, 'rouge2': 23.3526, 'rougeL': 30.4608}
 ```
 GSM8K:
 ```json
-{'gsm8k': 73.78}
+{'gsm8k': 73.66}
 ```
 MBXP:
 ```json
-{'mbxp': 60.12}
+{'mbxp': 60.16}
 ```
-For official submissions, 99% of each reference score is enforced. Additionally, 90%-110% of the generated tokens_per_samples:
+For official submissions, 99% of each reference score is enforced. Additionally, 90%-110% of the generated tokens_per_samples (counting all the non-EOS tokens):
 ```json
-{'tokens_per_sample': 145.9}
+{'tokens_per_sample': 144.84}
 ```
````
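The 90%-110% band is applied to the mean tokens per sample counting only non-EOS tokens, so trailing EOS tokens do not pad the count. A minimal sketch of that bookkeeping, assuming Mixtral's EOS token ID of 2 and plain lists of generated token IDs (the submission checker's actual parsing and field names differ):

```python
EOS_TOKEN_ID = 2  # Mixtral's </s>; stated here as an assumption

def mean_tokens_per_sample(outputs, eos_token_id=EOS_TOKEN_ID):
    # Mean output length over all queries, counting only non-EOS tokens.
    counts = [sum(1 for t in seq if t != eos_token_id) for seq in outputs]
    return sum(counts) / len(counts)

# Hypothetical generated token IDs for three queries:
outputs = [[5, 9, 11, 2], [7, 4, 2], [8, 8, 8, 8]]
mean = mean_tokens_per_sample(outputs)          # (3 + 2 + 4) / 3 = 3.0
in_band = 0.9 * 144.84 <= mean <= 1.1 * 144.84  # bound from the reference score
print(mean, in_band)
```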
language/mixtral-8x7b/SUT.py (1 addition, 1 deletion):

```diff
@@ -27,7 +27,7 @@
 gen_kwargs = {
     "early_stopping": True,
     "max_new_tokens": 1024,
-    "min_new_tokens": 1,
+    "min_new_tokens": 2,
     "num_beams": 1,
     "do_sample": False
 }
```
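These are standard Hugging Face `transformers` generation arguments; raising `min_new_tokens` from 1 to 2 has `generate` suppress EOS until at least two new tokens have been produced, which rules out the degenerate `[eos_token_id, eos_token_id]` output that the compliance check previously had to special-case. A runnable sketch of the call pattern, using a small stand-in model instead of the benchmark's Mixtral-8x7B checkpoint and a shortened `max_new_tokens` for speed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model for illustration; the actual SUT loads Mixtral-8x7B.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

gen_kwargs = {
    "early_stopping": True,
    "max_new_tokens": 64,  # the SUT uses 1024; shortened here for a quick run
    "min_new_tokens": 2,   # EOS is suppressed until two new tokens exist
    "num_beams": 1,
    "do_sample": False,
}

inputs = tokenizer("The answer is", return_tensors="pt")
out = model.generate(**inputs, **gen_kwargs)
new_tokens = out[0][inputs["input_ids"].shape[1]:]
assert len(new_tokens) >= 2  # guaranteed by min_new_tokens=2
print(tokenizer.decode(new_tokens))
```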