
[Urgent] Mixtral MOE dataset has reference output token length == 0 (or 1 if you count EOS) #1777

Open · nvzhihanj opened this issue on Jul 11, 2024 · 4 comments

@nvzhihanj (Contributor) commented:

There are 4 samples in the reference HF output that have no output other than the EOS token.

```python
>>> import pandas as pd
>>> df = pd.read_pickle("06062024_mixtral_15k_v4.pkl")
>>> df[df['tok_ref_output_len'] == 1]
       dataset            id                                           question                                              input ref_output  ... tok_ref_output stop_sequence tok_stop_sequence tok_input_len tok_ref_output_len
6486  OpenOrca  flan.1662985  You will be given a text below. Complete the t...  <s>[INST] You are an AI assistant. User will y...             ...            [2]          </s>               [2]           146                  1
6489  OpenOrca   flan.403129  Write the last sentence in this story.\n\nFor ...  <s>[INST] You are a helpful assistant, who alw...             ...            [2]          </s>               [2]           192                  1
6630  OpenOrca   flan.778592  How does the sentence end?\n\n(CNN) -- In the ...  <s>[INST] You are an AI assistant. User will y...             ...            [2]          </s>               [2]           267                  1
6733  OpenOrca   flan.387020  What's the most logical way to complete this p...  <s>[INST] You are an AI assistant. You will be...             ...            [2]          </s>               [2]           259                  1
```

This will cause a bug in the Server scenario, because we measure TPOT as (Total - TTFT) / (#tokens - 1). Even if we count the EOS, #tokens == 1, so we have a divide-by-zero problem.
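For illustration, here is a minimal sketch of the failing computation (function and variable names are hypothetical, not taken from the actual harness):

```python
def time_per_output_token(total_latency_s: float, ttft_s: float, num_tokens: int) -> float:
    # TPOT = (Total - TTFT) / (#tokens - 1); the first token is accounted for by TTFT.
    return (total_latency_s - ttft_s) / (num_tokens - 1)

# A sample whose reference output is just the EOS token has num_tokens == 1,
# so the denominator is zero:
time_per_output_token(2.5, 0.3, 1)  # raises ZeroDivisionError
```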

Suggested mitigations:

  1. Remove these 4 samples, or substitute them with other OpenOrca rows.
  2. Keep the 4 samples, but make [2, 2] a legitimate output (the same as receiving [X, 2]). We just need to change TEST06 to allow this.

Given the tightened timeline, I propose we work around (WAR) the issue with option 2 (sketched below) and fix option 1 in the following round. We believe the root causes of this issue are 1) a bug in the Mixtral-8x7B-Instruct model, and 2) the fact that we are using greedy decoding.
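For clarity, mitigation 2 amounts to treating an EOS-only response as valid output. A hypothetical sketch of such a check (this is not the actual TEST06 code; token id 2 corresponds to </s> in the Mixtral tokenizer):

```python
EOS_TOKEN_ID = 2  # </s> in the Mixtral-8x7B-Instruct tokenizer

def is_valid_output(tokens: list[int]) -> bool:
    # Previously only [X, ..., 2] (at least one real token before EOS) was accepted;
    # under mitigation 2, [2, 2] (an EOS-only output) is also treated as legitimate.
    return len(tokens) >= 2 and tokens[-1] == EOS_TOKEN_ID
```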

@nvzhihanj (Contributor, Author) commented on Jul 15, 2024:

@pgmpablo157321 suggested an alternative method, where we set min_new_tokens=2 when generating the reference. I did a quick run last weekend and checked the token-length difference between the old run and the new run. Although many samples' lengths didn't change, we observe an average shift of about 60 tokens in output length:

[screenshot: comparison of output token lengths between the two runs]

```python
import numpy as np

# The DataFrame holds both the new (min_new_tokens=2) and the original reference lengths.
new_len = mixtral8x7b_ref_15k_minnew2_df["nv_tok_ref_output_length"].to_numpy()
old_len = mixtral8x7b_ref_15k_minnew2_df["tok_ref_output_len"].to_numpy()
len_diff = np.abs(new_len - old_len)
np.mean(len_diff)  # 63.06126666666667
```

This is most likely related to how HF suppresses the EOS token, which changes the token distribution when min_new_tokens is enabled. I don't think we should go with that approach.
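For context, the alternative reference generation would look roughly like this with Hugging Face transformers (a sketch; the prompt and generation lengths are placeholders, not the actual dataset-generation settings):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

inputs = tokenizer("[INST] Write the last sentence in this story. ... [/INST]",
                   return_tensors="pt")
# min_new_tokens=2 masks the EOS logit until two tokens have been generated;
# with greedy decoding this can steer the entire continuation onto a different
# path, which would explain the ~60-token average shift observed above.
outputs = model.generate(**inputs, do_sample=False,
                         min_new_tokens=2, max_new_tokens=1024)
```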

@szutenberg (Contributor) commented:

Nice catch! I checked both our GPU and Gaudi references: exactly the same four samples have issues (flan.1662985, flan.403129, flan.778592, flan.387020).

I suspect the prompt format is invalid: there is a missing space between <s> and [INST]. Also, a newline at the end can help with generating the right output (see https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/blob/main/tokenizer_config.json#L33 and the discussion at https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/75).
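As a comparison point, the canonical prompt can be produced with the tokenizer's own chat template instead of a hand-assembled string (a sketch; the actual dataset-generation script may differ):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
messages = [{"role": "user", "content": "Write the last sentence in this story. ..."}]
# apply_chat_template reproduces the exact spacing around <s>, [INST], and [/INST]
# that the model was trained with, avoiding hand-rolled formatting mistakes.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
```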

I support the solution proposed by Zhihan. We definitely have to introduce small fixes to the dataset before the next round; it's too late to do it now.

@nvzhihanj (Contributor, Author) commented:

Update: this issue was discussed in the 7/16/24 Inference Working Group meeting, and mitigation 2 was agreed upon. We will merge #1778 (which allows 2 EOS tokens in the output), and Pablo will update the reference implementation to reflect the change on the SUT side.

@mrmhodak (Contributor) commented:

WG comments: To revisit in 5.0
