[Urgent] Mixtral MOE dataset has reference output token length == 0 (or 1 if you count EOS) #1777
@pgmpablo157321 suggested an alternative method, where we increase
This must be relevant to how HF suppresses the EOS and changes the token distribution with
Nice catch! I checked both our GPU and Gaudi references: exactly the same four samples have issues (flan.1662985, flan.403129, flan.778592, flan.387020). I guess that the prompt format is invalid - I see that there is a missing space between. I support the solution proposed by Zhihan. We definitely have to introduce small fixes to the dataset before the next round - it's too late to do it now.
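For anyone who wants to reproduce this check on their own copy of the dataset, a minimal sketch along these lines could work. The pickle path, the column names ("id", "tok_ref_output"), and the EOS token id are assumptions for illustration, not the actual dataset schema:

```python
# Sketch: find reference samples whose output tokens are nothing but EOS.
# File name, column names, and EOS id are assumptions, not the real schema.
import pandas as pd

EOS_TOKEN_ID = 2  # assumed EOS id for the Mixtral-8x7B tokenizer

df = pd.read_pickle("mixtral_reference_data.pkl")  # hypothetical file name

for _, row in df.iterrows():
    ref_tokens = list(row["tok_ref_output"])  # hypothetical column name
    if all(tok == EOS_TOKEN_ID for tok in ref_tokens):
        print("EOS-only reference output:", row["id"])  # hypothetical column name
```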
Update: this issue was discussed in the 7/16/24 Inference Working Group meeting, and mitigation 2 was agreed upon. We will merge #1778 (which allows 2 EOS tokens in the output), and Pablo will update the reference implementation to reflect the change on the SUT side.
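As an illustration only (not the actual change in #1778), the workaround amounts to letting a response carry up to 2 EOS tokens when output tokens are counted, so even the degenerate samples report at least 2 tokens and the TPOT denominator stays positive:

```python
# Illustrative sketch only; not the actual code from #1778.
EOS_TOKEN_ID = 2  # assumed EOS id for the Mixtral-8x7B tokenizer

def count_reported_tokens(output_ids, max_eos=2):
    """Count output tokens, keeping at most `max_eos` EOS tokens."""
    kept = []
    eos_seen = 0
    for tok in output_ids:
        if tok == EOS_TOKEN_ID:
            eos_seen += 1
            if eos_seen > max_eos:
                break
        kept.append(tok)
    return len(kept)

# A degenerate sample that emits two EOS tokens now counts as 2 tokens,
# so (#tokens - 1) == 1 instead of 0.
assert count_reported_tokens([EOS_TOKEN_ID, EOS_TOKEN_ID]) == 2
```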
WG comments: To revisit in 5.0
There are 4 samples in the reference HF output that have no output other than the EOS token.
This will cause a bug in the Server scenario, because we measure TPOT as (Total latency - TTFT) / (#tokens - 1). Even if we count the EOS, we still have a divide-by-zero problem.
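A minimal sketch of the TPOT computation described above, showing where the division breaks. Variable names are illustrative, not the harness's actual code:

```python
# Sketch of the TPOT formula quoted above; names are illustrative.
def time_per_output_token(total_latency_s: float, ttft_s: float, n_output_tokens: int) -> float:
    decode_tokens = n_output_tokens - 1  # the first token is covered by TTFT
    if decode_tokens <= 0:
        # An EOS-only completion has n_output_tokens == 1, so the
        # denominator is 0 and TPOT is undefined.
        raise ZeroDivisionError("completion has no tokens beyond the first")
    return (total_latency_s - ttft_s) / decode_tokens

# Normal sample: 128 tokens, 2.5 s total, 0.3 s TTFT -> ~0.0173 s/token.
print(time_per_output_token(2.5, 0.3, 128))
# Degenerate sample: only the EOS token -> raises ZeroDivisionError.
# time_per_output_token(0.31, 0.30, 1)
```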
Suggested mitigation:
Given the tightened timeline, I propose we work around (WAR) with mitigation 2 now and apply fix 1 in the following round. We think the root cause of this issue is 1) a bug with the Mixtral-8x7b-instruct model, and 2) the fact that we are using greedy decoding.