[Severe Bug] Performance Degradation Starting from v4.42.* #31890
Comments
Hey! I'll try to skim through the resources, but I can't really think of something that could trigger the addition of spaces. I don't know where they are added, but is this related to #26678?
Can you share a small snippet or just a string where you saw extra spaces?
fyi @itazap
Thanks for getting back to me @ArthurZucker! The extra spaces usually appear in the prompt part of the model outputs. Here is an example:

```json
{"task_id": "BigCodeBench/0", "solution": "import itertools\nfrom random import shuffle\n\ndef task_func(numbers=list(range(1, 11))):\n \"\"\"\n Calculates the average of the sums of absolute differences between each pair of consecutive numbers \n for all permutations of a given list. Each permutation is shuffled before calculating the differences.\n\n Args:\n - numbers (list): A list of numbers. Default is numbers from 1 to 10.\n \n Returns:\n float: The average of the sums of absolute differences for each shuffled permutation of the list.\n\n Requirements:\n - itertools\n - random. shuffle\n\n Example:\n >>> result = task_func([1, 2, 3 ])\n >>> isinstance(result, float)\n True\n \"\"\"\n sum_diffs = 0\n permutations = list(itertools.permutations(numbers))\n for perm in permutations:\n shuffle(perm)\n diffs = [abs(perm[i] - perm[i + 1]) for i in range(len(perm) - 1)]\n sum_diffs += sum(diffs)\n return sum_diffs / len(permutations)"}
```

The ideal one should be:

```json
{"task_id": "BigCodeBench/0", "solution": "import itertools\nfrom random import shuffle\n\ndef task_func(numbers=list(range(1, 11))):\n \"\"\"\n Calculates the average of the sums of absolute differences between each pair of consecutive numbers \n for all permutations of a given list. Each permutation is shuffled before calculating the differences.\n\n Args:\n - numbers (list): A list of numbers. Default is numbers from 1 to 10.\n \n Returns:\n float: The average of the sums of absolute differences for each shuffled permutation of the list.\n\n Requirements:\n - itertools\n - random.shuffle\n\n Example:\n >>> result = task_func([1, 2, 3])\n >>> isinstance(result, float)\n True\n \"\"\"\n sum_diffs = 0\n permutations = list(itertools.permutations(numbers))\n for perm in permutations:\n shuffle(perm)\n diffs = [abs(perm[i] - perm[i + 1]) for i in range(len(perm) - 1)]\n sum_diffs += sum(diffs)\n return sum_diffs / len(permutations)"}
```

Note the extra spaces in `random. shuffle` and `[1, 2, 3 ]` in the first solution string. I haven't checked other models, so I'm not sure if it's a common pattern or not.
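In case it helps narrow things down, a quick roundtrip check along these lines should show whether the spaces come from the tokenizer itself (a sketch only; the model id and sample string are my assumptions, not the exact BigCodeBench call):

```python
from transformers import AutoTokenizer

# Assumption: the Yi chat tokenizer is the one inserting the extra spaces.
tok = AutoTokenizer.from_pretrained("01-ai/Yi-1.5-9B-Chat")

sample = "    - random.shuffle\n    >>> result = task_func([1, 2, 3])"
ids = tok(sample, add_special_tokens=False).input_ids
roundtrip = tok.decode(ids, skip_special_tokens=True)

# If the regression is in encode/decode, the roundtrip will contain
# "random. shuffle" or "[1, 2, 3 ]" instead of the original text.
print(repr(roundtrip))
```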
Could you share how you call the tokenizer (i.e., how you initialize it)? #30964 and #31305 are the only "big" changes that happened. Could you try to set
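For reference, the kind of initialization being hinted at here would look roughly like this (assuming the flags in question are `legacy` and `from_slow`, which is what the later discussion about the `legacy` default suggests):

```python
from transformers import AutoTokenizer

# Assumption: the flags under discussion are `legacy` and `from_slow`;
# both are forwarded to the underlying Llama-style tokenizer.
tokenizer = AutoTokenizer.from_pretrained(
    "01-ai/Yi-1.5-9B-Chat",
    legacy=False,    # opt in to the fixed pre-tokenization behaviour
    from_slow=True,  # rebuild the fast tokenizer from the slow one
)
```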
It seems like:
I don't know if this is in decoding or in encoding, which is important for us to be able to fix!
It's on https://github.com/bigcode-project/bigcodebench/blob/bbe93d673fd236e99b81cd2d7f110b63c9c2da35/bigcodebench/model.py#L137 and https://github.com/bigcode-project/bigcodebench/blob/bbe93d673fd236e99b81cd2d7f110b63c9c2da35/bigcodebench/model.py#L197.
Let me try it.
Actually, using
The issue could just as well be the chat template call, given that this is something that was touched, while the
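If the chat template is the suspect, its output is cheap to compare across versions; here is a sketch of what that diff could look like (the model id and message content are placeholders):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("01-ai/Yi-1.5-9B-Chat")
messages = [{"role": "user", "content": "Write a function that sums a list."}]

# Render the prompt both as text and as token ids; running this under
# transformers v4.41.x and v4.42.x and diffing the results shows whether
# the chat template call itself changed.
print(repr(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)))
print(tok.apply_chat_template(messages, tokenize=True, add_generation_prompt=True))
```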
Yeah, similar to my thoughts. We've only tested Chat models. However,
My bad, the extra space no longer exists. I'll check the final results to see if the scores are similar to the reported ones.
@ArthurZucker BTW, if
I don't really know, because I don't remember exactly how the tokenizer is called / how the tokenizer was trained. What fixed your issue?
Both. I'll run the full generation with
@ArthurZucker When generating with There should be some other issues, I guess?
I am not super sure, I don't know how vLLM calls the tokenizer, but
Here are the first few prompts sent to the vLLM calls: yi_vllm.txt. They look correct, but the outputs are bad. Do you also need inputs for the
It would be nice to have the input token ids and the input text / chat, yep.
Yes, for transformers only!
cc @Rocketknight1 if you have an idea as well
Here are the top 10 pairs of prompts and token ids!
Hi @terryyz, can you please also share the outputs of the prompts where you are seeing the issue? Thanks!
Hi @itazap, are there any specific files you want to see? Or just the ones where the model degraded? If that's the case, there were plenty of them in the original thread: bigcode-project/bigcodebench#21
@terryyz Sorry, I understood that
@itazap Unfortunately
@itazap @ArthurZucker Update:

```python
EOS = [
    "<|endoftext|>",
    "<|endofmask|>",
    "</s>",
    "\nif __name__",
    "\ndef main(",
    "\nprint(",
]
stop_sequencer = StopSequencer(
    self.model,
    model_type="causal",  # or seq2seq
    tokenizer=self.tokenizer,
)
model = stop_sequencer.register_stop_texts(
    stop_texts=self.eos,
    input_length=input_tokens.size(-1),
)
outputs = model.generate(
    input_tokens,
    max_new_tokens=self.max_new_tokens,
    do_sample=do_sample,
    num_return_sequences=min(self.batch_size, num_samples),
    pad_token_id=self.tokenizer.eos_token_id,
    **kwargs,
)
# self.eos: ['<|endoftext|>', '<|endofmask|>', '</s>', '\nif __name__', '\ndef main(', '\nprint(', '\n```\n']
```

It seems the above part caused the sudden stop of the generation. However, the EOS doesn't appear in the outputs. Do you know if this is expected? I did a similar setup for vLLM in the same codebase. vLLM may have a different setup for the tokenizer; should I cross-post this issue?
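To rule out the stop-text handling, one alternative is to drop the external StopSequencer and use transformers' own stopping criteria. A rough sketch, reusing `self.model`, `self.tokenizer`, `self.eos` and `input_tokens` from the snippet above (note that, unlike a stop sequencer, this does not trim the stop text from the returned sequence, so it would actually show up in the output):

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTexts(StoppingCriteria):
    """Stop as soon as any stop text appears in the decoded continuation."""

    def __init__(self, tokenizer, stop_texts, prompt_len):
        self.tokenizer = tokenizer
        self.stop_texts = stop_texts
        self.prompt_len = prompt_len

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # For simplicity this only inspects the first sequence in the batch.
        generated = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return any(stop in generated for stop in self.stop_texts)

criteria = StoppingCriteriaList([StopOnTexts(self.tokenizer, self.eos, input_tokens.size(-1))])
outputs = self.model.generate(
    input_tokens,
    max_new_tokens=self.max_new_tokens,
    stopping_criteria=criteria,
    pad_token_id=self.tokenizer.eos_token_id,
)
```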
Sorry, I find it a bit hard to follow: which outputs are you looking for the EOS in?
Oh, it's a follow-up investigation of #31890 (comment). I removed the part of
@itazap @ArthurZucker Is there going to be a PR to make the correct setup the default? 👀 And as vLLM possibly has a different implementation for the tokenizer, should we inform them?
Sorry, what do you mean by "correct setup"? 🤗
I mean the initialization of
I think the best option is to open a PR on the Hub! I can't merge it for you, but pinging the authors with this issue should be good already! WDYT?
We can't change the default on our side because it would break a lot of other models.
@ArthurZucker Do you mean that the setup of
@ArthurZucker Pinging again to check on the answers 👀
In terms of which models this relates to, there is no list, but it can be determined by checking the
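If the field being referred to is the `legacy` entry in each repo's tokenizer_config.json (my assumption), a small script can do that check per model (a hypothetical helper, not an official API):

```python
import json
from huggingface_hub import hf_hub_download

def tokenizer_legacy_flag(model_id: str):
    """Return the `legacy` value from a repo's tokenizer_config.json (None if unset)."""
    path = hf_hub_download(model_id, "tokenizer_config.json")
    with open(path) as f:
        return json.load(f).get("legacy")

# Gated repos (e.g. meta-llama, codegemma) require an access token.
print(tokenizer_legacy_flag("01-ai/Yi-1.5-9B-Chat"))
# Repos that leave `legacy` unset or set it to True are the candidates that
# may need the fix; repos that already set it to False should be unaffected.
```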
Thanks @itazap! Will the rule-based checking be added in the next Transformers release? I can still use
But @terryyz, IMO the best solution is to merge the fix in the model(s) that you are using. We could indeed check the transformers versions and force the fix, and new models like Llama3 or Gemma already have this set.
But we can do a deprecation cycle: if `legacy` is set to `None`, we warn that in the next release it will default to `False` instead of `True`. cc @itazap wdyt?
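For illustration, the proposed deprecation cycle could look something like this (a hypothetical sketch of the warning logic, not the actual transformers code):

```python
import warnings

def resolve_legacy(legacy=None):
    # Hypothetical sketch: keep the old default for now, but warn users who
    # haven't set `legacy` that the default will flip in a future release.
    if legacy is None:
        warnings.warn(
            "`legacy` was not explicitly set. It currently defaults to True, but "
            "will default to False in a future release; pass legacy=False to opt "
            "in to the fixed behaviour now.",
            FutureWarning,
        )
        legacy = True
    return legacy
```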
@ArthurZucker As you said, it's hard to predict which is more aligned with current users' expectations! With the warning, it should be okay!
Thanks @ArthurZucker and @itazap! That makes sense :) I'm now adding an optional argument for
Thanks for your patience! Let us know if you have any further issues! 🤗
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Hi @ArthurZucker and the team,
We have noticed a severe performance degradation when trying to reproduce the BigCodeBench results: bigcode-project/bigcodebench#21. I initially thought it was due to the update of vllm, but actually it was not. The greatly affected models include the 01-ai/Yi-1.5, google/codegemma, and meta-llama/Meta-Llama-3 families. Specifically, I observed some weird extra spaces in the outputs of 01-ai/Yi-1.5-9B-Chat. These spaces only appear starting from v4.42.*, not before. I also tried to generate without vllm later, which is not documented in the issue thread. It indeed appears to be an issue related to transformers, not vllm.

cc @takkyu2, who reported the original issue.
Who can help?
@ArthurZucker
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
The steps to reproduce the problems, and the corresponding results, are described in bigcode-project/bigcodebench#21.
Expected behavior
The model output quality should be similar to the ones in pregenerated LLM outputs.