When num_beams is set in GenerationConfig, the stop_strings parameter has no effect #34843
Comments
@gante since this is a crossover stop strings / beam search issue, can you take a look? Feel free to ping me if you think it's an issue in the stop string handling, though, and I can try to fix it.
Seems to me as though any stopping criterion being met should halt generation, as opposed to all needing to be met simultaneously. If that is incorrect, please let me know and I can work on a different type of fix.
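For illustration, a minimal sketch of the "any criterion met, then stop" semantics described above, assuming each criterion returns a single boolean per call (as in the classic StoppingCriteria interface); the AnyOfCriteria name is illustrative only and not part of the library:

from transformers import StoppingCriteria

class AnyOfCriteria(StoppingCriteria):
    """Illustrative combinator: stop as soon as at least one wrapped criterion fires."""

    def __init__(self, criteria):
        self.criteria = list(criteria)

    def __call__(self, input_ids, scores, **kwargs):
        # Short-circuits on the first criterion that reports "stop".
        return any(bool(c(input_ids, scores, **kwargs)) for c in self.criteria)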
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@Rocketknight1 I see what's going on. From beam search's perspective, we may hit the stopping criteria with one of the beams, but we will continue generating because there are other valid beams to continue. However, we do not tag the completed beam as complete -- there is actually no mechanism to do so in our current PyTorch implementation of beam search. As such, internally, we can see the stopping criteria being detected, but we continue the sequence anyway. While I could patch this behavior (with non-trivial changes), I'm going to prioritize the refactor of beam search instead. I've been meaning to refactor beam search in the FLAX/TF style, as it is easier to read/maintain and contains vectorized operations. I will then apply the fix for this behavior.

For future reference, code for reproduction (the original example is missing a model):

from transformers import GenerationConfig, AutoTokenizer, AutoModelForCausalLM
generate_config = GenerationConfig(
num_beams=3,
do_sample=True,
temperature=0.7,
num_return_sequences=3,
top_k=50,
top_p=0.95,
repetition_penalty=1.0,
length_penalty=1.0,
stop_strings=":",
return_dict_in_generate=True,
max_new_tokens=500,
output_logits=True
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct").cuda()
PROMPT = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"
tokens = tokenizer(PROMPT, return_tensors="pt").to(model.device)
out = model.generate(**tokens, generation_config=generate_config, tokenizer=tokenizer)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))
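As a stop-gap until beam search honors stop_strings, one possible workaround is to truncate the decoded output at the first stop string after the prompt. This is only a sketch that continues the script above (it reuses tokenizer, out, and PROMPT), it does not save any compute, and the len(PROMPT) offset is approximate since decoding may not reproduce the prompt character-for-character:

stop_string = ":"
text = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
# Search only after the prompt, in case the prompt itself contains the stop string.
idx = text.find(stop_string, len(PROMPT))
if idx != -1:
    text = text[: idx + len(stop_string)]
print(text)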
@gante yes, sounds good! I can definitely see some awkward interactions here. Let me know if we need to refactor the stop strings check - right now, it only checks the tokens at the end of a sequence, because that's sufficient when tokens are generated one at a time. However, with beam search that assumption might be invalid.
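For context, the stop-string check referred to above can also be passed explicitly as a stopping criterion. A sketch, assuming StopStringCriteria is exported from transformers.generation in the installed version and reusing model, tokenizer, and tokens from the reproduction script above; with num_beams=1 this mirrors setting stop_strings=":":

from transformers import StoppingCriteriaList
from transformers.generation import StopStringCriteria

criteria = StoppingCriteriaList([StopStringCriteria(tokenizer=tokenizer, stop_strings=[":"])])
# Greedy decoding: generation ends at the first ":" because only one token is appended per step.
out = model.generate(**tokens, stopping_criteria=criteria, max_new_tokens=500)
print(tokenizer.decode(out[0], skip_special_tokens=True))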
@Rocketknight1 nope, no need to refactor any stopping criteria :) This has more to do with the old beam search code not expecting stopping criteria other than a specific EOS token and max length than anything else. BTW, I've linked above a draft PR that will vectorize beam search and correctly handle any stopping criteria.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info

transformers version: 4.46.2

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Who can help?
@gante
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
When I set num_beams=1, stop_strings works as expected.
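For completeness, a sketch of the working configuration (num_beams=1), reusing the model, tokenizer, and tokens from the reproduction script above; the config values are illustrative:

greedy_config = GenerationConfig(
    num_beams=1,
    do_sample=False,
    stop_strings=":",
    return_dict_in_generate=True,
    max_new_tokens=500,
)
out = model.generate(**tokens, generation_config=greedy_config, tokenizer=tokenizer)
# With a single beam, generation halts at the first ":" as expected.
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))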