
When set num_beams in GenerationConfig, stop_strings parameter has no effect #34843

Open
ZYM66 opened this issue Nov 21, 2024 · 7 comments · May be fixed by #35023 or #35802

@ZYM66

ZYM66 commented Nov 21, 2024

System Info

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

  • transformers version: 4.46.2
  • Platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Python version: 3.10.15
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: 1.1.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA GeForce RTX 4070 SUPER

Who can help?

@gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code

from transformers import GenerationConfig, AutoTokenizer, AutoModelForCausalLM

generate_config = GenerationConfig(
    num_beams=3,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=3,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.0,
    length_penalty=1.0,
    stop_strings=":",
    return_dict_in_generate=True,
    max_new_tokens=500,
    output_logits=True
)


tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).cuda()

PROMPT = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"


tokens = tokenizer(PROMPT, return_tensors="pt").to(model.device)
out = model.generate(**tokens, generation_config=generate_config, tokenizer=tokenizer)

print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))

Out

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? To determine the total number of clips Natalia sold in April and May, we need to follow these steps:

1. Identify the number of clips sold in April.
2. Calculate the number of clips sold in May.
3. Add the number of clips sold in April and May together.

First, we know that Natalia sold 48 clips in April. Next, we need to find out how many clips she sold in May. According to the problem, she sold half as many clips in May as she did in April. Therefore, we calculate the number of clips sold in May as follows:
\[
\text{Number of clips sold in May} = \frac{48}{2} = 24
\]

Now, we add the number of clips sold in April and May together to find the total number of clips sold:
\[
\text{Total number of clips sold} = 48 + 24 = 72
\]

Thus, the total number of clips Natalia sold in April and May is \boxed{72}.

Expected behavior

When I set num_beams=1, stop_strings works as expected.
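
For reference, a minimal sketch of the single-beam configuration where the stop string does fire (same script as in the Reproduction section, only the beam settings changed; MODEL_PATH is left undefined exactly as above):

# Same script as above, but with a single beam; in this setup the ":" stop string
# terminates generation as expected. Only the GenerationConfig differs.
generate_config = GenerationConfig(
    num_beams=1,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=1,
    top_k=50,
    top_p=0.95,
    stop_strings=":",
    return_dict_in_generate=True,
    max_new_tokens=500,
)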

@ZYM66 ZYM66 added the bug label Nov 21, 2024
@Rocketknight1
Member

@gante since this is a crossover stop strings / beam search issue, can you take a look? Feel free to ping me if you think it's an issue in the stop string handling, though, and I can try to fix it.

@secrettoad
Contributor

Seems to me as though any stopping criteria being met should halt generation, as opposed to all needing to be met simultaneously. If that is incorrect, please let me know and I can work on a different type of fix.
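
As a side note, a tiny Python sketch of that "any criterion halts generation" semantics (illustrative only, not transformers' internal implementation):

# Illustrative only: generation halts as soon as any single criterion fires,
# rather than waiting for all of them to be met simultaneously.
def should_stop(criteria, sequence) -> bool:
    return any(criterion(sequence) for criterion in criteria)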


@gante
Member

gante commented Jan 15, 2025

@Rocketknight1 I see what's going on.

From beam search's perspective, we may hit the stopping criteria with one of the beams, but we will continue generating because there are other valid beams to continue. However, we do not tag the completed beam as finished -- there is actually no mechanism to do so in our current PyTorch implementation of beam search. As such, internally, we can see the stopping criteria being detected, but we continue the sequence anyway.

While I could patch this behavior (with non-trivial changes), I'm going to prioritize the refactor of beam search instead. I've been meaning to refactor beam search in the FLAX/TF style, as it is easier to read/maintain and contains vectorized operations. I will then apply the fix for this issue as part of that refactor.


For future reference, code for reproduction (the original example is missing a model)

from transformers import GenerationConfig, AutoTokenizer, AutoModelForCausalLM

generate_config = GenerationConfig(
    num_beams=3,
    do_sample=True,
    temperature=0.7,
    num_return_sequences=3,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.0,
    length_penalty=1.0,
    stop_strings=":",
    return_dict_in_generate=True,
    max_new_tokens=500,
    output_logits=True
)


tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct").cuda()

PROMPT = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"


tokens = tokenizer(PROMPT, return_tensors="pt").to(model.device)
out = model.generate(**tokens, generation_config=generate_config, tokenizer=tokenizer)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))
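
Until that refactor lands, a possible user-side workaround is to truncate the decoded beam-search output at the first occurrence of a stop string after generation. A minimal sketch (truncate_at_stop_strings is a hypothetical helper, not part of transformers):

def truncate_at_stop_strings(text: str, stop_strings) -> str:
    # Cut the text at the earliest stop string, keeping the stop string itself.
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx + len(stop))
    return text[:cut]

# Usage with the repro above: everything after the first ":" is dropped.
decoded = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
print(truncate_at_stop_strings(decoded, [":"]))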

@Rocketknight1
Member

@gante yes, sounds good! I can definitely see some awkward interactions here. Let me know if we need to refactor the stop strings check - right now, it only checks the tokens at the end of a sequence, because that's sufficient when tokens are generated one at a time. However, with beam search that assumption might be invalid.
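
For illustration, a rough sketch of what a whole-sequence check could look like as a custom criterion (hypothetical class; note it would not by itself fix the beam-search issue described above, since the existing criterion already fires but is then ignored):

import torch
from transformers import StoppingCriteria

class WholeSequenceStopStringCriteria(StoppingCriteria):
    # Hypothetical: decodes each candidate sequence and searches the full text for a
    # stop string, instead of only the newly generated tail.
    def __init__(self, tokenizer, stop_strings):
        self.tokenizer = tokenizer
        self.stop_strings = list(stop_strings)

    def __call__(self, input_ids, scores, **kwargs):
        texts = self.tokenizer.batch_decode(input_ids, skip_special_tokens=True)
        done = [any(s in t for s in self.stop_strings) for t in texts]
        # Recent transformers versions expect one boolean per batch row.
        return torch.tensor(done, dtype=torch.bool, device=input_ids.device)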

@gante gante linked a pull request Jan 20, 2025 that will close this issue
@gante
Member

gante commented Jan 20, 2025

@Rocketknight1 nope, no need to refactor any stopping criteria :) This has more to do with the old beam search code not expecting stopping criteria other than a specific EOS token and max length than anything else.

BTW, I've linked above a draft PR that will vectorize beam search and start digesting any stopping criteria.

