Improve generate docstring (for TF and FLAX) #18432

Closed
38 changes: 26 additions & 12 deletions src/transformers/generation_flax_utils.py
@@ -208,7 +208,6 @@ def generate(
post](https://huggingface.co/blog/how-to-generate).

Parameters:

input_ids (`jnp.ndarray` of shape `(batch_size, sequence_length)`):
The sequence used as a prompt for the generation.
max_length (`int`, *optional*, defaults to `model.config.max_length`):
@@ -217,23 +216,38 @@
the prompt.
max_new_tokens (`int`, *optional*):
The maximum number of tokens to generate, ignoring the number of tokens in the prompt.
do_sample (`bool`, *optional*, defaults to `False`):
Whether or not to use sampling; use greedy decoding otherwise.
temperature (`float`, *optional*, defaults to 1.0):
The value used to modulate the next token probabilities.
top_k (`int`, *optional*, defaults to 50):
min_length (`int`, *optional*, defaults to `model.config.min_length` or 10 if the config does not set any
value): The minimum length of the sequence to be generated.
do_sample (`bool`, *optional*, defaults to `model.config.do_sample` or `False` if the config does not set
any value): Whether or not to use sampling; use greedy decoding otherwise.
temperature (`float`, *optional*, defaults to `model.config.temperature` or 1.0 if the config does not set
any value): The value used to modulate the next token probabilities.
top_k (`int`, *optional*, defaults to `model.config.top_k` or 50 if the config does not set any value):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_p (`float`, *optional*, defaults to 1.0):
top_p (`float`, *optional*, defaults to `model.config.top_p` or 1.0 if the config does not set any value):
If set to float < 1, only the most probable tokens with probabilities that add up to `top_p` or higher
are kept for generation.
pad_token_id (`int`, *optional*):
pad_token_id (`int`, *optional*, defaults to `model.config.pad_token_id`):
The id of the *padding* token.
bos_token_id (`int`, *optional*):
bos_token_id (`int`, *optional*, defaults to `model.config.bos_token_id`):
The id of the *beginning-of-sequence* token.
eos_token_id (`int`, *optional*):
eos_token_id (`int`, *optional*, defaults to `model.config.eos_token_id`):
The id of the *end-of-sequence* token.
num_beams (`int`, *optional*, defaults to 1):
Number of beams for beam search. 1 means no beam search.
length_penalty (`float`, *optional*, defaults to `model.config.length_penalty` or 1.0 if the config does
not set any value):
Exponential penalty to the length. 1.0 means that the beam score is penalized by the sequence length.
0.0 means no penalty. Set to values > 0.0 in order to encourage the model to generate longer
sequences, or to values < 0.0 in order to encourage the model to produce shorter sequences.
no_repeat_ngram_size (`int`, *optional*, defaults to `model.config.no_repeat_ngram_size` or 0 if the config
does not set any value): If set to int > 0, all ngrams of that size can only occur once.
num_beams (`int`, *optional*, defaults to `model.config.num_beams` or 1 if the config does not set any
value): Number of beams for beam search. 1 means no beam search.
forced_bos_token_id (`int`, *optional*, defaults to `model.config.forced_bos_token_id`):
The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful
for multilingual models like [mBART](../model_doc/mbart) where the first generated token needs to be
the target language token.
forced_eos_token_id (`int`, *optional*, defaults to `model.config.forced_eos_token_id`):
The id of the token to force as the last generated token when `max_length` is reached.
decoder_start_token_id (`int`, *optional*):
If an encoder-decoder model starts decoding with a different token than *bos*, the id of that token.
trace (`bool`, *optional*, defaults to `True`):
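As an illustrative aside (not part of the diff): a minimal sketch of how these Flax `generate()` arguments interact with the config fallbacks documented above. The `gpt2` checkpoint, prompt, and sampling values are assumptions chosen for the example.

```python
import jax
import jax.numpy as jnp
from transformers import AutoTokenizer, FlaxAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = FlaxAutoModelForCausalLM.from_pretrained("gpt2")

input_ids = jnp.asarray(tokenizer("Hello, my dog is", return_tensors="np").input_ids)

# Arguments passed explicitly win; anything omitted (e.g. top_p here) falls
# back to the corresponding model.config value, as the docstring now states.
outputs = model.generate(
    input_ids,
    max_length=20,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    prng_key=jax.random.PRNGKey(0),
)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=True))
```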
70 changes: 32 additions & 38 deletions src/transformers/generation_tf_utils.py
@@ -418,9 +418,7 @@ def generate(
post](https://huggingface.co/blog/how-to-generate).

Parameters:

input_ids (`tf.Tensor` of shape `(batch_size, sequence_length)`, `(batch_size, sequence_length,
feature_dim)` or `(batch_size, num_channels, height, width)`, *optional*):
input_ids (`tf.Tensor` of shape `(batch_size, sequence_length)`, `(batch_size, sequence_length, feature_dim)` or `(batch_size, num_channels, height, width)`, *optional*):
The sequence used as a prompt for the generation or as model inputs to the encoder. If `None` the
method initializes it with `bos_token_id` and a batch size of 1. For decoder-only models `inputs`
should be in the format of `input_ids`. For encoder-decoder models *inputs* can represent any of
@@ -431,42 +429,43 @@
the prompt.
max_new_tokens (`int`, *optional*):
The maximum number of tokens to generate, ignoring the number of tokens in the prompt.
min_length (`int`, *optional*, defaults to 10):
min_length (`int`, *optional*, defaults to `model.config.min_length` or 10 if the config does not set any value):
The minimum length of the sequence to be generated.
do_sample (`bool`, *optional*, defaults to `False`):
do_sample (`bool`, *optional*, defaults to `model.config.do_sample` or `False` if the config does not set any value):
Whether or not to use sampling; use greedy decoding otherwise.
early_stopping (`bool`, *optional*, defaults to `False`):
Whether to stop the beam search when at least `num_beams` sentences are finished per batch or not.
num_beams (`int`, *optional*, defaults to 1):
Number of beams for beam search. 1 means no beam search.
temperature (`float`, *optional*, defaults to 1.0):
The value used to modulate the next token probabilities.
top_k (`int`, *optional*, defaults to 50):
num_beams (`int`, *optional*, defaults to `model.config.num_beams` or 1 if the config does not set any
value): Number of beams for beam search. 1 means no beam search.
temperature (`float`, *optional*, defaults to `model.config.temperature` or 1.0 if the config does not set
any value): The value used to modulate the next token probabilities.
top_k (`int`, *optional*, defaults to `model.config.top_k` or 50 if the config does not set any value):
The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_p (`float`, *optional*, defaults to 1.0):
top_p (`float`, *optional*, defaults to `model.config.top_p` or 1.0 if the config does not set any value):
If set to float < 1, only the most probable tokens with probabilities that add up to `top_p` or higher
are kept for generation.
repetition_penalty (`float`, *optional*, defaults to 1.0):
repetition_penalty (`float`, *optional*, defaults to `model.config.repetition_penalty` or 1.0 if the config does not set any value):
The parameter for repetition penalty. 1.0 means no penalty. See [this
paper](https://arxiv.org/pdf/1909.05858.pdf) for more details.
pad_token_id (`int`, *optional*):
pad_token_id (`int`, *optional*, defaults to `model.config.pad_token_id`):
The id of the *padding* token.
bos_token_id (`int`, *optional*):
bos_token_id (`int`, *optional*, defaults to `model.config.bos_token_id`):
The id of the *beginning-of-sequence* token.
eos_token_id (`int`, *optional*):
eos_token_id (`int`, *optional*, defaults to `model.config.eos_token_id`):
The id of the *end-of-sequence* token.
length_penalty (`float`, *optional*, defaults to 1.0):
length_penalty (`float`, *optional*, defaults to `model.config.length_penalty` or 1.0 if the config does not set any value):
Exponential penalty to the length. 1.0 means no penalty.

Set to values < 1.0 in order to encourage the model to generate shorter sequences, to a value > 1.0 in
order to encourage the model to produce longer sequences.
no_repeat_ngram_size (`int`, *optional*, defaults to 0):
no_repeat_ngram_size (`int`, *optional*, defaults to `model.config.no_repeat_ngram_size` or 0 if the config does not set any value):
If set to int > 0, all ngrams of that size can only occur once.
bad_words_ids (`List[int]`, *optional*):
bad_words_ids (`List[int]`, *optional*, defaults to `model.config.bad_words_ids`):
List of token ids that are not allowed to be generated. In order to get the tokens of the words that
should not appear in the generated text, use `tokenizer.encode(bad_word, add_prefix_space=True)`.
num_return_sequences (`int`, *optional*, defaults to 1):
The number of independently computed returned sequences for each element in the batch.
num_return_sequences (`int`, *optional*, defaults to `model.config.num_return_sequences` or 1 if the config does not set any value):
The number of independently computed returned sequences for each element in the batch.
attention_mask (`tf.Tensor` of `dtype=tf.int32` and shape `(batch_size, sequence_length)`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values are in `[0, 1]`, 1 for tokens
that are not masked, and 0 for masked tokens.
Expand All @@ -479,21 +478,23 @@ def generate(
use_cache (`bool`, *optional*, defaults to `True`):
Whether or not the model should use the past last key/values attentions (if applicable to the model) to
speed up decoding.
output_attentions (`bool`, *optional*, defaults to `False`):
Whether or not to return the attention tensors of all attention layers. See `attentions` under
returned tensors for more details.
output_hidden_states (`bool`, *optional*, defaults to `False`):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
output_attentions (`bool`, *optional*, defaults to `model.config.output_attentions` or `False` if the config does not set any value):
Whether or not to return the attention tensors of all attention
layers. See `attentions` under returned tensors for more details.
output_hidden_states (`bool`, *optional*, defaults to `model.config.output_hidden_states` or `False` if the config does not set any value):
Whether or not to return the hidden states of all layers. See
`hidden_states` under returned tensors for more details.
output_scores (`bool`, *optional*, defaults to `model.config.output_scores` or `False` if the config does not set any value):
Whether or not to return the prediction scores. See `scores` under returned tensors
for more details.
output_scores (`bool`, *optional*, defaults to `False`):
Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
return_dict_in_generate (`bool`, *optional*, defaults to `False`):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
forced_bos_token_id (`int`, *optional*):
return_dict_in_generate (`bool`, *optional*, defaults to `model.config.return_dict_in_generate` or `False` if the config does not set any value):
Whether or not to return a [`~utils.ModelOutput`] instead of a
plain tuple.
forced_bos_token_id (`int`, *optional*, defaults to `model.config.forced_bos_token_id`):
The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful
for multilingual models like [mBART](../model_doc/mbart) where the first generated token needs to be
the target language token.
forced_eos_token_id (`int`, *optional*):
forced_eos_token_id (`int`, *optional*, defaults to `model.config.forced_eos_token_id`):
The id of the token to force as the last generated token when `max_length` is reached.
model_specific_kwargs:
Additional model-specific kwargs will be forwarded to the `forward` function of the model.
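As another aside (not part of the PR): a hedged sketch of the TF `generate()` call with beam search and `forced_bos_token_id`, the mBART use case the docstring above describes. The checkpoint and language codes are assumptions; depending on the checkpoint you may need `from_pt=True` to load TF weights.

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

ckpt = "facebook/mbart-large-50-many-to-many-mmt"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = TFAutoModelForSeq2SeqLM.from_pretrained(ckpt)

tokenizer.src_lang = "en_XX"
inputs = tokenizer("The weather is nice today.", return_tensors="tf")

# Beam search with the first generated token forced to the target-language
# code, as the forced_bos_token_id docstring describes for mBART.
generated = model.generate(
    inputs.input_ids,
    num_beams=4,
    max_length=40,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"],
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```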
@@ -791,7 +792,6 @@ def generate(
attention_mask = tf.gather(attention_mask, expanded_batch_idxs, axis=0)

if self.config.is_encoder_decoder:

# create empty decoder_input_ids
input_ids = (
tf.ones(
@@ -1071,7 +1071,6 @@ def _generate_beam_search(

# for each sentence
for batch_idx in range(batch_size):

# if we are done with this sentence
if done[batch_idx]:
assert (
@@ -1336,7 +1335,6 @@ def _generate(
post](https://huggingface.co/blog/how-to-generate).

Parameters:

input_ids (`tf.Tensor` of `dtype=tf.int32` and shape `(batch_size, sequence_length)`, *optional*):
The sequence used as a prompt for the generation. If `None` the method initializes it with
`bos_token_id` and a batch size of 1.
@@ -1749,7 +1747,6 @@ def _prepare_decoder_input_ids_for_generation(
bos_token_id: int = None,
model_kwargs: Optional[Dict[str, tf.Tensor]] = None,
) -> tf.Tensor:

# prepare `input_ids` for decoder if model is encoder-decoder
if model_kwargs is not None and "decoder_input_ids" in model_kwargs:
return model_kwargs.pop("decoder_input_ids")
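A condensed, hedged sketch of the logic visible in this hunk (the rest of the function body is truncated in the diff view); the standalone helper name here is illustrative, not the library's API:

```python
import tensorflow as tf

def prepare_decoder_input_ids(batch_size, decoder_start_token_id=None,
                              bos_token_id=None, model_kwargs=None):
    # User-supplied decoder_input_ids take precedence, as in the hunk above.
    if model_kwargs is not None and "decoder_input_ids" in model_kwargs:
        return model_kwargs.pop("decoder_input_ids")
    # Otherwise every sequence starts with the decoder start token, falling
    # back to bos_token_id when no dedicated start token is configured.
    start = decoder_start_token_id if decoder_start_token_id is not None else bos_token_id
    return tf.ones((batch_size, 1), dtype=tf.int32) * start
```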
@@ -2069,7 +2066,6 @@ def greedy_search(
Generates sequences for models with a language modeling head using greedy decoding.

Parameters:

input_ids (`tf.Tensor` of shape `(batch_size, sequence_length)`):
The sequence used as a prompt for the generation.
logits_processor (`TFLogitsProcessorList`, *optional*):
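A hedged usage sketch for `greedy_search()` (an aside, not from the diff); the checkpoint and processor settings are illustrative, and in practice most users would call `generate()` instead:

```python
from transformers import (
    AutoTokenizer,
    TFAutoModelForCausalLM,
    TFLogitsProcessorList,
    TFMinLengthLogitsProcessor,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Today is", return_tensors="tf").input_ids

# Keep EOS suppressed until at least 10 tokens have been generated.
logits_processor = TFLogitsProcessorList(
    [TFMinLengthLogitsProcessor(min_length=10, eos_token_id=model.config.eos_token_id)]
)

outputs = model.greedy_search(
    input_ids, max_length=20, logits_processor=logits_processor
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```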
@@ -2322,7 +2318,6 @@ def sample(
Generates sequences for models with a language modeling head using multinomial sampling.

Parameters:

input_ids (`tf.Tensor` of shape `(batch_size, sequence_length)`):
The sequence used as a prompt for the generation.
logits_processor (`TFLogitsProcessorList`, *optional*):
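Similarly, a hedged sketch of calling `sample()` directly with logits warpers (illustrative settings; `generate(do_sample=True, ...)` is the usual entry point):

```python
from transformers import (
    AutoTokenizer,
    TFAutoModelForCausalLM,
    TFLogitsProcessorList,
    TFTemperatureLogitsWarper,
    TFTopKLogitsWarper,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Today is", return_tensors="tf").input_ids

# Warpers reshape the next-token distribution before sampling.
logits_warper = TFLogitsProcessorList(
    [TFTemperatureLogitsWarper(0.7), TFTopKLogitsWarper(top_k=50)]
)

outputs = model.sample(input_ids, logits_warper=logits_warper, max_length=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```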
@@ -2599,7 +2594,6 @@ def beam_search(
Generates sequences for models with a language modeling head using beam search with multinomial sampling.

Parameters:

input_ids (`tf.Tensor` of shape `(batch_size, sequence_length)`):
The sequence used as a prompt for the generation.
max_length (`int`, *optional*, defaults to 20):
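Finally, a worked aside on the length-penalty normalization used in beam search (`score = sum_logprobs / length**length_penalty`, matching the corrected `length_penalty` docstring above); the numbers are illustrative:

```python
# A 12-token hypothesis with total log-probability -6.0.
sum_logprobs = -6.0
length = 12

for length_penalty in (0.0, 1.0, -1.0):
    score = sum_logprobs / (length ** length_penalty)
    print(f"length_penalty={length_penalty:+.1f} -> score={score:.3f}")

# length_penalty=+0.0 -> -6.000   (no normalization)
# length_penalty=+1.0 -> -0.500   (long hypotheses hurt less: promotes longer output)
# length_penalty=-1.0 -> -72.000  (long hypotheses hurt more: promotes shorter output)
```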