@@ -55,7 +55,7 @@ When you load a model explicitly, you can inspect the generation configuration t
 >>> from transformers import AutoModelForCausalLM
 
 >>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
->>> model.generation_config
+>>> model.generation_config  # doctest: +IGNORE_RESULT
 GenerationConfig {
   "_from_model_config": true,
   "bos_token_id": 50256,
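
As a side note, the attributes of the default `GenerationConfig` can also be tweaked in place. A minimal sketch, assuming the same `distilgpt2` checkpoint as above:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# GenerationConfig attributes can be read and written directly; updated values
# become the defaults for subsequent `generate` calls on this model.
model.generation_config.max_new_tokens = 50
print(model.generation_config.max_new_tokens)  # 50
```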
@@ -77,7 +77,7 @@ producing highly repetitive results.
 You can override any `generation_config` by passing the parameters and their values directly to the [`generate`] method:
 
 ```python
->>> my_model.generate(**inputs, num_beams=4, do_sample=True)
+>>> my_model.generate(**inputs, num_beams=4, do_sample=True)  # doctest: +SKIP
 ```
 
 Even if the default decoding strategy mostly works for your task, you can still tweak a few things. Some of the
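
Because `my_model` and `inputs` are defined elsewhere in the guide, here is a self-contained sketch of the same override; the `distilgpt2` checkpoint and the prompt are illustrative assumptions, not part of the diff:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative setup (assumption): any causal LM checkpoint works here.
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
inputs = tokenizer("The quick brown fox", return_tensors="pt")

# Keyword arguments passed to `generate` override the corresponding values
# in the model's generation_config for this call only.
outputs = model.generate(**inputs, num_beams=4, do_sample=True, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```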
@@ -107,11 +107,11 @@ If you would like to share your fine-tuned model with a specific generation conf
 ```python
 >>> from transformers import AutoModelForCausalLM, GenerationConfig
 
->>> model = AutoModelForCausalLM.from_pretrained("my_account/my_model")
+>>> model = AutoModelForCausalLM.from_pretrained("my_account/my_model")  # doctest: +SKIP
 >>> generation_config = GenerationConfig(
 ...     max_new_tokens=50, do_sample=True, top_k=50, eos_token_id=model.config.eos_token_id
 ... )
->>> generation_config.save_pretrained("my_account/my_model", push_to_hub=True)
+>>> generation_config.save_pretrained("my_account/my_model", push_to_hub=True)  # doctest: +SKIP
 ```
 
 You can also store several generation configurations in a single directory, making use of the `config_file_name`
@@ -133,14 +133,15 @@ one for summarization with beam search). You must have the right Hub permissions
 ...     pad_token=model.config.pad_token_id,
 ... )
 
->>> translation_generation_config.save_pretrained("t5-small", "translation_generation_config.json", push_to_hub=True)
+>>> # Tip: add `push_to_hub=True` to push to the Hub
+>>> translation_generation_config.save_pretrained("/tmp", "translation_generation_config.json")
 
 >>> # You could then use the named generation config file to parameterize generation
->>> generation_config = GenerationConfig.from_pretrained("t5-small", "translation_generation_config.json")
+>>> generation_config = GenerationConfig.from_pretrained("/tmp", "translation_generation_config.json")
 >>> inputs = tokenizer("translate English to French: Configuration files are easy to use!", return_tensors="pt")
 >>> outputs = model.generate(**inputs, generation_config=generation_config)
 >>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
-['Les fichiers de configuration sont faciles à utiliser !']
+['Les fichiers de configuration sont faciles à utiliser!']
 ```
 
 ## Streaming
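
The streaming hunks themselves fall outside this diff; for orientation, a minimal sketch of token streaming with [`TextStreamer`], where the `gpt2` checkpoint and the prompt are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer(["An increasing sequence: one,"], return_tensors="pt")

# TextStreamer prints each decoded token to stdout as soon as it is
# generated, instead of waiting for the full sequence to finish.
streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=20)
```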
@@ -217,10 +218,9 @@ The two main parameters that enable and control the behavior of contrastive sear
 
 >>> outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=100)
 >>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
-['Hugging Face Company is a family owned and operated business. \
-We pride ourselves on being the best in the business and our customer service is second to none.\
-\n\nIf you have any questions about our products or services, feel free to contact us at any time.\
-We look forward to hearing from you!']
+['Hugging Face Company is a family owned and operated business. We pride ourselves on being the best
+in the business and our customer service is second to none.\n\nIf you have any questions about our
+products or services, feel free to contact us at any time. We look forward to hearing from you!']
 ```
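
The setup for this call sits above the hunk; a hedged reconstruction, where the `gpt2-large` checkpoint is an assumption and the prompt is inferred from the output above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2-large"  # assumption: any causal LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
inputs = tokenizer("Hugging Face Company is", return_tensors="pt")

# penalty_alpha > 0 combined with top_k > 1 activates contrastive search.
outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=100)
```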
 
 ### Multinomial sampling
@@ -233,7 +233,8 @@ risk of repetition.
 To enable multinomial sampling, set `do_sample=True` and `num_beams=1`.
 
 ```python
->>> from transformers import AutoTokenizer, AutoModelForCausalLM
+>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
+>>> set_seed(0)  # For reproducibility
 
 >>> checkpoint = "gpt2-large"
 >>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
@@ -244,11 +245,8 @@ To enable multinomial sampling, set `do_sample=True` and `num_beams=1`.
 
 >>> outputs = model.generate(**inputs, do_sample=True, num_beams=1, max_new_tokens=100)
 >>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
-['Today was an amazing day because we are now in the final stages of our trip to New York City which was very tough. \
-It is a difficult schedule and a challenging part of the year but still worth it. I have been taking things easier and \
-I feel stronger and more motivated to be out there on their tour. Hopefully, that experience is going to help them with \
-their upcoming events which are currently scheduled in Australia.\n\nWe love that they are here. They want to make a \
-name for themselves and become famous for what they']
+['Today was an amazing day because when you go to the World Cup and you don\'t, or when you don\'t get invited,
+that\'s a terrible feeling."']
 ```
 
 ### Beam-search decoding
@@ -272,7 +270,7 @@ To enable this decoding strategy, specify the `num_beams` (aka number of hypothe
 
 >>> outputs = model.generate(**inputs, num_beams=5, max_new_tokens=50)
 >>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
-['It is astonishing how one can have such a profound impact on the lives of so many people in such a short period of \
+['It is astonishing how one can have such a profound impact on the lives of so many people in such a short period of
 time."\n\nHe added: "I am very proud of the work I have been able to do in the last few years.\n\n"I have']
 ```
 
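As with contrastive search, the model setup sits above this hunk; a hedged reconstruction, where the checkpoint is an assumption and the prompt is inferred from the output:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2-medium"  # assumption: any causal LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
inputs = tokenizer("It is astonishing how one can", return_tensors="pt")

# num_beams > 1 with do_sample left at its default (False) selects beam search.
outputs = model.generate(**inputs, num_beams=5, max_new_tokens=50)
```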
@@ -282,7 +280,8 @@ As the name implies, this decoding strategy combines beam search with multinomia
 the `num_beams` greater than 1, and set `do_sample=True` to use this decoding strategy.
 
 ```python
->>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, set_seed
+>>> set_seed(0)  # For reproducibility
 
 >>> prompt = "translate English to German: The house is wonderful."
 >>> checkpoint = "t5-small"
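
The rest of the example is truncated by the diff; presumably it continues roughly as follows, reusing `prompt` and `checkpoint` from above (a sketch, not the verbatim continuation):

```python
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")

# num_beams > 1 together with do_sample=True selects beam-search multinomial sampling.
outputs = model.generate(**inputs, num_beams=5, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```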
@@ -309,20 +308,22 @@ The diversity penalty ensures the outputs are distinct across groups, and beam s
 >>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
 >>> checkpoint = "google/pegasus-xsum"
->>> prompt = "The Permaculture Design Principles are a set of universal design principles \
->>> that can be applied to any location, climate and culture, and they allow us to design \
->>> the most efficient and sustainable human habitation and food production systems. \
->>> Permaculture is a design system that encompasses a wide variety of disciplines, such \
->>> as ecology, landscape design, environmental science and energy conservation, and the \
->>> Permaculture design principles are drawn from these various disciplines. Each individual \
->>> design principle itself embodies a complete conceptual framework based on sound \
->>> scientific principles. When we bring all these separate principles together, we can \
->>> create a design system that both looks at whole systems, the parts that these systems \
->>> consist of, and how those parts interact with each other to create a complex, dynamic, \
->>> living system. Each design principle serves as a tool that allows us to integrate all \
->>> the separate parts of a design, referred to as elements, into a functional, synergistic, \
->>> whole system, where the elements harmoniously interact and work together in the most \
->>> efficient way possible."
+>>> prompt = (
+...     "The Permaculture Design Principles are a set of universal design principles "
+...     "that can be applied to any location, climate and culture, and they allow us to design "
+...     "the most efficient and sustainable human habitation and food production systems. "
+...     "Permaculture is a design system that encompasses a wide variety of disciplines, such "
+...     "as ecology, landscape design, environmental science and energy conservation, and the "
+...     "Permaculture design principles are drawn from these various disciplines. Each individual "
+...     "design principle itself embodies a complete conceptual framework based on sound "
+...     "scientific principles. When we bring all these separate principles together, we can "
+...     "create a design system that both looks at whole systems, the parts that these systems "
+...     "consist of, and how those parts interact with each other to create a complex, dynamic, "
+...     "living system. Each design principle serves as a tool that allows us to integrate all "
+...     "the separate parts of a design, referred to as elements, into a functional, synergistic, "
+...     "whole system, where the elements harmoniously interact and work together in the most "
+...     "efficient way possible."
+... )
 
 >>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
 >>> inputs = tokenizer(prompt, return_tensors="pt")
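
The `generate` call itself falls below this hunk; a hedged sketch of how diverse beam search is typically invoked, reusing the names above (the specific parameter values are assumptions):

```python
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# num_beams is divided into num_beam_groups, and diversity_penalty pushes
# the groups toward distinct continuations.
outputs = model.generate(
    **inputs, num_beams=5, num_beam_groups=5, diversity_penalty=1.0, max_new_tokens=30
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```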
@@ -369,7 +370,8 @@ When using assisted decoding with sampling methods, you can use the `temperature`
 just like in multinomial sampling. However, in assisted decoding, reducing the temperature will help improve latency.
 
 ```python
->>> from transformers import AutoModelForCausalLM, AutoTokenizer
+>>> from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
+>>> set_seed(42)  # For reproducibility
 
 >>> prompt = "Alice and Bob"
 >>> checkpoint = "EleutherAI/pythia-1.4b-deduped"
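
The middle of this example is elided by the diff; presumably it loads the tokenizer, inputs, main model, and a smaller assistant checkpoint before the lines below. A sketch of that elided setup; the assistant checkpoint named here is an assumption:

```python
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")

model = AutoModelForCausalLM.from_pretrained(checkpoint)
# Assumption: the assistant is a smaller model sharing the main model's
# tokenizer, e.g. a 160M-parameter sibling of the Pythia checkpoint above.
assistant_checkpoint = "EleutherAI/pythia-160m-deduped"
```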
@@ -382,5 +384,5 @@ just like in multinomial sampling. However, in assisted decoding, reducing the t
 >>> assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint)
 >>> outputs = model.generate(**inputs, assistant_model=assistant_model, do_sample=True, temperature=0.5)
 >>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
-["Alice and Bob are sitting on the sofa. Alice says, 'I'm going to my room "]
+['Alice and Bob are going to the same party. It is a small party, in a small ']
 ```