diff --git a/zh/_blog.yml b/zh/_blog.yml
index fff98b4fa5..443a06dfbb 100644
--- a/zh/_blog.yml
+++ b/zh/_blog.yml
@@ -470,4 +470,27 @@
tags:
- nlp
- community
- - research
+ - research
+
+- local: text-to-video
+ title: "深入理解文生视频模型"
+ author: adirik
+ thumbnail: /blog/assets/140_text-to-video/thumbnail.png
+ date: May 8, 2023
+ tags:
+ - multi-modal
+ - cv
+ - guide
+ - diffusion
+ - text-to-image
+ - text-to-video
+
+- local: introducing-csearch
+ title: "在 Transformers 中使用对比搜索生成可媲美人类水平的文本🤗"
+ author: yxuansu
+ thumbnail: /blog/assets/115_introducing_contrastive_search/thumbnail.png
+ date: Nov 8, 2022
+ tags:
+ - nlp
+ - text generation
+ - research
\ No newline at end of file
diff --git a/zh/introducing-csearch.md b/zh/introducing-csearch.md
new file mode 100644
index 0000000000..0b6ea28e02
--- /dev/null
+++ b/zh/introducing-csearch.md
@@ -0,0 +1,571 @@
+---
+title: "在 Transformers 中使用对比搜索生成可媲美人类水平的文本🤗"
+thumbnail: /blog/assets/115_introducing_contrastive_search/thumbnail.png
+authors:
+- user: GMFTBY
+translators:
+- user: MatrixYao
+---
+
+
+在 Transformers 中使用对比搜索生成可媲美人类水平的文本🤗
+
+
+
+
+****
+
+
+
+
+
+### 1. 引言
+
+自然语言生成(即文本生成)是自然语言处理(NLP)的核心任务之一。本文将介绍神经网络文本生成领域当前最先进的解码方法**对比搜索(Contrastive Search)**。提出该方法的论文 *"A Contrastive Framework for Neural Text Generation"* 最初发表于 NeurIPS 2022([[论文]](https://arxiv.org/abs/2202.06417)、[[官方实现]](https://github.com/yxuansu/SimCTG))。此后,*"Contrastive Search Is What You Need For Neural Text Generation"* 的作者又进一步证明了对比搜索可以用**现有的**语言模型在 **16** 种语言上生成可媲美人类水平的文本([[论文]](https://arxiv.org/abs/2210.14140)、[[官方实现]](https://github.com/yxuansu/Contrastive_Search_Is_What_You_Need))。
+
+**[备注]** 对于不熟悉文本生成的用户,请参阅[此博文](https://huggingface.co/blog/how-to-generate)了解更多详情。
+
+****
+
+
+
+### 2. Hugging Face 🤗 对比搜索演示
+
+目前,🤗 `transformers` 的 PyTorch 和 TensorFlow 后端均支持对比搜索。你可以在[该 Colab notebook](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/115_introducing_contrastive_search.ipynb) 中根据不同的后端选择相应的部分来探索该方法,文章顶部也有该 notebook 链接。我们还构建了这个不错的[演示应用](https://huggingface.co/spaces/joaogante/contrastive_search_generation),用它可以直观地比较对比搜索与其他流行的解码方法(例如波束搜索、top-k 采样[3]以及核采样[4])。
+
+****
+
+
+
+### 3. 环境安装
+
+在进行后续实验前,我们要先安装最新的 `transformers` 库,如下:
+
+```shell
+pip install torch
+pip install "transformers==4.24.0"
+```
+
+****
+
+
+
+### 4. 现有解码方法存在的问题
+
+解码方法可以分为两类:(i)确定性方法,(ii)随机方法。下面我们分别对两者进行讨论!
+
+
+
+#### 4.1. 确定性方法
+
+确定性方法,如贪心搜索和波束搜索,通过在语言模型输出的所有候选补全词中选择概率最高的词来生成最终文本。然而,正如之前研究 [3][4] 指出的,确定性方法通常会导致*模型退化*,即生成的文本不自然且包含不必要的重复。
+
+下面,我们看一个用 GPT-2 模型和贪心搜索生成文本的例子。
+
+```python
+from transformers import AutoTokenizer, GPT2LMHeadModel
+
+tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
+input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids
+model = GPT2LMHeadModel.from_pretrained('gpt2-large')
+
+output = model.generate(input_ids, max_length=128)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+模型输出:
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+DeepMind Company is a leading AI research company, with a focus on deep learning and deep learning-based systems.
+
+The company's research is focused on the development of deep learning-based systems that can learn from large amounts of data, and that can be used to solve real-world problems.
+
+DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.
+
+DeepMind's research is also used by the UK government to develop new technologies for the UK's National Health Service.
+
+DeepMind's research is also used by the UK government to develop new technologies
+----------------------------------------------------------------------------------------------------
+```
+
+
+**[备注]** 我们可以看到,贪心搜索生成的结果中有明显的重复。
+
+
+
+#### 4.2. 随机方法
+
+为了解决确定性方法带来的问题,随机方法通过在解码过程中引入随机性来生成文本。常用的两种随机方法是 (i) top-k 采样[3] 和 (ii) 核采样(也称为 top-p 采样)[4]。
+
+下面,我们给出用 GPT-2 模型和核采样 (p=0.95) 生成文本的示例。
+
+```python
+import torch
+from transformers import AutoTokenizer, GPT2LMHeadModel
+
+tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
+input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids
+model = GPT2LMHeadModel.from_pretrained('gpt2-large')
+
+torch.manual_seed(0.)
+output = model.generate(input_ids, do_sample=True, max_length=128, top_p=0.95, top_k=0)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+模型输出:
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+DeepMind Company is a leading provider of AI-based research, development, and delivery of AI solutions for security, infrastructure, machine learning, communications, and so on."
+
+'AI is not journalism'
+
+Worse still was the message its researchers hoped would reach the world's media — that it was not really research, but rather a get-rich-quick scheme to profit from living forces' ignorance.
+
+"The thing is, we know that people don't consciously assess the value of the others'
+information. They understand they will get the same on their own."
+
+One example? Given the details of today
+----------------------------------------------------------------------------------------------------
+```
+
+
+**[备注]** 虽然核采样可以生成没有重复的文本,但生成文本的语义一致性并不是很好。例如,生成的短语 *'AI is not journalism'* 与给定的上文即 *'DeepMind Company'* 不一致。
+
+我们注意到,这种语义不一致的问题可以通过降低温度(temperature)来部分解决。然而,降低温度会使核采样更接近贪心搜索,这其实就变成了贪心搜索和核采样之间的权衡。一般来讲,要找到一个既与提示和模型无关、又能同时避开贪心搜索和核采样两者陷阱的温度,相当有挑战。
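+
+作为补充,下面给出一个简短示例(非原文代码),演示如何在 `model.generate` 中通过 `temperature` 参数降低采样随机性。温度低于 1 会使输出分布更尖锐,核采样的行为也就更接近贪心搜索:
+
+```python
+import torch
+from transformers import AutoTokenizer, GPT2LMHeadModel
+
+tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
+model = GPT2LMHeadModel.from_pretrained('gpt2-large')
+input_ids = tokenizer('DeepMind Company is', return_tensors='pt').input_ids
+
+torch.manual_seed(0)
+# 温度低于 1 可减少语义不一致,但也更容易出现重复
+output = model.generate(input_ids, do_sample=True, max_length=128,
+                        top_p=0.95, top_k=0, temperature=0.7)
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```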
+
+****
+
+
+
+### 5. 对比搜索
+
+本节我们来详细介绍一种新的解码方法,***对比搜索***。
+
+
+
+#### 5.1. 解码目标
+
+给定前缀文本 $x_{< t}$,我们按如下公式选择输出词元 $x_{t}$:
+
+$$x_{t} = \underset{v \in V^{(k)}}{\arg\max} \Big\{ (1-\alpha) \times p_{\theta}(v \mid x_{< t}) - \alpha \times \max_{1 \le j \le t-1} s\big(h_{v}, h_{x_{j}}\big) \Big\}$$
+
+上式中, $V^{(k)}$ 是语言模型输出概率分布 $p_{\theta}(v|x_{< t})$ 中 k 个概率最大的候选词元的集合。第一项,即 *模型置信度(model confidence)*,是语言模型预测的每个候选词元 $v$ 的概率。第二项,*退化惩罚(degeneration penalty)*,用于度量 $v$ 与上文 $x_{< t}$ 中每个词元的相异度,其中函数 $s(\cdot, \cdot)$ 用于计算每两个词元间的余弦相似度。更具体地说,退化惩罚被定义为 $v$ 的向量表征 $h_{v}$ 与其上文 $x_{< t}$ 中每个词元的向量表征间余弦相似度的最大值。这里,候选词元的向量表征 $h_{v}$ 是将 $x_{< t}$ 与 $v$ 连接起来输入给语言模型后,由语言模型计算得到的。直观上,$v$ 的退化惩罚越大,意味着它(在表示空间中)与上文越相似,因此越有可能导致模型退化问题。超参数 $\alpha$ 用于在这两项间折衷。当 $\alpha=0$ 时,对比搜索退化为纯贪心搜索。
+
+**[备注]** 在生成输出时,对比搜索同时考虑(i)语言模型预测的概率,以保持生成文本和前缀文本之间的语义连贯性; (ii) 与上文的相似性以避免模型退化。
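+
+为帮助理解上式,下面给出一个极简的打分函数示意(假设性代码,并非 `transformers` 中的官方实现):给定 top-k 候选词元的模型置信度、每个候选词元的表征 $h_{v}$ 以及上文词元的表征,按上式选出得分最高的候选。
+
+```python
+import torch
+import torch.nn.functional as F
+
+def contrastive_rank(top_probs, candidate_hidden, context_hidden, alpha=0.6):
+    """按对比搜索的目标函数对 top-k 候选词元打分(示意实现)。
+
+    top_probs:        [k]        每个候选词元的模型置信度 p(v | x_<t)
+    candidate_hidden: [k, dim]   每个候选词元 v 的表征 h_v(将前缀与 v 连接后送入模型得到)
+    context_hidden:   [t-1, dim] 上文各词元的表征
+    """
+    cand = F.normalize(candidate_hidden, dim=-1)
+    ctx = F.normalize(context_hidden, dim=-1)
+    # 退化惩罚:候选词元与上文各词元余弦相似度的最大值
+    degeneration_penalty = (cand @ ctx.T).max(dim=-1).values
+    scores = (1 - alpha) * top_probs - alpha * degeneration_penalty
+    return torch.argmax(scores)  # 返回被选中候选在 top-k 中的下标
+```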
+
+
+
+#### 5.2. 使用对比搜索生成文本
+
+下面,我们使用与第 4.1 节和第 4.2 节中相同的前缀文本(即 *“DeepMind Company is”*),并使用对比搜索生成文本(取 $k=4$、$\alpha=0.6$)。为了充分展示对比搜索的卓越能力,我们让语言模型生成一篇 **512** 词元的**长**文档,如下:
+
+```python
+from transformers import GPT2Tokenizer, GPT2LMHeadModel
+
+model_name = 'gpt2-large'
+tokenizer = GPT2Tokenizer.from_pretrained(model_name)
+model = GPT2LMHeadModel.from_pretrained(model_name, pad_token_id=tokenizer.eos_token_id)
+model.eval()
+
+# prepare the prefix
+prefix_text = r'DeepMind Company is'
+input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids
+
+# generate the result with contrastive search
+output = model.generate(input_ids, penalty_alpha=0.6, top_k=4, max_length=512)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+参数设置如下:
+* `--top_k`:对比搜索中的超参 $k$。
+* `--penalty_alpha`:对比搜索中的超参 $\alpha$。
+
+
+模型输出:
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+DeepMind Company is a leader in artificial intelligence (AI). We have a long history of working with companies such as Google, Facebook, Amazon, and Microsoft to build products that improve people's lives, and today we are excited to announce that DeepMind's AlphaGo program has won the game of Go, becoming the first program to defeat a professional Go player.
+
+The victory is a testament to the power of deep learning, and to the incredible work of our research team, which has been at the forefront of AI research for the past five years. AlphaGo is one of the most advanced Go programs ever created, and its performance is an important step towards the goal of human-level AI.
+
+"This is the culmination of a decade of hard work," said Andy Ng, co-founder and CTO of DeepMind. "We are thrilled to have achieved this milestone and look forward to continuing to develop AI that can be used in a wide range of applications and to help people live better lives."
+
+DeepMind's work on Go began in 2010, when it began to train a neural network to play Go using millions of games played by top Go players around the world. Since then, the team has refined the algorithm, adding more and more layers of reinforcement learning to make it better at recognizing patterns and making decisions based on those patterns. In the past year and a half, the team has made significant progress in the game, winning a record-tying 13 games in a row to move into the top four of the world rankings.
+
+"The game of Go is a complex game in which players have to be very careful not to overextend their territory, and this is something that we have been able to improve over and over again," said Dr. Demis Hassabis, co-founder and Chief Scientific Officer of DeepMind. "We are very proud of our team's work, and we hope that it will inspire others to take the next step in their research and apply the same techniques to other problems."
+
+In addition to the win in Go, DeepMind has also developed an AI system that can learn to play a number of different games, including poker, Go, and chess. This AI system, called Tarsier, was developed in partnership with Carnegie Mellon University and the University of California, Berkeley, and is being used to teach computer vision and machine learning to identify objects in images and recognize speech in natural language. Tarsier has been trained to play the game of Go and other games on a
+----------------------------------------------------------------------------------------------------
+```
+
+
+**[备注]** 我们看到生成的文本质量非常高。整个文档语法流畅,语义连贯。同时,生成的文本也很好地保持了事实的正确性。例如,在第一段中,它正确阐述了 *“AlphaGo”* 作为 *“第一个击败职业围棋选手的程序”* 这一事实。
+
+
+
+
+#### 5.3. 对比搜索的结果可视化
+
+为了更好地理解对比搜索的工作原理,我们对贪心搜索(第 4.1 节)和对比搜索进行了直观比较。具体来说,我们分别将贪心搜索和对比搜索生成文本的词元相似度矩阵可视化。两个词元之间的相似度定义为它们的向量表征(即最后一个 transformer 层的隐藏状态)之间的余弦相似度。贪心搜索(上)和对比搜索(下)的结果如下图所示。
+
+
+
+
+
+
+**[备注]** 从贪心搜索的结果中,我们看到非对角线的相似度很高,这清楚地表明贪心搜索产生了重复。相反,在对比搜索的结果中,高相似度分数主要出现在对角线上,这证明我们成功解决了退化问题。对比搜索的这一优良特性是通过在解码过程中引入退化惩罚(参见第 5.1 节)来实现的。
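+
+如果你想自己复现类似的可视化,下面给出一个最小示意(非原文代码):取一段生成文本,用 GPT-2 最后一层的隐藏状态计算词元两两之间的余弦相似度矩阵,随后即可用 `matplotlib` 等工具绘制热力图。
+
+```python
+import torch
+import torch.nn.functional as F
+from transformers import AutoTokenizer, GPT2LMHeadModel
+
+tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
+model = GPT2LMHeadModel.from_pretrained('gpt2-large')
+
+text = "DeepMind Company is a leading AI research company."  # 可替换为任一解码方法的生成结果
+input_ids = tokenizer(text, return_tensors='pt').input_ids
+with torch.no_grad():
+    # 取最后一层 transformer 的隐藏状态作为词元表征,形状为 [seq_len, dim]
+    hidden = model(input_ids, output_hidden_states=True).hidden_states[-1][0]
+
+# 词元两两之间的余弦相似度矩阵,形状为 [seq_len, seq_len]
+hidden = F.normalize(hidden, dim=-1)
+sim_matrix = hidden @ hidden.T
+print(sim_matrix.shape)
+```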
+
+****
+
+
+
+### 6. 更多的生成示例
+
+在本节中,我们提供了更多的生成示例来比较不同的解码方法。
+
+
+
+#### 6.1. 示例一:GPT-2
+
+在这部分中,我们使用 GPT-2 生成文本,其前缀文本来自 [OpenAI 发布 GPT-2 的博客文章](https://openai.com/blog/better-language-models/)。
+
+> _In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English._
+
+
+
+ 加载语言模型并准备前缀文本:
+
+```python
+import torch
+from transformers import AutoTokenizer, GPT2LMHeadModel
+
+tokenizer = AutoTokenizer.from_pretrained('gpt2-large')
+model = GPT2LMHeadModel.from_pretrained('gpt2-large')
+
+prefix_text = r"In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."
+input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids
+```
+
+
+
+
+##### 6.1.1. 使用贪心搜索生成文本
+
+
+代码:[点击展开]
+
+```python
+output = model.generate(input_ids, max_length=512)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+
+模型输出:[点击展开]
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
+
+The researchers, led by Dr. David R. Williams of the University of California, Santa Cruz, discovered the unicorns in the Andes Mountains of Peru. The area is known for its unique geology and is home to a number of rare species of animals.
+
+The researchers found the unicorns in the Andes Mountains of Peru.
+
+"We were surprised to find that the unicorns were able to communicate with each other," Williams said. "We were also surprised to find that they were able to communicate in English."
+
+The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
+
+"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
+
+The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
+
+"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
+
+The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
+
+"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
+
+The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago.
+
+"The Incas were the first people to use the Andes Mountains as a refuge from the harsh climate of the Andes," Williams said. "They were also the first people to use the Andes Mountains as a place to hunt and gather food."
+
+The researchers believe that the unicorns are descendants of the ancient Incas, who lived in the area around 2,000 years ago
+----------------------------------------------------------------------------------------------------
+```
+
+
+
+
+##### 6.1.2. 使用核采样生成文本
+
+
+代码:[点击展开]
+
+```python
+torch.manual_seed(0.)
+output = model.generate(input_ids, do_sample=True, max_length=512, top_p=0.95, top_k=0)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+
+
+模型输出:[点击展开]
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. The study was published in the Journal of Zoology in March 2016.
+
+Polygynous mammals such as unicorns have remained largely unknown to science. Professor Gustavo Giacota, from the University of Oxford who led the study, said that they had been documented as far as Eastern Siberia in Russia, but had only been seen a handful of times in the Gobi Desert.
+
+Tiny animals with pale and shiny coats live in the presence of human beings and are hardly likely to be victims of any cruelty. However, there is some evidence of the condition occurring in both humans and animals in remote regions, which might have similarities to "black moles" that coexist on the skin.
+
+It is thought that Unicorns could be inside themselves, that they have different scents depending on their current environment, or just fall out and there are plenty of legends of how they have survived. Experts speculate that the moths and other animals could be remnants of the Yezidi Isis and Charon, which literally is both the word which means great bird, and the Greek word for sound. It is said that the Isis and Charon taught their young the use of voice in the form of calling out to others.
+
+The scientists think that it could be ancient folklore that has survived and is no longer attributed to a real entity
+----------------------------------------------------------------------------------------------------
+```
+
+
+
+
+
+##### 6.1.3. 使用对比搜索生成文本
+
+
+代码:
+
+```python
+output = model.generate(input_ids, max_length=512, penalty_alpha=0.6, top_k=4)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+
+模型输出:
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.
+
+According to the BBC, a team of scientists led by Dr David MacKay, from the University of Bristol, spent two years searching for the unicorn herd, which they discovered during a survey of the area.
+
+"It's a very rare find," MacKay told the BBC. "There are a few in the Himalayas, but this is the first time we've been able to find one in such a remote area."
+
+The team was surprised to find a herd of unicorns living in a region that has been known to be a hotbed of poaching, with many of the animals poached for their horns, which are used in traditional Chinese medicine to treat everything from rheumatism to cancer.
+
+"We knew that the area was rich in rhino horn, but we had no idea how many there were, or what they were doing there," MacKay said. "This is an area of high poaching pressure, and we wanted to find out what was going on."
+
+In order to do so, the team used GPS collars to track the animals as they moved around the mountain and the surrounding area. The GPS data was then compared with information gathered from local villagers, who had a wealth of information about the animals' movements, including where they were eating, what they were doing at night, and how much time they spent in the mountains each day.
+
+After analyzing the data, the team determined that the herd consisted of at least three species of unicorns, including a male and two females. One of the females was the mother of the male, and the other two were her daughters. All three had the same horn color, which is believed to be a sign of purity in the animal kingdom.
+
+While the discovery is exciting, it's not the first time scientists have discovered an animal that speaks English. Last year, scientists discovered a species of porcupine that can be heard by humans, and has been dubbed "Porcupine Man" for his ability to converse with the human race.
+----------------------------------------------------------------------------------------------------
+```
+
+
+
+
+
+#### 6.2. 示例二:OPT
+
+本节中,我们使用 Meta 最近发布的 OPT 模型 [5],并从著名的 ResNet 论文 [6] 摘要中选取前两个句子来作为前缀文本。
+
+> Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
+
+
+
+ 加载语言模型并准备前缀文本:
+
+```python
+import torch
+from transformers import AutoTokenizer, OPTForCausalLM
+model_name = r'facebook/opt-1.3b'
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = OPTForCausalLM.from_pretrained(model_name)
+
+prefix_text = r"Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously."
+input_ids = tokenizer(prefix_text, return_tensors='pt').input_ids
+```
+
+
+
+
+
+
+##### 6.2.1. 使用贪心搜索生成文本
+
+
+代码:[点击展开]
+
+```python
+output = model.generate(input_ids, max_length=256)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+
+模型输出:[点击展开]
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously. We also show that the residual learning framework can be used to train deep neural networks that are significantly more difficult to train than those used previously.
+
+The paper presents a new residual learning framework for deep neural networks that is based on the concept of residuals. The residuals are the residuals of the network that are not used in the training process. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the network that are not used in the training process. The residuals are then used to train the network. The residuals are computed by taking the residuals of the network that are used in the training process and subtracting the residuals of the
+----------------------------------------------------------------------------------------------------
+```
+
+
+
+
+
+##### 6.2.2. 使用核采样生成文本
+
+
+代码:[点击展开]
+
+```python
+torch.manual_seed(0.)
+output = model.generate(input_ids, do_sample=True, max_length=256, top_p=0.95, top_k=0)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+
+
+模型输出:[点击展开]
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. The theory focuses on several aspects of learning, including the dynamics of replicative and non-replicative aspects of learning. This framework emphasizes learning by entropy. New randomized algorithms enable training networks with residual learning, so that deep networks can be deployed as reliably and as efficiently as their more conventional counterparts.
+----------------------------------------------------------------------------------------------------
+```
+
+
+
+
+##### 6.2.3. 使用对比搜索生成文本
+
+
+代码:
+
+```python
+output = model.generate(input_ids, max_length=256, penalty_alpha=0.6, top_k=6)
+print("Output:\n" + 100 * '-')
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+print("" + 100 * '-')
+```
+
+
+
+模型输出:
+
+```
+Output:
+----------------------------------------------------------------------------------------------------
+Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
+
+In this paper, we propose a model-based residual learning (MBRL) framework that is based on neural networks trained on data that is sparse in terms of dimensionality (e.g., 1, 2, 3, etc.). The network parameters are chosen such that there is a high probability of convergence, i.e., the number of iterations is large enough to minimize the variance of the residuals. This is achieved by training the network on a set of training data, in which the data is sparse in terms of dimensionality, and then discarding the nonparametric part of the data after training is complete.
+
+We show that MBRL outperforms other methods for deep reinforcement learning (RL) and deep convolutional neural networks (CNNs) by a factor of at least 2. In addition, we show that, compared to CNNs, MBRL performs better in two-dimensional (2D) and three-dimensional (3D) cases.
+----------------------------------------------------------------------------------------------------
+```
+
+
+****
+
+
+
+### 7. 更多资源
+
+有关对比搜索的更多详细信息,请查看我们的论文和代码,如下:
+* **A Contrastive Framework for Neural Text Generation**: [论文](https://arxiv.org/abs/2202.06417)、[官方实现](https://github.com/yxuansu/SimCTG)
+* **Contrastive Search Is What You Need For Neural Text Generation**: [论文](https://arxiv.org/abs/2210.14140)、[官方实现](https://github.com/yxuansu/Contrastive_Search_Is_What_You_Need)
+
+****
+
+
+
+### 8. 引用
+
+```bibtex
+@inproceedings{su2022a,
+ title={A Contrastive Framework for Neural Text Generation},
+ author={Yixuan Su and Tian Lan and Yan Wang and Dani Yogatama and Lingpeng Kong and Nigel Collier},
+ booktitle={Advances in Neural Information Processing Systems},
+ editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
+ year={2022},
+ url={https://openreview.net/forum?id=V88BafmH9Pj}
+}
+
+@article{su2022contrastiveiswhatyouneed,
+ title={Contrastive Search Is What You Need For Neural Text Generation},
+ author={Su, Yixuan and Collier, Nigel},
+ journal={arXiv preprint arXiv:2210.14140},
+ year={2022}
+}
+```
+
+****
+
+
+
+## 参考文献
+> [1] Su et al., 2022 ["A Contrastive Framework for Neural Text Generation"](https://arxiv.org/abs/2202.06417), NeurIPS 2022
+
+> [2] Su and Collier, 2022 ["Contrastive Search Is What You Need For Neural Text Generation"](https://arxiv.org/abs/2210.14140), Arxiv 2022
+
+> [3] Fan et al., 2018 ["Hierarchical Neural Story Generation"](https://arxiv.org/abs/1805.04833), ACL 2018
+
+> [4] Holtzman et al., 2020 ["The Curious Case of Neural Text Degeneration"](https://arxiv.org/abs/1904.09751), ICLR 2020
+
+> [5] Zhang et al., 2022 ["OPT: Open Pre-trained Transformer Language Models"](https://arxiv.org/abs/2205.01068), Arxiv 2022
+
+> [6] He et al., 2016 ["Deep Residual Learning for Image Recognition"](https://arxiv.org/abs/1512.03385), CVPR 2016
+
+****
+
+*- 本文由 Yixuan Su 和 Tian Lan 撰写*
+
+****
+
+
+
+
+## 致谢
+
+我们要感谢 Joao Gante([@joaogante](https://huggingface.co/joaogante))、Patrick von Platen([@patrickvonplaten](https://huggingface.co/patrickvonplaten))和 Sylvain Gugger ([@sgugger](https://github.com/sgugger)),感谢他们在我们将本文中的对比搜索集成进 `transformers` 库的过程中给予的帮助和指导。
+
+> 英文原文: https://huggingface.co/blog/introducing-csearch
+> 原文作者:Tian Lan
+> 译者: Matrix Yao (姚伟峰),英特尔深度学习工程师,工作方向为 transformer-family 模型在各模态数据上的应用及大规模模型的训练推理。
diff --git a/zh/text-to-video.md b/zh/text-to-video.md
new file mode 100644
index 0000000000..0b3a746622
--- /dev/null
+++ b/zh/text-to-video.md
@@ -0,0 +1,130 @@
+---
+title: "深入理解文生视频模型"
+thumbnail: /blog/assets/140_text-to-video/thumbnail.png
+authors:
+- user: adirik
+translators:
+- user: MatrixYao
+---
+
+文生视频:任务、挑战及现状
+
+
+
+
+
+
+ 示例视频由 ModelScope 生成。
+
+
+最近生成模型方向的进展如排山倒海,令人目不暇接,而文生视频将是这一连串进展的下一波。尽管大家很容易从字面上理解文生视频的意思,但它其实是一项相当新的计算机视觉任务,其要求是根据文本描述生成一系列时间和空间上都一致的图像。虽然看上去这项任务与文生图极其相似,但众所周知,它的难度要大得多。这些模型是如何工作的,它们与文生图模型有何不同,我们对其性能又有何期待?
+
+在本文中,我们将讨论文生视频模型的过去、现在和未来。我们将从回顾文生视频和文生图任务之间的差异开始,并讨论无条件视频生成和文生视频两个任务各自的挑战。此外,我们将介绍文生视频模型的最新发展,探索这些方法的工作原理及其性能。最后,我们将讨论我们在 Hugging Face 所做的工作,这些工作的目标就是促进这些模型的集成和使用,我们还会分享一些在 Hugging Face Hub 上以及其他一些地方的很酷的演示应用及资源。
+
+
+
+ 根据各种文本描述输入生成的视频示例,图片来自论文 Make-a-Video。
+
+
+## 文生视频与文生图
+
+最近文生图领域的进展多如牛毛,大家可能很难跟上最新的进展。因此,我们先快速回顾一下。
+
+就在两年前,第一个支持开放词汇(open-vocabulary)的高质量文生图模型出现了。第一波文生图模型,包括 VQGAN-CLIP、XMC-GAN 和 GauGAN2,都采用了 GAN 架构。紧随其后的是 OpenAI 在 2021 年初发布的广受欢迎的基于 transformer 的 DALL-E、2022 年 4 月的 DALL-E 2,以及由 Stable Diffusion 和 Imagen 开创的新一波扩散模型。Stable Diffusion 的巨大成功催生了许多产品化的扩散模型,例如 DreamStudio 和 RunwayML GEN-1;同时也催生了一批集成了扩散模型的产品,例如 Midjourney。
+
+尽管扩散模型在文生图方面的能力令人印象深刻,但同样的成功尚未延伸到文生视频:无论是基于扩散的还是非扩散的文生视频模型,其生成能力仍然非常受限。文生视频模型通常在非常短的视频片段上进行训练,这意味着它们需要使用计算量大且速度慢的滑动窗口方法来生成长视频。因此,众所周知,这类模型难以部署和扩展,并且在保证上下文一致性和视频长度方面很受限。
+
+文生视频的任务面临着多方面的独特挑战。主要有:
+
+- 计算挑战:确保帧间空间和时间一致性会产生长期依赖性,从而带来高计算成本,使得大多数研究人员无法负担训练此类模型的费用。
+- 缺乏高质量的数据集:用于文生视频的多模态数据集很少,而且通常数据集的标注很少,这使得学习复杂的运动语义很困难。
+- 视频字幕的模糊性:“如何描述视频才能让模型更容易学习”这一问题至今悬而未决。要完整描述一个视频,仅靠一个简短的文本提示是不够的,生成的视频需要以一系列提示或一个随时间推移的故事作为条件。
+
+在下一节中,我们将分别讨论文生视频领域的发展时间线以及为应对这些挑战而提出的各种方法。概括来讲,文生视频的工作主要可以分为以下 3 类:
+1. 提出新的、更高质量的数据集,使得训练更容易。
+2. 在没有`文本-视频对`的情况下训练模型的方法。
+3. 计算效率更高的生成更长和更高分辨率视频的方法。
+
+## 如何实现文生视频?
+
+让我们来看看文生视频的工作原理以及该领域的最新进展。我们将沿着与文生图类似的研究路径,探索文生视频模型的流变,并探讨迄今为止我们是如何解决文生视频领域的具体挑战的。
+
+与文生图任务一样,文生视频也是个年轻的方向,最早只能追溯到几年前。早期研究主要使用基于 GAN 和 VAE 的方法,在给定文本描述的情况下自回归地生成视频帧(参见 [Text2Filter](https://huggingface.co/papers/1710.00421) 及 [TGANs-C](https://huggingface.co/papers/1804.08264))。虽然这些工作为文生视频这一新的计算机视觉任务奠定了基础,但它们的应用范围有限,仅限于低分辨率、时长较短、且视频中目标的运动比较单一和孤立的情况。
+
+
+
+ 最初的文生视频模型在分辨率、上下文和长度方面极为有限,图像取自 TGANs-C。
+
+
+受文本 (GPT-3) 和图像 (DALL-E) 中大规模预训练 transformer 模型的成功启发,文生视频研究的第二波浪潮采用了 transformer 架构。[Phenaki](https://huggingface.co/papers/2210.02399)、[Make-A-Video](https://huggingface.co/papers/2209.14792)、[NUWA](https://huggingface.co/papers/2111.12417)、[VideoGPT](https://huggingface.co/papers/2104.10157) 和 [CogVideo](https://huggingface.co/papers/2205.15868) 都提出了基于 transformer 的框架,而 [TATS](https://huggingface.co/papers/2204.03638) 提出了一种混合方法,将用于生成图像的 VQGAN 与用于按顺序生成帧的时间敏感 transformer 模块结合起来。在第二波浪潮的诸多框架中,Phenaki 尤其有意思,因为它能够根据一系列提示(即一个故事情节)生成任意长视频。同样,[NUWA-Infinity](https://huggingface.co/papers/2207.09814) 提出了一种双重自回归(autoregressive over autoregressive)生成机制,可以基于文本输入合成无限长度的图像和视频,从而使得生成高清的长视频成为可能。但是,Phenaki 和 NUWA 的模型均无法从公开渠道获取。
+
+
+
+ Phenaki 的模型架构基于 transformer,图片来自此处。
+
+
+第三波也就是当前这一波文生视频模型浪潮主要以基于扩散的架构为特征。扩散模型在生成多样化、超现实和上下文丰富的图像方面取得了显著成功,这引起了人们对将扩散模型推广到其他领域(如音频、3D,最近又拓展到了视频)的兴趣。这一波模型由 [Video Diffusion Models](https://huggingface.co/papers/2204.03458)(VDM)开创,它首次将扩散模型推广至视频领域。随后,[MagicVideo](https://huggingface.co/papers/2211.11018) 提出了一个在低维隐空间中生成视频剪辑的框架,据其报告,新框架与 VDM 相比在效率上有巨大的提升。另一个值得一提的是 [Tune-a-Video](https://huggingface.co/papers/2212.11565),它使用`单文本-视频对`微调预训练的文生图模型,并允许在保留运动的同时改变视频内容。随后涌现出了越来越多的文生视频扩散模型,包括 [Video LDM](https://huggingface.co/papers/2304.08818)、[Text2Video-Zero](https://huggingface.co/papers/2303.13439)、[Runway Gen1、Runway Gen2](https://huggingface.co/papers/2302.03011) 以及 [NUWA-XL](https://huggingface.co/papers/2303.12346)。
+
+Text2Video-Zero 是一个文本引导的视频生成和处理框架,其工作方式类似于 ControlNet。它可以基于输入的`文本数据`、`文本 + 姿势混合数据`或`文本 + 边缘混合数据`直接生成(或编辑)视频。顾名思义,Text2Video-Zero 是一种零样本模型,它将可训练的运动动力学模块与预训练的文生图稳定扩散模型相结合,而无需使用任何`文本-视频对`数据。与 Text2Video-Zero 类似,Runway Gen-1 和 Runway Gen-2 模型可以合成由文本或图像描述的内容引导的视频。这些工作大多在短视频片段上训练,并依靠带有滑动窗口的自回归机制来生成更长的视频,这不可避免地导致了上下文差异(context gap)。NUWA-XL 解决了这个问题,提出了一种“双重扩散(diffusion over diffusion)”方法,并在 3376 帧的视频数据上训练模型。最后,还有一些尚未在同行评审的会议或期刊上发表的开源文生视频模型和框架,例如阿里巴巴达摩院视觉智能实验室的 ModelScope 和 Tencent 的 VideoCrafter。
+
+## 数据集
+与其他视觉语言模型一样,文生视频模型通常在大型`文本-视频对`数据集上进行训练。这些数据集中的视频通常被分成短的、固定长度的块,并且通常仅限于少数几个目标的孤立动作。出现这种情况的一部分原因是计算限制,另一部分原因是以有意义的方式描述视频内容这件事本身就很难。而我们看到多模态视频文本数据集和文生视频模型的发展往往是交织在一起的,因此有不少工作侧重于开发更易于训练的更好、更通用的数据集。同时也有一些工作另辟蹊径,对替代解决方案进行了探索,例如[Phenaki](https://phenaki.video/?mc_cid=9fee7eeb9d#) 将`文本-图像对`与`文本-视频对`相结合用于文生视频任务;Make-a-Video 则更进一步,提议仅使用`文本-图像对`来学习世界表象信息,并使用单模态视频数据以无监督的方式学习时空依赖性。
+
+这些大型数据集面临着与文本-图像数据集类似的问题。最常用的文本-视频数据集 [WebVid](https://m-bain.github.io/webvid-dataset/) 由 1070 万个`文本-视频对`(视频时长 5.2 万小时)组成,并包含一定量的噪声样本,这些样本中的文本描述与视频内容并不相关。其他数据集试图通过聚焦特定任务或领域来解决这个问题。例如,[Howto100M](https://www.di.ens.fr/willow/research/howto100m/) 数据集包含 1.36 亿个视频剪辑,其中文本部分描述了如何一步一步地执行复杂的任务,例如烹饪、手工制作、园艺和健身。而 [QuerYD](https://www.robots.ox.ac.uk/~vgg/data/queryd/) 数据集则聚焦于事件定位任务,视频的字幕详细描述了目标和动作的相对位置。[CelebV-Text](https://celebv-text.github.io/) 是一个包含超过 7 万个视频的大规模人脸`文本-视频`数据集,用于生成具有逼真人脸、情绪和手势的视频。
+
+## Hugging Face 上的文生视频
+
+使用 Hugging Face Diffusers,你可以轻松下载、运行和微调各种预训练的文生视频模型,包括 Text2Video-Zero 和[阿里巴巴达摩院](https://huggingface.co/damo-vilab)的 ModelScope。我们目前正在努力将更多优秀的工作集成到 Diffusers 和 🤗 Transformers 中。
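+
+下面给出一个简短的示意(基于 Diffusers 文档中的用法;这里假设以 `runwayml/stable-diffusion-v1-5` 作为底模,具体 API 和模型 ID 请以 Diffusers 官方文档为准),展示如何用 `TextToVideoZeroPipeline` 在无需任何`文本-视频对`数据的情况下直接生成短视频:
+
+```python
+import torch
+import imageio
+from diffusers import TextToVideoZeroPipeline
+
+model_id = "runwayml/stable-diffusion-v1-5"
+pipe = TextToVideoZeroPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
+
+prompt = "A panda is playing guitar on times square"
+result = pipe(prompt=prompt).images           # 返回取值在 [0, 1] 范围内的一组帧
+result = [(frame * 255).astype("uint8") for frame in result]
+imageio.mimsave("video.mp4", result, fps=4)   # 将帧序列保存为视频
+```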
+
+### Hugging Face 应用演示
+
+在 Hugging Face,我们的目标是让 Hugging Face 库更易于使用,并纳入最先进的研究成果。你可以前往 Hub 查看和体验由 🤗 团队、无数社区贡献者和研究者贡献的 Spaces 演示。目前,上面有 [VideoGPT](https://huggingface.co/spaces/akhaliq/VideoGPT)、[CogVideo](https://huggingface.co/spaces/THUDM/CogVideo)、[ModelScope 文生视频](https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis) 以及 [Text2Video-Zero](https://huggingface.co/spaces/PAIR/Text2Video-Zero) 的应用演示,后面还会越来越多,敬请期待。要了解这些模型能用来做什么,我们可以看一下 Text2Video-Zero 的应用演示。该演示不仅展示了文生视频应用,还包含多种其他生成模式,例如文本引导的视频编辑,以及以姿势、深度或边缘输入与文本提示相结合为联合条件的视频生成。
+
+
+
+
+
+除了使用应用演示来尝试预训练文生视频模型外,你还可以使用 [Tune-a-Video 训练演示](https://huggingface.co/spaces/Tune-A-Video-library/Tune-A-Video-Training-UI),用你自己的`文本-视频对`微调现有的文生图模型。仅需上传视频并输入描述该视频的文本提示即可。你可以将训得的模型上传到 Hub 上公开的 Tune-a-Video 社区或你自己的用户名下。训练完成后,只需转到演示的 *Run* 选项卡即可根据任意文本提示生成视频。
+
+
+
+
+🤗 Hub 上的所有 Space 其实都是 Git 存储库,你可以在本地或部署环境中克隆和运行它们。下面克隆一下 ModelScope 演示,安装环境,并在本地运行它。
+
+```shell
+git clone https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis
+cd modelscope-text-to-video-synthesis
+pip install -r requirements.txt
+python app.py
+```
+
+就是这样!ModelScope 演示现在已经在你的本地计算机上运行起来了。请注意,Diffusers 也支持 ModelScope 文生视频模型,你只需几行代码即可直接加载该模型并生成新视频。
+
+
+```python
+import torch
+from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
+from diffusers.utils import export_to_video
+
+pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
+pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
+pipe.enable_model_cpu_offload()
+
+prompt = "Spiderman is surfing"
+video_frames = pipe(prompt, num_inference_steps=25).frames
+video_path = export_to_video(video_frames)
+```
+
+### 其他的社区开源文生视频项目
+
+最后,还有各种不在 Hub 上的开源项目和模型。一些值得关注的有 Phil Wang(即 lucidrains)的 [Imagen](https://github.com/lucidrains/imagen-pytorch) 非官方实现、[Phenaki](https://github.com/lucidrains/phenaki-pytorch)、[NUWA](https://github.com/lucidrains/nuwa-pytorch)、[Make-a-Video](https://github.com/lucidrains/make-a-video-pytorch) 以及 [Video Diffusion 模型](https://github.com/lucidrains/video-diffusion-pytorch)。还有一个有意思的项目 [ExponentialML](https://github.com/ExponentialML/Text-To-Video-Finetuning),它基于 🤗 diffusers,用于微调 ModelScope 文生视频模型。
+
+## 总结
+
+文生视频的研究正在呈指数级发展,但现有工作在上下文一致性上仍有限制,同时还面临其他诸多挑战。在这篇博文中,我们介绍了文生视频模型的限制、独特挑战和当前状态。我们还看到了最初为其他任务设计的架构范例如何赋能文生视频任务的巨大飞跃,以及这对未来研究意味着什么。虽然进展令人印象深刻,但与文生图模型相比,文生视频模型还有很长的路要走。最后,我们还展示了如何通过 Hub 上的应用演示来使用这些模型,以及如何将这些模型作为 🤗 Diffusers 流水线的一部分来完成各种任务。
+
+本文就到此为止了!我们将继续整合最具影响力的计算机视觉和多模态模型,并希望收到你的反馈。要了解计算机视觉和多模态研究的最新消息,你可以在 Twitter 上关注我们:**[@adirik](https://twitter.com/alaradirik)**、**[@a_e_roberts](https://twitter.com/a_e_roberts)**、[@osanviero](https://twitter.com/NielsRogge)、[@risingsayak](https://twitter.com/risingsayak) 以及 **[@huggingface](https://twitter.com/huggingface)**。
+
+> 英文原文: https://huggingface.co/blog/text-to-video
+> 原文作者:Alara Dirik
+> 译者: Matrix Yao (姚伟峰),英特尔深度学习工程师,工作方向为 transformer-family 模型在各模态数据上的应用及大规模模型的训练推理。