huggingface
diff --git a/chapters/en/chapter6/3.mdx b/chapters/en/chapter6/3.mdx
Lines changed: 1 addition & 1 deletion
diff --git a/chapters/zh-CN/_toctree.yml b/chapters/zh-CN/_toctree.yml
Lines changed: 1 addition & 1 deletion
diff --git a/chapters/zh-CN/chapter4/2.mdx b/chapters/zh-CN/chapter4/2.mdx
Lines changed: 9 additions & 10 deletions
diff --git a/chapters/zh-CN/chapter7/1.mdx b/chapters/zh-CN/chapter7/1.mdx
Lines changed: 8 additions & 8 deletions
diff --git a/subtitles/zh-CN/08_what-happens-inside-the-pipeline-function-(pytorch).srt b/subtitles/zh-CN/08_what-happens-inside-the-pipeline-function-(pytorch).srt
Lines changed: 11 additions & 9 deletions
diff --git a/subtitles/zh-CN/09_what-happens-inside-the-pipeline-function-(tensorflow).srt b/subtitles/zh-CN/09_what-happens-inside-the-pipeline-function-(tensorflow).srt
Lines changed: 20 additions & 18 deletions
@@ -109,7 +109,7 @@ We can see that the tokenizer's special tokens `[CLS]` and `[SEP]` are mapped to
 
 <Tip>
 
-The notion of what a word is is complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so will consider it two words.
+The notion of what a word is complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so will consider it two words.
 
 ✏️ **Try it out!** Create a tokenizer from the `bert-base-cased` and `roberta-base` checkpoints and tokenize "81s" with them. What do you observe? What are the word IDs?
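
The tip in this hunk says the word count for "I'll" depends on the pre-tokenization rule. A minimal sketch of that idea, using two toy splitting rules written for illustration only (the function names are hypothetical and neither rule reproduces what the `bert-base-cased` or `roberta-base` tokenizers actually do):

```python
import re


def split_on_whitespace(text):
    """Toy pre-tokenizer: split on runs of whitespace only."""
    return text.split()


def split_on_whitespace_and_punctuation(text):
    """Toy pre-tokenizer: keep runs of word characters together and
    emit every punctuation character as its own piece."""
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "I'll go"
# With whitespace-only splitting, "I'll" stays a single piece.
print(split_on_whitespace(sentence))
# With punctuation-aware splitting, "I'll" is broken at the apostrophe.
print(split_on_whitespace_and_punctuation(sentence))
```

How many pieces the contraction ends up in depends entirely on the rule chosen, which is the point the tip makes; real tokenizers apply their own rules and may group the pieces differently.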
 
 
@@ -69,7 +69,7 @@
   - local: chapter4/1
     title: The Hugging Face Hub
   - local: chapter4/2
-    title: 使用预训练的模型
+    title: 使用预训练模型
   - local: chapter4/3
     title: 分享预训练的模型
   - local: chapter4/4
 
@@ -1,6 +1,6 @@
 <FrameworkSwitchCourse {fw} />
 
-# 使用预训练的模型 [[使用预训练的模型]]
+# 使用预训练模型 [[使用预训练模型]]
 
 {#if fw === 'pt'}
 
@@ -22,15 +22,15 @@
 
 {/if}
 
-模型中心使选择合适的模型变得简单，因此只需几行代码即可在任何下游库中使用它。让我们来看看如何实际使用这些模型之一，以及如何回馈社区。
+模型中心使选择合适的模型变得简单，因此只需几行代码即可在任何下游库中使用它。让我们来看看如何使用这些模型，以及如何将模型贡献到社区。
 
-假设我们正在寻找一种可以执行**mask**填充的French-based模型。
+假设我们正在寻找一种可以执行 mask 填充的 French-based 模型。
 
 <div class="flex justify-center">
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/camembert.gif" alt="Selecting the Camembert model." width="80%"/>
 </div>
 
-我们选择 **camembert-base** 检查点来尝试一下。我们需要做的仅仅是输入 `camembert-base`标识符！正如您在前几章中看到的，我们可以使用 **pipeline()** 功能：
+我们选择 `camembert-base` 检查点来尝试一下。我们需要做的仅仅是输入 `camembert-base` 标识符！正如你在前几章中看到的，我们可以使用 `pipeline()` 功能：
 
 ```py
 from transformers import pipeline
@@ -49,13 +49,13 @@ results = camembert_fill_mask("Le camembert est <mask> :)")
 ]
 ```
 
-如您所见，在管道中加载模型非常简单。您唯一需要注意的是所选检查点是否适合它将用于的任务。例如，这里我们正在加载 **camembert-base** 检查点在 **fill-mask** 管道，这完全没问题。但是如果我们要在 **text-classification** 管道，结果没有任何意义，因为 **camembert-base** 不适合这个任务！我们建议使用 Hugging Face Hub 界面中的任务选择器来选择合适的检查点：
+如你所见，在管道中加载模型非常简单。你唯一需要注意的是所选检查点是否适合它将用于的任务。例如，这里我们正在将 `camembert-base` 检查点加载在 `fill-mask` 管道，这完全没问题。但是如果我们在 `text-classification` 管道加载检查点，结果没有任何意义，因为 `camembert-base` 不适合这个任务！我们建议使用 Hugging Face Hub 界面中的任务选择器来选择合适的检查点：
 
 <div class="flex justify-center">
 <img src="https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter4/tasks.png" alt="The task selector on the web interface." width="80%"/>
 </div>
 
-您还可以直接使用模型架构实例化检查点：
+你还可以直接使用模型架构实例化检查点：
 
 {#if fw === 'pt'}
 ```py
@@ -65,7 +65,7 @@ tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
 model = CamembertForMaskedLM.from_pretrained("camembert-base")
 ```
 
-然而，我们建议使用[Auto* 类](https://huggingface.co/transformers/model_doc/auto.html?highlight=auto#auto-classes)，因为Auto* 类设计与架构无关。前面的代码示例将只能在 CamemBERT 架构中加载可用的检查点，但使用 **Auto*** 类使切换检查点变得简单：
+然而，我们建议使用 [`Auto*` 类](https://huggingface.co/transformers/model_doc/auto.html?highlight=auto#auto-classes)，因为 `Auto*` 类设计与架构无关。前面的代码示例将只能在 CamemBERT 架构中加载可用的检查点，但使用 `Auto*` 类使切换不同的检查点变得简单：
 
 ```py
 from transformers import AutoTokenizer, AutoModelForMaskedLM
@@ -81,8 +81,7 @@ tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
 model = TFCamembertForMaskedLM.from_pretrained("camembert-base")
 ```
 
-However, we recommend using the [`TFAuto*` classes](https://huggingface.co/transformers/model_doc/auto.html?highlight=auto#auto-classes) instead, as these are by design architecture-agnostic. While the previous code sample limits users to checkpoints loadable in the CamemBERT architecture, using the `TFAuto*` classes makes switching checkpoints simple:
-然而，我们建议使用[`TFAuto*` 类](https://huggingface.co/transformers/model_doc/auto.html?highlight=auto#auto-classes)，因为`TFAuto*`类设计与架构无关。前面的代码示例将只能在 CamemBERT 架构中加载可用的检查点，但使用 `TFAuto*`  类使切换检查点变得简单：
+然而，我们建议使用 [`TFAuto*` 类](https://huggingface.co/transformers/model_doc/auto.html?highlight=auto#auto-classes)，因为 `TFAuto*` 类设计与架构无关。前面的代码示例将只能在 CamemBERT 架构中加载可用的检查点，但使用 `TFAuto*` 类使切换不同的检查点变得简单：
 
 ```py
 from transformers import AutoTokenizer, TFAutoModelForMaskedLM
@@ -93,5 +92,5 @@ model = TFAutoModelForMaskedLM.from_pretrained("camembert-base")
 {/if}
 
 <Tip>
-使用预训练模型时，一定要检查它是如何训练的，在哪些数据集上，它的限制和它的偏差。所有这些信息都应在其模型卡片上注明。
+使用预训练模型时，一定要检查它是如何训练的、在哪些数据集上训练的、它的局限性和偏差。所有这些信息都应在其模型卡片上注明。
 </Tip>
@@ -2,24 +2,24 @@
 
 # 章节简介 [[章节简介]]
 
-在[第三章](/course/chapter3)，您了解了如何微调文本分类的模型。在本章中，我们将处理以下常见NLP任务：
+在[第三章](/course/chapter3)，您了解了如何微调文本分类的模型。在本章中，我们将处理以下常见的 NLP 任务：
 
-- 标记(token)分类
-- 遮罩语言建模（如BERT）
-- 提取文本摘要
+- 词元（token）分类
+- 掩码语言建模（如 BERT）
+- 文本摘要
 - 翻译
-- 因果语言建模预训练（如GPT-2）
+- 因果语言建模预训练（如 GPT-2）
 - 问答
 
 {#if fw === 'pt'}
 
-为此，您需要利用[第三章](/course/chapter3)中学到的`Trainer` API 和🤗Accelerate 库、[第五章](/course/chapter5)中的 🤗 Datasets 库以及[第六章](/course/chapter6)中的 🤗 Tokenizers 库的所有知识。我们还会将结果上传到模型中心，就像我们在[第四章](/course/chapter4)中所做的那样，所以这确实是将之前所有内容汇集在一起的章节！
+为此，您需要利用[第三章](/course/chapter3)中学到的 `Trainer` API 和 🤗 Accelerate 库、[第五章](/course/chapter5)中的 🤗 Datasets 库以及[第六章](/course/chapter6)中的 🤗 Tokenizers 库的所有知识。我们同样会将结果上传到模型中心，就像我们在[第四章](/course/chapter4)中所做的那样，所以这确实是融会贯通的一章！
 
-每个部分都可以独立阅读，并将向您展示如何使用API或按照您自己的训练循环训练模型，使用🤗 Accelerate 加速。你可以随意跳过其中一部分，把注意力集中在你最感兴趣的那一部分：API可以优化或训练您的模型而无需担心幕后发生了什么，而训练循环使用可以让您更轻松地自定义所需的任何结构。
+每个部分都可以独立阅读，并将向您展示如何使用 `Trainer` API 或按照您自己的训练循环训练模型，并采用 🤗 Accelerate 加速。你可以随意跳过任何一部分，专注于您最感兴趣的部分：`Trainer` API 非常适用于微调（fine-tuning）或训练您的模型，且无需担心幕后发生的事情；而采用 `Accelerate` 的训练循环可以让您更轻松地自定义所需的任何结构。
 
 {:else}
 
-为此，您需要利用[第三章](/course/chapter3)中学到的有关Keras API、[第五章](/course/chapter5)中的 🤗 Datasets 库以及[第六章](/course/chapter6)中的 🤗 Tokenizers 库的所有知识。我们还会将结果上传到模型中心，就像我们在[第四章](/course/chapter4)中所做的那样，所以这确实是将之前所有内容汇集在一起的章节！
+为此，您需要利用[第三章](/course/chapter3)中学到的有关 Keras API、[第五章](/course/chapter5)中的 🤗 Datasets 库以及[第六章](/course/chapter6)中的 🤗 Tokenizers 库的所有知识。我们同样会将结果上传到模型中心，就像我们在[第四章](/course/chapter4)中所做的那样，所以这确实是融会贯通的一章！
 
 每个部分都可以独立阅读。
 
 
@@ -6,6 +6,7 @@
 2
 00:00:05,340 --> 00:00:07,563
 - pipeline 函数内部发生了什么？
+*[译者注: pipeline 作为 流水线 的意思]
 - What happens inside the pipeline function?
 
 3
@@ -25,7 +26,7 @@ of the Transformers library.
 
 6
 00:00:15,090 --> 00:00:16,860
-详细来讲，我们将举例
+详细来讲，我们将看
 More specifically, we will look
 
 7
@@ -100,17 +101,18 @@ and how to replicate them using the Transformers library,
 
 21
 00:00:53,640 --> 00:00:56,043
-从第一阶段开始，token 化。
+从第一阶段开始，分词化。
 beginning with the first stage, tokenization.
 
 22
 00:00:57,915 --> 00:01:00,360
-token 化过程有几个步骤。
+分词化过程有几个步骤。
 The tokenization process has several steps.
 
 23
 00:01:00,360 --> 00:01:04,950
-首先，文本被分成称为 token 的小块。
+首先，文本被分成小块, 称之为 token。 
+*[译者注: 后面 token-* 均翻译成 分词-*]
 First, the text is split into small chunks called tokens.
 
 24
@@ -120,7 +122,7 @@ They can be words, parts of words or punctuation symbols.
 
 25
 00:01:08,550 --> 00:01:11,580
-然后 tokenizer 将有一些特殊的 token ，
+然后分词器将有一些特殊的 token ，
 Then the tokenizer will had some special tokens,
 
 26
@@ -140,7 +142,7 @@ and a SEP token at the end of the sentence to classify.
 
 29
 00:01:20,580 --> 00:01:24,180
-最后，tokenizer 将每个 token 与其唯一 ID 匹配
+最后，分词器将每个 token 与其唯一 ID 匹配
 Lastly, the tokenizer matches each token to its unique ID
 
 30
@@ -265,7 +267,7 @@ with zero where the padding is applied.
 
 54
 00:02:32,550 --> 00:02:34,260
-第二个键值，注意力 mask ，
+第二个键值，注意力掩码，
 The second key, attention mask,
 
 55
@@ -280,7 +282,7 @@ so the model does not pay attention to it.
 
 57
 00:02:38,940 --> 00:02:42,090
-这就是 token 化步骤中的全部内容。
+这就是分词化步骤中的全部内容。
 This is all what is inside the tokenization step.
 
 58
@@ -490,7 +492,7 @@ correspond to the negative label,
 
 99
 00:04:32,250 --> 00:04:34,140
-秒，索引一，
+然后第二个，索引一，
 and the seconds, index one,
 
 100
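
For reviewers checking the terminology in these subtitles, the tokenization steps they describe (splitting text into tokens, adding the CLS and SEP special tokens, mapping each token to an ID, then padding with zeros and building an attention mask) can be sketched in plain Python. This is a toy illustration under stated assumptions: the vocabulary, the splitting rule, and the names `TOY_VOCAB`, `toy_encode`, and `toy_pad` are invented here and are not the Transformers API.

```python
# Toy vocabulary invented for this sketch; a real checkpoint ships its own.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "i": 4, "love": 5, "this": 6, "course": 7, "!": 8,
}


def toy_encode(text):
    """Split text into tokens, add special tokens, and map tokens to IDs."""
    # Step 1: split into small chunks (here: lowercase, separate "!", split on spaces).
    tokens = text.lower().replace("!", " !").split()
    # Step 2: add the special tokens this toy model expects.
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # Step 3: match each token to its ID, falling back to [UNK].
    return [TOY_VOCAB.get(token, TOY_VOCAB["[UNK]"]) for token in tokens]


def toy_pad(batch):
    """Pad every sequence to the longest one and build the attention mask."""
    max_len = max(len(ids) for ids in batch)
    pad_id = TOY_VOCAB["[PAD]"]
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = toy_pad([toy_encode("I love this course!"), toy_encode("I love this")])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The shorter sentence is padded with zeros, and its attention mask is zero at exactly those positions, which matches the description in cues 54 to 57.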
 
@@ -75,7 +75,8 @@ Then, those numbers go through the model,
 
 16
 00:00:42,600 --> 00:00:44,550
-输出逻辑。
+输出 logits 。
+*[译者注: logits 作为逻辑值的意思]
 which outputs logits.
 
 17
@@ -95,22 +96,22 @@ Let's look in detail at those three steps,
 
 20
 00:00:52,590 --> 00:00:55,200
-以及如何使用 Transformers 库复制它们，
+以及如何使用 Transformers 库复现它们，
 and how to replicate them using the Transformers library,
 
 21
 00:00:55,200 --> 00:00:57,903
-从第一阶段开始，标记化。
+从第一阶段开始，分词化。
 beginning with the first stage, tokenization.
 
 22
 00:00:59,905 --> 00:01:02,520
-令牌化过程有几个步骤。
+分词化过程有几个步骤。
 The tokenization process has several steps.
 
 23
 00:01:02,520 --> 00:01:06,900
-首先，文本被分成称为标记的小块。
+首先，文本被分成称为 token 的小块。
 First, the text is split into small chunks called token.
 
 24
@@ -120,7 +121,7 @@ They can be words, parts of words or punctuation symbols.
 
 25
 00:01:10,800 --> 00:01:14,310
-然后 tokenizer 将有一些特殊的标记
+然后分词器将有一些特殊的 token 
 Then the tokenizer will had some special tokens
 
 26
@@ -130,12 +131,12 @@ if the model expect them.
 
 27
 00:01:16,440 --> 00:01:20,430
-在这里，所使用的模型在开头需要一个 CLS 令牌
+在这里，所使用的模型在开头需要一个 CLS token 
 Here, the model used expects a CLS token at the beginning
 
 28
 00:01:20,430 --> 00:01:23,910
-以及用于分类的句子末尾的 SEP 标记。
+以及用于分类的句子末尾的 SEP token。
 and a SEP token at the end of the sentence to classify.
 
 29
@@ -170,7 +171,8 @@ which will download and cache the configuration
 
 35
 00:01:41,940 --> 00:01:44,913
-以及与给定检查点相关联的词汇表。
+以及与给定 checkpoint 相关联的词汇表。
+*[译者注: 在深度学习中, checkpoint 作为检查点是用来备份模型的, 后不翻译]
 and the vocabulary associated to a given checkpoint.
 
 36
@@ -180,13 +182,13 @@ Here, the checkpoint used by default
 
 37
 00:01:48,180 --> 00:01:50,310
-用于情绪分析管道
+用于情绪分析的 pipeline
 for the sentiment analysis pipeline
 
 38
 00:01:50,310 --> 00:01:54,510
-是 distilbert base uncased finetuned sst2 英语，
-is distilbert base uncased finetuned sst2 English,
+是 distilbert-base-uncased-finetuned-sst2-English，
+is distilbert-base-uncased-finetuned-sst2-English,
 
 39
 00:01:54,510 --> 00:01:55,960
@@ -195,7 +197,7 @@ which is a bit of a mouthful.
 
 40
 00:01:56,820 --> 00:01:59,760
-我们实例化一个与该检查点关联的分词器，
+我们实例化一个与该 checkpoint 关联的分词器，
 We instantiate a tokenizer associated with that checkpoint,
 
 41
@@ -270,7 +272,7 @@ with zeros where the padding is applied.
 
 55
 00:02:36,750 --> 00:02:38,550
-第二把钥匙，注意面具，
+第二把钥匙，注意力掩码，
 The second key, attention mask,
 
 56
@@ -320,7 +322,7 @@ However, the AutoModel API will only instantiate
 
 65
 00:03:04,830 --> 00:03:06,540
-模特的身体，
+模型的主体，
 the body of the model,
 
 66
@@ -360,7 +362,7 @@ Here the tensor has two sentences,
 
 73
 00:03:24,210 --> 00:03:26,070
-每十六个令牌，
+每十六个 token，
 each of sixteen token,
 
 74
@@ -435,12 +437,12 @@ This is because each model of the Transformers library
 
 88
 00:04:06,090 --> 00:04:07,830
-返回逻辑。
+返回 logits 。
 returns logits.
 
 89
 00:04:07,830 --> 00:04:09,480
-为了理解这些逻辑，
+为了理解这些 logits ，
 To make sense of those logits,
 
 90
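
Since this hunk settles on leaving "logits" untranslated, a short sketch of what logits are may help reviewers judge the translator's note. Logits are raw, unnormalized scores, and one common way to turn them into probabilities is the softmax function. The example values below are made up for illustration, and treating index 1 as the "positive" label is an assumption of this sketch.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability; the result is unchanged.
    highest = max(logits)
    exps = [math.exp(score - highest) for score in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up logits for one sentence: index 0 = negative, index 1 = positive (assumed).
example_logits = [-1.5, 1.5]
probs = softmax(example_logits)
print(probs)
```

The larger logit receives the larger probability, and the two values sum to 1, which is what makes them easier to interpret than the raw scores.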