
Merge pull request #506 from iCell/shawn/review-06
docs(zh-cn): Reviewed 06_transformer-models-decoders.srt
xianbaoqian authored Feb 27, 2023
2 parents dfcf449 + aaee39f commit df3e0cf
Showing 1 changed file with 58 additions and 58 deletions.
116 changes: 58 additions & 58 deletions subtitles/zh-CN/06_transformer-models-decoders.srt
@@ -5,13 +5,13 @@

2
00:00:07,140 --> 00:00:07,973
一个例子
一种流行的仅包含解码器架构
An example

3
00:00:07,973 --> 00:00:11,338
一种流行的解码器唯一架构是 GPT 两种
of a popular decoder only architecture is GPT two.
的例子是 GPT-2
of a popular decoder only architecture is GPT-2.

4
00:00:11,338 --> 00:00:14,160
@@ -20,7 +20,7 @@ In order to understand how decoders work

5
00:00:14,160 --> 00:00:17,430
我们建议你观看有关编码器的视频
我们建议您观看有关编码器的视频
we recommend taking a look at the video regarding encoders.

6
@@ -35,7 +35,7 @@ One can use a decoder

8
00:00:21,210 --> 00:00:23,760
对于大多数与编码器相同的任务
执行与编码器相同的大多数任务
for most of the same tasks as an encoder

9
@@ -55,12 +55,12 @@ with the encoder to try

12
00:00:30,300 --> 00:00:32,670
并了解架构差异
并了解在编码器和解码器之间
and understand the architectural differences

13
00:00:32,670 --> 00:00:34,803
在编码器和解码器之间
的架构差异
between an encoder and decoder.

14
@@ -70,12 +70,12 @@ We'll use a small example using three words.

15
00:00:38,910 --> 00:00:41,050
我们通过他们的解码器传递它们
我们通过解码器传递它们
We pass them through their decoder.

16
00:00:41,050 --> 00:00:44,793
我们检索每个单词的数字表示
我们检索每个单词的数值表示
We retrieve a numerical representation for each word.

17
@@ -85,17 +85,17 @@ Here for example, the decoder converts the three words.

18
00:00:49,350 --> 00:00:53,545
欢迎来到纽约,欢迎来到这三个数字序列
Welcome to NYC,这三个数字序列
Welcome to NYC, and these three sequences of numbers.

19
00:00:53,545 --> 00:00:56,040
解码器只输出一个序列
解码器针对每个输入词汇
The decoder outputs exactly one sequence

20
00:00:56,040 --> 00:00:58,740
每个输入词的数字
只输出一个数列
of numbers per input word.

21
@@ -105,7 +105,7 @@ This numerical representation can also

22
00:01:00,630 --> 00:01:03,783
称为特征向量或特征传感器
称为特征向量(feature vector)或特征传感器(feature sensor)
be called a feature vector or a feature sensor.

23
@@ -115,32 +115,32 @@ Let's dive in this representation.

24
00:01:07,200 --> 00:01:08,490
它包含一个向量
它包含了每个通过解码器
It contains one vector

25
00:01:08,490 --> 00:01:11,340
每个通过解码器的单词
传递的单词的一个向量
per word that was passed through the decoder.

26
00:01:11,340 --> 00:01:14,250
这些向量中的每一个都是一个数字表示
这些向量中的每一个单词
Each of these vectors is a numerical representation

27
00:01:14,250 --> 00:01:15,573
有问题的词
都是一个数值表示
of the word in question.

28
00:01:16,920 --> 00:01:18,562
该向量的维度被定义
这个向量的维度
The dimension of that vector is defined

29
00:01:18,562 --> 00:01:20,703
通过模型的架构
由模型的架构所决定
by the architecture of the model.

30
@@ -150,12 +150,12 @@ Where the decoder differs from the encoder is principally

31
00:01:26,040 --> 00:01:28,200
具有自我注意机制
具有自注意力机制
with its self attention mechanism.

32
00:01:28,200 --> 00:01:30,843
它使用所谓的掩蔽自我关注
它使用所谓的掩蔽自注意力
It's using what is called masked self attention.

33
@@ -165,27 +165,27 @@ Here, for example, if we focus on the word "to"

34
00:01:34,650 --> 00:01:37,620
我们会看到 vector 是绝对未修改的
我们会发现它的向量
we'll see that is vector is absolutely unmodified

35
00:01:37,620 --> 00:01:39,690
用纽约的话来说
完全未被 NYC 单词修改
by the NYC word.

36
00:01:39,690 --> 00:01:41,731
那是因为右边所有的话,也都知道
那是因为右边所有的话,
That's because all the words on the right, also known

37
00:01:41,731 --> 00:01:45,276
因为这个词的正确上下文被掩盖了
单词的右侧上下文都被屏蔽了
as the right context of the word is masked rather

38
00:01:45,276 --> 00:01:49,230
而不是受益于左右所有的话
而没有从左侧和右侧的所有单词中受益
than benefiting from all the words on the left and right.

39
@@ -205,32 +205,32 @@ which can be the left context or the right context.

42
00:01:59,539 --> 00:02:03,356
Masked self attention 机制不同
掩蔽自注意力机制不同于
The masked self attention mechanism differs

43
00:02:03,356 --> 00:02:04,320
来自 self attention 机制
自注意力机制
from the self attention mechanism

44
00:02:04,320 --> 00:02:07,110
通过使用额外的掩码来隐藏上下文
通过使用额外的掩码在单词的两边
by using an additional mask to hide the context

45
00:02:07,110 --> 00:02:09,390
在单词的两边
来隐藏上下文
on either side of the word

46
00:02:09,390 --> 00:02:12,810
单词数值表示不会受到影响
通过隐藏上下文中的单词
the words numerical representation will not be affected

47
00:02:12,810 --> 00:02:14,853
通过隐藏上下文中的单词
单词数值表示不会受到影响
by the words in the hidden context.

48
@@ -245,7 +245,7 @@ Decoders like encoders can be used as standalone models

50
00:02:22,380 --> 00:02:25,020
因为它们生成数字表示
因为它们生成数值表示
as they generate a numerical representation.

51
@@ -265,42 +265,42 @@ A word can only have access to its left context

54
00:02:34,530 --> 00:02:36,690
只能访问他们的左上下文
因为它只有左侧的上下文信息
having only access to their left context.

55
00:02:36,690 --> 00:02:39,120
他们天生擅长文本生成
它们天生擅长文本生成
They're inherently good at text generation

56
00:02:39,120 --> 00:02:41,010
生成单词的能力
即在已知的词序列基础上生成一个单词
the ability to generate a word

57
00:02:41,010 --> 00:02:45,000
或给定已知单词序列的单词序列
或单词序列的能力
or a sequence of words given a known sequence of words.

58
00:02:45,000 --> 00:02:45,833
这是众所周知的
这被称为
This is known

59
00:02:45,833 --> 00:02:49,083
作为因果语言建模或自然语言生成
因果语言建模或自然语言生成
as causal language modeling or natural language generation.

60
00:02:50,430 --> 00:02:53,520
这是因果语言建模如何工作的示例
下面是一个展示因果语言模型的工作原理的示例
Here's an example of how causal language modeling works.

61
00:02:53,520 --> 00:02:56,410
我们从一个词开始,这是我的
我们从一个词 my 开始,
We start with an initial word, which is my

62
@@ -320,52 +320,52 @@ and this vector contains information about the sequence

65
00:03:07,230 --> 00:03:08,733
这是一个词
这里的序列是一个单词
which is here a single word.

66
00:03:09,780 --> 00:03:11,430
我们应用一个小的转换
我们对该向量
We apply a small transformation

67
00:03:11,430 --> 00:03:13,110
到那个向量,以便它映射
应用一个小的转换
to that vector so that it maps

68
00:03:13,110 --> 00:03:16,500
到模型已知的所有单词,这是一个映射
使其映射到模型已知的所有单词
to all the words known by the model, which is a mapping

69
00:03:16,500 --> 00:03:19,890
我们稍后会看到称为语言建模头。
这个映射我们稍后会看到,称为语言模型头部信息
that we'll see later called a language modeling head.

70
00:03:19,890 --> 00:03:21,930
我们确定该模型相信
我们发现模型认为
We identify that the model believes

71
00:03:21,930 --> 00:03:25,053
最有可能的后续单词是 name。
接下来最有可能的单词是 “name
that the most probable following word is name.

72
00:03:26,250 --> 00:03:28,710
然后我们取那个新词并添加它
然后我们把这个新单词加到原始的序列 my 后面
We then take that new word and add it

73
00:03:28,710 --> 00:03:33,480
到我的初始序列,我们现在以我的名字命名
我们得到了 my name
to the initial sequence from my, we are now at my name.

74
00:03:33,480 --> 00:03:36,870
这就是自回归方面的用武之地
这就是自回归(auto-regressive)的作用所在
This is where the auto regressive aspect comes in.

75
@@ -375,7 +375,7 @@ Auto regressive models.

76
00:03:38,490 --> 00:03:42,513
我们使用他们过去的输出作为输入和以下步骤
我们使用它们过去的输出作为输入和接下来的步骤
We use their past outputs as inputs and the following steps.

77
@@ -395,7 +395,7 @@ and retrieve the most probable following word.

80
00:03:52,978 --> 00:03:57,978
本例中就是 “” 这个词,我们重复操作
本例中就是 “is” 这个词,我们重复操作
In this case, it is the word "is", we repeat the operation

81
@@ -410,13 +410,13 @@ We've now generated a full sentence.

83
00:04:04,590 --> 00:04:07,890
我们决定就此打住,但我们可以继续一段时间
我们决定就此打住,但我们也可以继续一段时间
We decide to stop there, but we could continue for a while.

84
00:04:07,890 --> 00:04:12,890
例如,GPT 2 的最大上下文大小为 1,024。
GPT two, for example, has a maximum context size of 1,024.
例如,GPT-2 的最大上下文大小为 1,024。
GPT-2, for example, has a maximum context size of 1,024.

85
00:04:13,170 --> 00:04:16,830
Expand All @@ -425,11 +425,11 @@ We could eventually generate up to a 1,024 words

86
00:04:16,830 --> 00:04:19,050
并且解码器仍然会有一些记忆
并且解码器仍然会对这个序列的前几个单词
and the decoder would still have some memory

87
00:04:19,050 --> 00:04:21,003
这个序列中的第一个单词
有一些记忆
of the first words in this sequence.
