
Merge pull request #506 from iCell/shawn/review-06
docs(zh-cn): Reviewed 06_transformer-models-decoders.srt
xianbaoqian authored Feb 27, 2023
2 parents dfcf449 + aaee39f commit df3e0cf
Showing 1 changed file with 58 additions and 58 deletions.
116 changes: 58 additions & 58 deletions subtitles/zh-CN/06_transformer-models-decoders.srt
@@ -5,13 +5,13 @@

2
00:00:07,140 --> 00:00:07,973
一个例子
一种流行的仅包含解码器架构
An example

3
00:00:07,973 --> 00:00:11,338
一种流行的解码器唯一架构是 GPT 两种
of a popular decoder only architecture is GPT two.
的例子是 GPT-2
of a popular decoder only architecture is GPT-2.

4
00:00:11,338 --> 00:00:14,160
@@ -20,7 +20,7 @@ In order to understand how decoders work

5
00:00:14,160 --> 00:00:17,430
我们建议你观看有关编码器的视频
我们建议您观看有关编码器的视频
we recommend taking a look at the video regarding encoders.

6
@@ -35,7 +35,7 @@ One can use a decoder

8
00:00:21,210 --> 00:00:23,760
对于大多数与编码器相同的任务
执行与编码器相同的大多数任务
for most of the same tasks as an encoder

9
@@ -55,12 +55,12 @@ with the encoder to try

12
00:00:30,300 --> 00:00:32,670
并了解架构差异
并了解在编码器和解码器之间
and understand the architectural differences

13
00:00:32,670 --> 00:00:34,803
在编码器和解码器之间
的架构差异
between an encoder and decoder.

14
@@ -70,12 +70,12 @@ We'll use a small example using three words.

15
00:00:38,910 --> 00:00:41,050
我们通过他们的解码器传递它们
我们通过解码器传递它们
We pass them through their decoder.

16
00:00:41,050 --> 00:00:44,793
我们检索每个单词的数字表示
我们检索每个单词的数值表示
We retrieve a numerical representation for each word.

17
@@ -85,17 +85,17 @@ Here for example, the decoder converts the three words.

18
00:00:49,350 --> 00:00:53,545
欢迎来到纽约,欢迎来到这三个数字序列
Welcome to NYC,这三个数字序列
Welcome to NYC, and these three sequences of numbers.

19
00:00:53,545 --> 00:00:56,040
解码器只输出一个序列
解码器针对每个输入词汇
The decoder outputs exactly one sequence

20
00:00:56,040 --> 00:00:58,740
每个输入词的数字
只输出一个数列
of numbers per input word.

21
@@ -105,7 +105,7 @@ This numerical representation can also

22
00:01:00,630 --> 00:01:03,783
称为特征向量或特征传感器
称为特征向量(feature vector)或特征传感器(feature sensor)
be called a feature vector or a feature sensor.

23
@@ -115,32 +115,32 @@ Let's dive in this representation.

24
00:01:07,200 --> 00:01:08,490
它包含一个向量
它包含了每个通过解码器
It contains one vector

25
00:01:08,490 --> 00:01:11,340
每个通过解码器的单词
传递的单词的一个向量
per word that was passed through the decoder.

26
00:01:11,340 --> 00:01:14,250
这些向量中的每一个都是一个数字表示
这些向量中的每一个单词
Each of these vectors is a numerical representation

27
00:01:14,250 --> 00:01:15,573
有问题的词
都是一个数值表示
of the word in question.

28
00:01:16,920 --> 00:01:18,562
该向量的维度被定义
这个向量的维度
The dimension of that vector is defined

29
00:01:18,562 --> 00:01:20,703
通过模型的架构
由模型的架构所决定
by the architecture of the model.

30
@@ -150,12 +150,12 @@ Where the decoder differs from the encoder is principally

31
00:01:26,040 --> 00:01:28,200
具有自我注意机制
具有自注意力机制
with its self attention mechanism.

32
00:01:28,200 --> 00:01:30,843
它使用所谓的掩蔽自我关注
它使用所谓的掩蔽自注意力
It's using what is called masked self attention.

33
@@ -165,27 +165,27 @@ Here, for example, if we focus on the word "to"

34
00:01:34,650 --> 00:01:37,620
我们会看到 vector 是绝对未修改的
我们会发现它的向量
we'll see that is vector is absolutely unmodified

35
00:01:37,620 --> 00:01:39,690
用纽约的话来说
完全未被 NYC 单词修改
by the NYC word.

36
00:01:39,690 --> 00:01:41,731
那是因为右边所有的话,也都知道
那是因为右边所有的话,
That's because all the words on the right, also known

37
00:01:41,731 --> 00:01:45,276
因为这个词的正确上下文被掩盖了
单词的右侧上下文都被屏蔽了
as the right context of the word is masked rather

38
00:01:45,276 --> 00:01:49,230
而不是受益于左右所有的话
而没有从左侧和右侧的所有单词中受益
than benefiting from all the words on the left and right.

39
@@ -205,32 +205,32 @@ which can be the left context or the right context.

42
00:01:59,539 --> 00:02:03,356
Masked self attention 机制不同
掩蔽自注意力机制不同于
The masked self attention mechanism differs

43
00:02:03,356 --> 00:02:04,320
来自 self attention 机制
自注意力机制
from the self attention mechanism

44
00:02:04,320 --> 00:02:07,110
通过使用额外的掩码来隐藏上下文
通过使用额外的掩码在单词的两边
by using an additional mask to hide the context

45
00:02:07,110 --> 00:02:09,390
在单词的两边
来隐藏上下文
on either side of the word

46
00:02:09,390 --> 00:02:12,810
单词数值表示不会受到影响
通过隐藏上下文中的单词
the words numerical representation will not be affected

47
00:02:12,810 --> 00:02:14,853
通过隐藏上下文中的单词
单词数值表示不会受到影响
by the words in the hidden context.

48
@@ -245,7 +245,7 @@ Decoders like encoders can be used as standalone models

50
00:02:22,380 --> 00:02:25,020
因为它们生成数字表示
因为它们生成数值表示
as they generate a numerical representation.

51
@@ -265,42 +265,42 @@ A word can only have access to its left context

54
00:02:34,530 --> 00:02:36,690
只能访问他们的左上下文
因为它只有左侧的上下文信息
having only access to their left context.

55
00:02:36,690 --> 00:02:39,120
他们天生擅长文本生成
它们天生擅长文本生成
They're inherently good at text generation

56
00:02:39,120 --> 00:02:41,010
生成单词的能力
即在已知的词序列基础上生成一个单词
the ability to generate a word

57
00:02:41,010 --> 00:02:45,000
或给定已知单词序列的单词序列
或单词序列的能力
or a sequence of words given a known sequence of words.

58
00:02:45,000 --> 00:02:45,833
这是众所周知的
这被称为
This is known

59
00:02:45,833 --> 00:02:49,083
作为因果语言建模或自然语言生成
因果语言建模或自然语言生成
as causal language modeling or natural language generation.

60
00:02:50,430 --> 00:02:53,520
这是因果语言建模如何工作的示例
下面是一个展示因果语言模型的工作原理的示例
Here's an example of how causal language modeling works.

61
00:02:53,520 --> 00:02:56,410
我们从一个词开始,这是我的
我们从一个词 my 开始,
We start with an initial word, which is my

62
@@ -320,52 +320,52 @@ and this vector contains information about the sequence

65
00:03:07,230 --> 00:03:08,733
这是一个词
这里的序列是一个单词
which is here a single word.

66
00:03:09,780 --> 00:03:11,430
我们应用一个小的转换
我们对该向量
We apply a small transformation

67
00:03:11,430 --> 00:03:13,110
到那个向量,以便它映射
应用一个小的转换
to that vector so that it maps

68
00:03:13,110 --> 00:03:16,500
到模型已知的所有单词,这是一个映射
使其映射到模型已知的所有单词
to all the words known by the model, which is a mapping

69
00:03:16,500 --> 00:03:19,890
我们稍后会看到称为语言建模头。
这个映射我们稍后会看到,称为语言模型头部信息
that we'll see later called a language modeling head.

70
00:03:19,890 --> 00:03:21,930
我们确定该模型相信
我们发现模型认为
We identify that the model believes

71
00:03:21,930 --> 00:03:25,053
最有可能的后续单词是 name。
接下来最有可能的单词是 “name
that the most probable following word is name.

72
00:03:26,250 --> 00:03:28,710
然后我们取那个新词并添加它
然后我们把这个新单词加到原始的序列 my 后面
We then take that new word and add it

73
00:03:28,710 --> 00:03:33,480
到我的初始序列,我们现在以我的名字命名
我们得到了 my name
to the initial sequence from my, we are now at my name.

74
00:03:33,480 --> 00:03:36,870
这就是自回归方面的用武之地
这就是自回归(auto-regressive)的作用所在
This is where the auto regressive aspect comes in.

75
@@ -375,7 +375,7 @@ Auto regressive models.

76
00:03:38,490 --> 00:03:42,513
我们使用他们过去的输出作为输入和以下步骤
我们使用它们过去的输出作为输入和接下来的步骤
We use their past outputs as inputs and the following steps.

77
@@ -395,7 +395,7 @@ and retrieve the most probable following word.

80
00:03:52,978 --> 00:03:57,978
本例中就是 “” 这个词,我们重复操作
本例中就是 “is” 这个词,我们重复操作
In this case, it is the word "is", we repeat the operation

81
@@ -410,13 +410,13 @@ We've now generated a full sentence.

83
00:04:04,590 --> 00:04:07,890
我们决定就此打住,但我们可以继续一段时间
我们决定就此打住,但我们也可以继续一段时间
We decide to stop there, but we could continue for a while.

84
00:04:07,890 --> 00:04:12,890
例如,GPT 2 的最大上下文大小为 1,024。
GPT two, for example, has a maximum context size of 1,024.
例如,GPT-2 的最大上下文大小为 1,024。
GPT-2, for example, has a maximum context size of 1,024.

85
00:04:13,170 --> 00:04:16,830
Expand All @@ -425,11 +425,11 @@ We could eventually generate up to a 1,024 words

86
00:04:16,830 --> 00:04:19,050
并且解码器仍然会有一些记忆
并且解码器仍然会对这个序列的前几个单词
and the decoder would still have some memory

87
00:04:19,050 --> 00:04:21,003
这个序列中的第一个单词
有一些记忆
of the first words in this sequence.
