Merged
54 changes: 27 additions & 27 deletions subtitles/zh-CN/21_preprocessing-sentence-pairs-(pytorch).srt
@@ -30,7 +30,7 @@ If this code look unfamiliar to you,

7
00:00:18,330 --> 00:00:20,030
请务必再次检查该视频
请务必再次查看该视频
be sure to check that video again.

8
@@ -40,12 +40,12 @@ Here will focus on tasks that classify pair of sentences.

9
00:00:25,620 --> 00:00:28,470
例如,我们可能想要对两个文本进行分类
例如,我们可能想要对两个文本是否被释义
For instance, we may want to classify whether two texts

10
00:00:28,470 --> 00:00:30,360
是否被释义
进行分类
are paraphrased or not.

11
@@ -90,8 +90,8 @@ a problem called natural language inference or NLI.

19
00:00:53,970 --> 00:00:57,000
在这个例子中,取自 MultiNLI 数据集
In this example, taken from the MultiNLI dataset,
在这个取自 MultiNLI 数据集的例子中
In this example, taken from the MultiNLI data set,

20
00:00:57,000 --> 00:00:59,880
@@ -100,7 +100,7 @@ we have a pair of sentences for each possible label.

21
00:00:59,880 --> 00:01:02,490
矛盾,自然的或必然的
矛盾,中性或蕴涵
Contradiction, neutral or entailment,

22
@@ -115,12 +115,12 @@ implies the second.

24
00:01:06,930 --> 00:01:08,820
所以分类成对的句子是一个问题
所以分类成对的句子是一个
So classifying pairs of sentences is a problem

25
00:01:08,820 --> 00:01:10,260
值得被研究
值得研究的问题
worth studying.

26
@@ -165,7 +165,7 @@ they often have an objective related to sentence pairs.

34
00:01:31,230 --> 00:01:34,320
例如,在预训练期间 BERT 显示
例如,在预训练期间 BERT 见到
For instance, during pretraining BERT is shown

35
@@ -175,12 +175,12 @@ pairs of sentences and must predict both

36
00:01:36,810 --> 00:01:39,930
随机屏蔽 token 的价值,以及是否第二个
随机掩蔽的标记值,以及第二个是否
the value of randomly masked tokens, and whether the second

37
00:01:39,930 --> 00:01:41,830
句子从第一个开始, 或反之
句子是否接着第一个句子
sentence follow from the first or not.

38
@@ -205,27 +205,27 @@ to the tokenizer.

42
00:01:53,430 --> 00:01:55,470
在输入 ID 和注意力掩码之上
在我们已经研究过的输入 ID
On top of the input IDs and the attention mask

43
00:01:55,470 --> 00:01:56,970
我们已经研究过
和注意掩码之上
we studied already,

44
00:01:56,970 --> 00:01:59,910
它返回一个名为 token 类型 ID 的新字段,
它返回一个名为标记类型 ID 的新字段,
it returns a new field called token type IDs,

45
00:01:59,910 --> 00:02:01,790
它告诉模型哪些 token 属于
它告诉模型哪些标记属于
which tells the model which tokens belong

46
00:02:01,790 --> 00:02:03,630
对于第一句话
第一句话
to the first sentence,

47
@@ -245,12 +245,12 @@ aligned with the tokens they correspond to,

50
00:02:12,180 --> 00:02:15,213
它们各自的 token 类型 ID 和注意掩码。
它们各自的标记类型 ID 和注意掩码。
their respective token type ID and attention mask.

51
00:02:16,080 --> 00:02:19,260
我们可以看到 tokenizer 还添加了特殊 token
我们可以看到分词器还添加了特殊标记
We can see the tokenizer also added special tokens.

52
@@ -260,12 +260,12 @@ So we have a CLS token, the tokens from the first sentence,

53
00:02:22,620 --> 00:02:25,770
一个 SEP token ,第二句话中的 token ,
一个 SEP 标记,第二句话中的标记,
a SEP token, the tokens from the second sentence,

54
00:02:25,770 --> 00:02:27,003
和最终的 SEP token
和最终的 SEP 标记
and a final SEP token.

55
@@ -275,12 +275,12 @@ If we have several pairs of sentences,

56
00:02:30,570 --> 00:02:32,840
我们可以通过传递列表将它们标记在一起
我们可以通过第一句话的传递列表
we can tokenize them together by passing the list

57
00:02:32,840 --> 00:02:36,630
第一句话,然后是第二句话的列表
将它们标记在一起,然后是第二句话的列表
of first sentences, then the list of second sentences

58
@@ -290,7 +290,7 @@ and all the keyword arguments we studied already

59
00:02:39,300 --> 00:02:40,353
padding=True
例如 padding=True。
like padding=True.
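
When several pairs are tokenized together with padding=True, the cues above say the shorter encodings are padded to the longest one in the batch, the attention mask zeroes out the padding, and the token type IDs stay correctly aligned. A minimal sketch of that padding step (the `pad_batch` helper and pad ID 0 are illustrative assumptions, not the library's actual implementation):

```python
# Sketch of batch padding for already-encoded sentence pairs.
def pad_batch(batch_ids, batch_type_ids, pad_id=0):
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, token_type_ids, attention_mask = [], [], []
    for ids, types in zip(batch_ids, batch_type_ids):
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)             # pad the IDs
        token_type_ids.append(types + [0] * pad)           # padding counts as type 0
        attention_mask.append([1] * len(ids) + [0] * pad)  # mask out padding
    return input_ids, token_type_ids, attention_mask

ids, types, mask = pad_batch([[101, 7, 102, 8, 102], [101, 7, 9, 102, 8, 9, 102]],
                             [[0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1, 1]])
print(mask[0])  # [1, 1, 1, 1, 1, 0, 0]
```

The zeros in the attention mask tell the model to ignore the padded positions, so both outputs in the batch end up the same length without changing what the model sees.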

60
@@ -300,17 +300,17 @@ Zooming in at the result,

61
00:02:43,140 --> 00:02:45,030
我们还可以看到标记化添加的填充
we can see also tokenize added padding
我们可以看到分词器如何添加填充
we can see how the tokenizer added padding

62
00:02:45,030 --> 00:02:48,090
到第二对句子来制作两个输出
到第二对句子使得两个输出的
to the second pair sentences to make the two outputs

63
00:02:48,090 --> 00:02:51,360
相同的长度,并正确处理 token 类型 ID
长度相同,并正确处理标记类型 ID
the same length, and properly dealt with token type IDs

64