
Commit 1db1185

Merge pull request #531 from FYJNEVERFOLLOWS/0313
Reviewed No. 21 & 29 Subtitles
2 parents af00e34 + 9254fb6 commit 1db1185

File tree: 2 files changed (+67, -67 lines)

subtitles/zh-CN/21_preprocessing-sentence-pairs-(pytorch).srt

Lines changed: 27 additions & 27 deletions
@@ -30,7 +30,7 @@ If this code look unfamiliar to you,
 7
 00:00:18,330 --> 00:00:20,030
-请务必再次检查该视频
+请务必再次查看该视频
 be sure to check that video again.

 8
(translation: 检查 "inspect" is changed to 查看 "check/look at", better matching "be sure to check that video again")
@@ -40,12 +40,12 @@ Here will focus on tasks that classify pair of sentences.
 9
 00:00:25,620 --> 00:00:28,470
-例如,我们可能想要对两个文本进行分类
+例如,我们可能想要对两个文本是否被释义
 For instance, we may want to classify whether two texts

 10
 00:00:28,470 --> 00:00:30,360
-是否被释义
+进行分类
 are paraphrased or not.

 11
(translation: the Chinese clauses are re-split so cue 9 carries "whether two texts are paraphrased" and cue 10 "classify", tracking the English cue boundaries)
@@ -90,8 +90,8 @@ a problem called natural language inference or NLI.
 19
 00:00:53,970 --> 00:00:57,000
-在这个例子中,取自 MultiNLI 数据集
-In this example, taken from the MultiNLI dataset,
+在这个取自 MultiNLI 数据集的例子中
+In this example, taken from the MultiNLI data set,

 20
 00:00:57,000 --> 00:00:59,880
(translation: the Chinese "In this example, taken from the MultiNLI dataset" is rephrased as the more natural "In this example taken from the MultiNLI dataset")
@@ -100,7 +100,7 @@ we have a pair of sentences for each possible label.
 21
 00:00:59,880 --> 00:01:02,490
-矛盾,自然的或必然的
+矛盾,自然的或蕴涵
 Contradiction, natural or entailment,

 22
(translation: the last label is corrected from 必然的 "inevitable" to 蕴涵, the standard rendering of "entailment")
@@ -115,12 +115,12 @@ implies the second.
 24
 00:01:06,930 --> 00:01:08,820
-所以分类成对的句子是一个问题
+所以分类成对的句子是一个
 So classifying pairs of sentences is a problem

 25
 00:01:08,820 --> 00:01:10,260
-值得被研究
+值得研究的问题
 worth studying.

 26
(translation: re-split from "classifying pairs of sentences is a problem" / "worth being studied" to "classifying pairs of sentences is a" / "problem worth studying")
@@ -165,7 +165,7 @@ they often have an objective related to sentence pairs.
 34
 00:01:31,230 --> 00:01:34,320
-例如,在预训练期间 BERT 显示
+例如,在预训练期间 BERT 见到
 For instance, during pretraining BERT is shown

 35
(translation: 显示 "displays" is corrected to 见到 "sees", i.e. BERT is shown pairs of sentences rather than displaying them)
@@ -175,12 +175,12 @@ pairs of sentences and must predict both
 36
 00:01:36,810 --> 00:01:39,930
-随机屏蔽 token 的价值,以及是否第二个
+随机掩蔽的标记值,以及第二个是否
 the value of randomly masked tokens, and whether the second

 37
 00:01:39,930 --> 00:01:41,830
-句子从第一个开始, 或反之
+句子是否接着第一个句子
 sentence follow from the first or not.

 38
(translation: old "the worth (价值) of randomly masked tokens, and whether the second" / "sentence starts from the first, or vice versa"; new "the values of the randomly masked tokens, and whether the second" / "sentence follows the first")
@@ -205,27 +205,27 @@ to the tokenizer.
 42
 00:01:53,430 --> 00:01:55,470
-在输入 ID 和注意力掩码之上
+在我们已经研究过的输入 ID
 On top of the input IDs and the attention mask

 43
 00:01:55,470 --> 00:01:56,970
-我们已经研究过
+和注意掩码之上
 we studied already,

 44
 00:01:56,970 --> 00:01:59,910
-它返回一个名为 token 类型 ID 的新字段,
+它返回一个名为标记类型 ID 的新字段,
 it returns a new field called token type IDs,

 45
 00:01:59,910 --> 00:02:01,790
-它告诉模型哪些 token 属于
+它告诉模型哪些标记属于
 which tells the model which tokens belong

 46
 00:02:01,790 --> 00:02:03,630
-对于第一句话
+第一句话
 to the first sentence,

 47
(translation: cues 42-43 are re-split to "On top of the input IDs" / "and attention mask we studied already"; the loanword "token" is replaced with 标记 throughout; cue 46 drops a stray 对于 "as for" so it reads "the first sentence")
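These cues describe the extra field the tokenizer returns for sentence pairs. A minimal sketch of that behavior, assuming the bert-base-uncased checkpoint and made-up example sentences (printed values are illustrative):

    from transformers import AutoTokenizer

    # Assumed checkpoint; any BERT-like tokenizer returns the same fields.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    inputs = tokenizer("My dog is cute.", "He likes playing.")
    # Alongside input_ids and attention_mask, the tokenizer returns
    # token_type_ids: 0 for tokens of the first sentence, 1 for the second.
    print(inputs["token_type_ids"])
    # e.g. [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]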
@@ -245,12 +245,12 @@ aligned with the tokens they correspond to,
 50
 00:02:12,180 --> 00:02:15,213
-它们各自的 token 类型 ID 和注意掩码。
+它们各自的标记类型 ID 和注意掩码。
 their respective token type ID and attention mask.

 51
 00:02:16,080 --> 00:02:19,260
-我们可以看到 tokenizer 还添加了特殊 token
+我们可以看到分词器还添加了特殊标记
 We can see the tokenizer also added special tokens.

 52
(translation: the loanwords "tokenizer" and "token" are replaced with the Chinese terms 分词器 and 标记)
@@ -260,12 +260,12 @@ So we have a CLS token, the tokens from the first sentence,
 53
 00:02:22,620 --> 00:02:25,770
-一个 SEP token ,第二句话中的 token ,
+一个 SEP 标记,第二句话中的标记,
 a SEP token, the tokens from the second sentence,

 54
 00:02:25,770 --> 00:02:27,003
-和最终的 SEP token
+和最终的 SEP 标记
 and a final SEP token.

 55
(translation: same terminology fix, "SEP token" becomes "SEP 标记", and stray spaces around the punctuation are removed)
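Continuing the sketch above, converting the IDs back to tokens shows the special-token layout these cues walk through:

    # A CLS token, the first sentence, a SEP token, the second sentence,
    # and a final SEP token (BERT-style; other models differ).
    print(tokenizer.convert_ids_to_tokens(inputs["input_ids"]))
    # e.g. ['[CLS]', 'my', 'dog', 'is', 'cute', '.', '[SEP]',
    #       'he', 'likes', 'playing', '.', '[SEP]']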
@@ -275,12 +275,12 @@ If we have several pairs of sentences,
 56
 00:02:30,570 --> 00:02:32,840
-我们可以通过传递列表将它们标记在一起
+我们可以通过第一句话的传递列表
 we can tokenize them together by passing the list

 57
 00:02:32,840 --> 00:02:36,630
-第一句话,然后是第二句话的列表
+将它们标记在一起,然后是第二句话的列表
 of first sentences, then the list of second sentences

 58
(translation: re-split so cue 56 reads "we can, by passing the list of first sentences" and cue 57 "tokenize them together, then the list of second sentences")
@@ -290,7 +290,7 @@ and all the keyword arguments we studied already
 59
 00:02:39,300 --> 00:02:40,353
-padding=True
+例如 padding=True。
 like padding=True.

 60
(translation: "padding=True" becomes "for example, padding=True.", matching "like padding=True")
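A sketch of the batched form these cues describe: a list of first sentences, then a list of second sentences, plus the keyword arguments studied already, like padding=True (sentences are made up; return_tensors="pt" matches the PyTorch flavor of this video):

    first_sentences = ["My dog is cute.", "I love this course."]
    second_sentences = ["He likes playing.", "Me too!"]

    # Tokenize the pairs together; padding=True pads every pair
    # to the length of the longest one in the batch.
    batch = tokenizer(first_sentences, second_sentences,
                      padding=True, return_tensors="pt")
    print(batch["input_ids"].shape)  # e.g. torch.Size([2, 12])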
@@ -300,17 +300,17 @@ Zooming in at the result,
 61
 00:02:43,140 --> 00:02:45,030
-我们还可以看到标记化添加的填充
-we can see also tokenize added padding
+我们可以看到分词器如何添加填充
+we can see how the tokenizer added padding

 62
 00:02:45,030 --> 00:02:48,090
-到第二对句子来制作两个输出
+到第二对句子使得两个输出的
 to the second pair sentences to make the two outputs

 63
 00:02:48,090 --> 00:02:51,360
-相同的长度,并正确处理 token 类型 ID
+长度相同,并正确处理标记类型 ID
 the same length, and properly dealt with token type IDs

 64
(translation: cue 61 is fixed in both languages, from "we can also see the padding added by tokenization" to "we can see how the tokenizer added padding"; cues 62-63 now read "to the second pair of sentences so that the two outputs" / "are the same length, with token type IDs handled correctly")
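Zooming in on the second (shorter) pair of that sketch shows what these last cues assert: padded positions are masked out in the attention mask, and the token type IDs stay consistent (values illustrative):

    # The padding position gets attention_mask 0 and token type 0,
    # while the real second-sentence tokens keep token type 1.
    print(batch["attention_mask"][1])
    # e.g. tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0])
    print(batch["token_type_ids"][1])
    # e.g. tensor([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0])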
