
Commit 1db1185

Merge pull request #531 from FYJNEVERFOLLOWS/0313
Reviewed No. 21 & 29 Subtitles
2 parents af00e34 + 9254fb6 commit 1db1185

File tree: 2 files changed (+67, -67 lines)

subtitles/zh-CN/21_preprocessing-sentence-pairs-(pytorch).srt

Lines changed: 27 additions & 27 deletions
@@ -30,7 +30,7 @@ If this code look unfamiliar to you,
 7
 00:00:18,330 --> 00:00:20,030
-请务必再次检查该视频
+请务必再次查看该视频
 be sure to check that video again.

 8
(translation: 检查 "inspect" is changed to 查看 "check/look at", better matching "be sure to check that video again")
@@ -40,12 +40,12 @@ Here will focus on tasks that classify pair of sentences.
 9
 00:00:25,620 --> 00:00:28,470
-例如,我们可能想要对两个文本进行分类
+例如,我们可能想要对两个文本是否被释义
 For instance, we may want to classify whether two texts

 10
 00:00:28,470 --> 00:00:30,360
-是否被释义
+进行分类
 are paraphrased or not.

 11
(translation: the Chinese clauses are re-split so cue 9 carries "whether two texts are paraphrased" and cue 10 "classify", tracking the English cue boundaries)
@@ -90,8 +90,8 @@ a problem called natural language inference or NLI.
 19
 00:00:53,970 --> 00:00:57,000
-在这个例子中,取自 MultiNLI 数据集
-In this example, taken from the MultiNLI dataset,
+在这个取自 MultiNLI 数据集的例子中
+In this example, taken from the MultiNLI data set,

 20
 00:00:57,000 --> 00:00:59,880
(translation: the Chinese "In this example, taken from the MultiNLI dataset" is rephrased as the more natural "In this example taken from the MultiNLI dataset")
@@ -100,7 +100,7 @@ we have a pair of sentences for each possible label.
 21
 00:00:59,880 --> 00:01:02,490
-矛盾,自然的或必然的
+矛盾,自然的或蕴涵
 Contradiction, natural or entailment,

 22
(translation: the last label is corrected from 必然的 "inevitable" to 蕴涵, the standard rendering of "entailment")
@@ -115,12 +115,12 @@ implies the second.
 24
 00:01:06,930 --> 00:01:08,820
-所以分类成对的句子是一个问题
+所以分类成对的句子是一个
 So classifying pairs of sentences is a problem

 25
 00:01:08,820 --> 00:01:10,260
-值得被研究
+值得研究的问题
 worth studying.

 26
(translation: re-split from "classifying pairs of sentences is a problem" / "worth being studied" to "classifying pairs of sentences is a" / "problem worth studying")
@@ -165,7 +165,7 @@ they often have an objective related to sentence pairs.
 34
 00:01:31,230 --> 00:01:34,320
-例如,在预训练期间 BERT 显示
+例如,在预训练期间 BERT 见到
 For instance, during pretraining BERT is shown

 35
(translation: 显示 "displays" is corrected to 见到 "sees", i.e. BERT is shown pairs of sentences rather than displaying them)
@@ -175,12 +175,12 @@ pairs of sentences and must predict both
 36
 00:01:36,810 --> 00:01:39,930
-随机屏蔽 token 的价值,以及是否第二个
+随机掩蔽的标记值,以及第二个是否
 the value of randomly masked tokens, and whether the second

 37
 00:01:39,930 --> 00:01:41,830
-句子从第一个开始, 或反之
+句子是否接着第一个句子
 sentence follow from the first or not.

 38
(translation: old "the worth (价值) of randomly masked tokens, and whether the second" / "sentence starts from the first, or vice versa"; new "the values of the randomly masked tokens, and whether the second" / "sentence follows the first")
@@ -205,27 +205,27 @@ to the tokenizer.
 42
 00:01:53,430 --> 00:01:55,470
-在输入 ID 和注意力掩码之上
+在我们已经研究过的输入 ID
 On top of the input IDs and the attention mask

 43
 00:01:55,470 --> 00:01:56,970
-我们已经研究过
+和注意掩码之上
 we studied already,

 44
 00:01:56,970 --> 00:01:59,910
-它返回一个名为 token 类型 ID 的新字段,
+它返回一个名为标记类型 ID 的新字段,
 it returns a new field called token type IDs,

 45
 00:01:59,910 --> 00:02:01,790
-它告诉模型哪些 token 属于
+它告诉模型哪些标记属于
 which tells the model which tokens belong

 46
 00:02:01,790 --> 00:02:03,630
-对于第一句话
+第一句话
 to the first sentence,

 47
(translation: cues 42-43 are re-split to "On top of the input IDs" / "and attention mask we studied already"; the loanword "token" is replaced with 标记 throughout; cue 46 drops a stray 对于 "as for" so it reads "the first sentence")
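These cues describe the extra field the tokenizer returns for sentence pairs. A minimal sketch of that behavior, assuming the bert-base-uncased checkpoint and made-up example sentences (printed values are illustrative):

    from transformers import AutoTokenizer

    # Assumed checkpoint; any BERT-like tokenizer returns the same fields.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    inputs = tokenizer("My dog is cute.", "He likes playing.")
    # Alongside input_ids and attention_mask, the tokenizer returns
    # token_type_ids: 0 for tokens of the first sentence, 1 for the second.
    print(inputs["token_type_ids"])
    # e.g. [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]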
@@ -245,12 +245,12 @@ aligned with the tokens they correspond to,
 50
 00:02:12,180 --> 00:02:15,213
-它们各自的 token 类型 ID 和注意掩码。
+它们各自的标记类型 ID 和注意掩码。
 their respective token type ID and attention mask.

 51
 00:02:16,080 --> 00:02:19,260
-我们可以看到 tokenizer 还添加了特殊 token
+我们可以看到分词器还添加了特殊标记
 We can see the tokenizer also added special tokens.

 52
(translation: the loanwords "tokenizer" and "token" are replaced with the Chinese terms 分词器 and 标记)
@@ -260,12 +260,12 @@ So we have a CLS token, the tokens from the first sentence,
 53
 00:02:22,620 --> 00:02:25,770
-一个 SEP token ,第二句话中的 token ,
+一个 SEP 标记,第二句话中的标记,
 a SEP token, the tokens from the second sentence,

 54
 00:02:25,770 --> 00:02:27,003
-和最终的 SEP token
+和最终的 SEP 标记
 and a final SEP token.

 55
(translation: same terminology fix, "SEP token" becomes "SEP 标记", and stray spaces around the punctuation are removed)
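Continuing the sketch above, converting the IDs back to tokens shows the special-token layout these cues walk through:

    # A CLS token, the first sentence, a SEP token, the second sentence,
    # and a final SEP token (BERT-style; other models differ).
    print(tokenizer.convert_ids_to_tokens(inputs["input_ids"]))
    # e.g. ['[CLS]', 'my', 'dog', 'is', 'cute', '.', '[SEP]',
    #       'he', 'likes', 'playing', '.', '[SEP]']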
@@ -275,12 +275,12 @@ If we have several pairs of sentences,
 56
 00:02:30,570 --> 00:02:32,840
-我们可以通过传递列表将它们标记在一起
+我们可以通过第一句话的传递列表
 we can tokenize them together by passing the list

 57
 00:02:32,840 --> 00:02:36,630
-第一句话,然后是第二句话的列表
+将它们标记在一起,然后是第二句话的列表
 of first sentences, then the list of second sentences

 58
(translation: re-split so cue 56 reads "we can, by passing the list of first sentences" and cue 57 "tokenize them together, then the list of second sentences")
@@ -290,7 +290,7 @@ and all the keyword arguments we studied already
 59
 00:02:39,300 --> 00:02:40,353
-padding=True
+例如 padding=True。
 like padding=True.

 60
(translation: "padding=True" becomes "for example, padding=True.", matching "like padding=True")
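A sketch of the batched form these cues describe: a list of first sentences, then a list of second sentences, plus the keyword arguments studied already, like padding=True (sentences are made up; return_tensors="pt" matches the PyTorch flavor of this video):

    first_sentences = ["My dog is cute.", "I love this course."]
    second_sentences = ["He likes playing.", "Me too!"]

    # Tokenize the pairs together; padding=True pads every pair
    # to the length of the longest one in the batch.
    batch = tokenizer(first_sentences, second_sentences,
                      padding=True, return_tensors="pt")
    print(batch["input_ids"].shape)  # e.g. torch.Size([2, 12])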
@@ -300,17 +300,17 @@ Zooming in at the result,
 61
 00:02:43,140 --> 00:02:45,030
-我们还可以看到标记化添加的填充
-we can see also tokenize added padding
+我们可以看到分词器如何添加填充
+we can see how the tokenizer added padding

 62
 00:02:45,030 --> 00:02:48,090
-到第二对句子来制作两个输出
+到第二对句子使得两个输出的
 to the second pair sentences to make the two outputs

 63
 00:02:48,090 --> 00:02:51,360
-相同的长度,并正确处理 token 类型 ID
+长度相同,并正确处理标记类型 ID
 the same length, and properly dealt with token type IDs

 64
(translation: cue 61 is fixed in both languages, from "we can also see the padding added by tokenization" to "we can see how the tokenizer added padding"; cues 62-63 now read "to the second pair of sentences so that the two outputs" / "are the same length, with token type IDs handled correctly")
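Zooming in on the second (shorter) pair of that sketch shows what these last cues assert: padded positions are masked out in the attention mask, and the token type IDs stay consistent (values illustrative):

    # The padding position gets attention_mask 0 and token type 0,
    # while the real second-sentence tokens keep token type 1.
    print(batch["attention_mask"][1])
    # e.g. tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0])
    print(batch["token_type_ids"][1])
    # e.g. tensor([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0])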
