From 5bf5512397f2acc95bdb8f7e2e8cb49dc4d66f84 Mon Sep 17 00:00:00 2001
From: FYJNEVERFOLLOWS
Date: Wed, 1 Mar 2023 21:47:50 +0800
Subject: [PATCH 1/2] docs(zh-cn): Reviewed No. 29 - Write your training loop
 in PyTorch

---
 ...29_write-your-training-loop-in-pytorch.srt | 80 +++++++++----------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/subtitles/zh-CN/29_write-your-training-loop-in-pytorch.srt b/subtitles/zh-CN/29_write-your-training-loop-in-pytorch.srt
index 732483f13..8d956c8b9 100644
--- a/subtitles/zh-CN/29_write-your-training-loop-in-pytorch.srt
+++ b/subtitles/zh-CN/29_write-your-training-loop-in-pytorch.srt
@@ -15,8 +15,8 @@

 4
 00:00:05,460 --> 00:00:08,486
-- 使用 PyTorch 编写你自己的训练循环。
-- Write your own training loop with PyTorch.
+使用 PyTorch 编写你自己的训练循环。
+Write your own training loop with PyTorch.

 5
 00:00:08,486 --> 00:00:09,960
@@ -25,7 +25,7 @@ In this video, we'll look at

 6
 00:00:09,960 --> 00:00:12,750
-我们如何进行与培训师视频中相同的微调,
+我们如何进行与训练器视频中相同的微调,
 how we can do the same fine-tuning as in the Trainer video,

 7
@@ -35,12 +35,12 @@ but without relying on that class.

 8
 00:00:14,760 --> 00:00:17,790
-这样,你就可以轻松自定义每个步骤
+这样,你就可以根据你的需要轻松自定义
 This way, you'll be able to easily customize each step

 9
 00:00:17,790 --> 00:00:20,310
-到你需要的训练循环。
+训练循环的每个步骤。
 to the training loop to your needs.

 10
@@ -50,12 +50,12 @@ This is also very useful

 11
 00:00:21,660 --> 00:00:22,740
-手动调试某些东西
+对于手动调试
 to manually debug something

 12
 00:00:22,740 --> 00:00:24,590
-Trainer API 出了问题。
+Trainer API 出现的问题。
 that went wrong with the Trainer API.

 13
@@ -85,8 +85,8 @@ That number is not useful in its own,

 18
 00:00:39,316 --> 00:00:40,260
-用于计算
-that is used to compute
+它是用于计算
+but is used to compute

 19
 00:00:40,260 --> 00:00:42,150
@@ -95,12 +95,12 @@ the ingredients of our model weights,

 20
 00:00:42,150 --> 00:00:43,440
-那是损失的导数
+即,损失关于
 that is the derivative of the loss

 21
 00:00:44,610 --> 00:00:47,160
-关于每个模型的重量。
+每个模型权重的导数。
 with respect to each model weight.
 22
@@ -125,7 +125,7 @@ We then repeat the process

 26
 00:00:54,510 --> 00:00:56,880
-与一批新的训练数据。
+使用一批新的训练数据。
 with a new batch of training data.

 27
@@ -165,7 +165,7 @@ Check out the videos link below

 34
 00:01:12,630 --> 00:01:14,280
-如果你还没有看到它们。
+如果你还没有看过它们。
 if you haven't seen them already.

 35
@@ -180,8 +180,8 @@ which will be responsible to convert

 37
 00:01:20,610 --> 00:01:23,253
-我们数据集的元素到补丁中。
-the elements of our dataset into patches.
+我们数据集的元素到批次数据中。
+the elements of our dataset into batches.

 38
 00:01:24,450 --> 00:01:27,960
@@ -190,17 +190,17 @@ We use our DataColletorForPadding as a collate function,

 39
 00:01:27,960 --> 00:01:29,460
-并洗牌训练集
+并打乱训练集的次序
 and shuffle the training set

 40
 00:01:29,460 --> 00:01:31,080
-确保我们不会检查样品
+确保我们不会每个纪元
 to make sure we don't go over the samples

 41
 00:01:31,080 --> 00:01:33,870
-在一个时代 * 以相同的顺序。
+以相同的顺序遍历样本。
 in the same order at a epoch*.

 42
@@ -216,16 +216,16 @@ we try to grab a batch of data, and inspect it.

 44
 00:01:40,080 --> 00:01:43,050
 就像我们的数据集元素一样,它是一个字典,
-Like our data set elements, it's a dictionary,
+Like our dataset elements, it's a dictionary,

 45
 00:01:43,050 --> 00:01:46,260
-但这些时候值不是一个整数列表
-but these times the values are not a single list of integers
+但这里的值不是一个整数列表
+but this time the values are not a single list of integers

 46
 00:01:46,260 --> 00:01:49,053
-但是按序列长度形状批量大小的张量。
+而是形状为批量大小乘以序列长度的张量。
 but a tensor of shape batch size by sequence length.

 47
@@ -240,7 +240,7 @@ For that, we'll need to actually create a model.

 49
 00:01:56,730 --> 00:01:58,740
-如 Model API 视频中所示,
+如模型 API 视频中所示,
 As seen in the Model API video,

 50
@@ -250,12 +250,12 @@ we use the from_pretrained method,

 51
 00:02:00,540 --> 00:02:03,270
-并将标签数量调整为类别数量
+并将标签数量调整为这个数据集
 and adjust the number of labels to the number of classes

 52
 00:02:03,270 --> 00:02:06,810
-我们有这个数据集,这里有两个。
+拥有的类别数量,这里是 2。
 we have on this data set, here two.

 53
@@ -275,7 +275,7 @@ and check there is no error.
 56
 00:02:13,320 --> 00:02:14,940
-如果提供标签,
+如果提供了标签,
 If the labels are provided,

 57
@@ -290,12 +290,12 @@ always returns a loss directly.

 59
 00:02:19,525 --> 00:02:21,090
-我们将能够做 loss.backward ()
+我们将能够调用 loss.backward()
 We will be able to do loss.backward ()

 60
 00:02:21,090 --> 00:02:22,860
-计算所有梯度,
+以计算所有梯度,
 to compute all the gradients,

 61
@@ -325,17 +325,17 @@ Using the previous loss,

 66
 00:02:36,150 --> 00:02:39,060
-并使用 loss.backward () 计算梯度,
+并使用 loss.backward() 计算梯度,
 and computing the gradients with loss.backward (),

 67
 00:02:39,060 --> 00:02:41,130
-我们检查我们是否可以执行优化器步骤
+我们检查我们是否可以无误
 we check that we can do the optimizer step

 68
 00:02:41,130 --> 00:02:42,030
-没有任何错误。
+执行优化器步骤。
 without any error.

 69
@@ -360,7 +360,7 @@ We could already write our training loop,

 73
 00:02:52,080 --> 00:02:53,220
-但我们还要添加两件事
+但我们还要再做两件事
 but we will add two more things

 74
@@ -390,7 +390,7 @@ is just a convenience function

 79
 00:03:06,150 --> 00:03:07,800
-轻松构建这样的调度程序。
+轻松构建这样的调度器。
 to easily build such a scheduler.

 80
@@ -400,7 +400,7 @@ You can again use

 81
 00:03:09,683 --> 00:03:11,860
-取而代之的是任何 PyTorch 学习率调度程序。
+取而代之的是任何 PyTorch 学习率调度器。
 any PyTorch learning rate scheduler instead.

 82
@@ -426,7 +426,7 @@ The first step is to get one,

 86
 00:03:21,270 --> 00:03:23,283
 例如通过使用协作笔记本。
-for instance by using a collab notebook.
+for instance by using a colab notebook.

 87
 00:03:24,180 --> 00:03:26,040
@@ -435,12 +435,12 @@ Then you need to actually send your model,

 88
 00:03:26,040 --> 00:03:28,923
-并使用火炬设备对其进行训练数据。
+并使用 torch 设备将训练数据发送到它上面。
 and training data on it by using a torch device.

 89
 00:03:29,790 --> 00:03:30,840
-仔细检查以下行
+仔细检查以下代码
 Double-check the following lines

 90
@@ -560,12 +560,12 @@ then go through all the data in the evaluation data loader.
 113
 00:04:23,850 --> 00:04:25,530
-正如我们在培训师视频中看到的那样,
+正如我们在训练器视频中看到的那样,
 As we have seen in the Trainer video,

 114
 00:04:25,530 --> 00:04:26,850
-模型输出 logits,
+模型输出 logits,
 the model outputs logits,

 115
@@ -610,7 +610,7 @@ Congratulations, you have now fine-tuned a model

 123
 00:04:44,490 --> 00:04:45,633
-靠你自己。
+全靠你自己。
 all by yourself.

 124

From 722dd7eff1ff75fae72c3eb74e6c9a4c9574ea5e Mon Sep 17 00:00:00 2001
From: FYJNEVERFOLLOWS
Date: Mon, 13 Mar 2023 18:37:21 +0800
Subject: [PATCH 2/2] translated

---
 ...preprocessing-sentence-pairs-(pytorch).srt | 54 +++++++++----------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/subtitles/zh-CN/21_preprocessing-sentence-pairs-(pytorch).srt b/subtitles/zh-CN/21_preprocessing-sentence-pairs-(pytorch).srt
index 8541dbc5e..32a1c78c4 100644
--- a/subtitles/zh-CN/21_preprocessing-sentence-pairs-(pytorch).srt
+++ b/subtitles/zh-CN/21_preprocessing-sentence-pairs-(pytorch).srt
@@ -20,7 +20,7 @@ and batch them together in the,

 5
 00:00:12,877 --> 00:00:15,810
-“批量输入视频。”
+在“批量输入”视频中。
 "Batching inputs together video."

 6
@@ -30,7 +30,7 @@ If this code look unfamiliar to you,

 7
 00:00:18,330 --> 00:00:20,030
-请务必再次检查该视频。
+请务必再次查看该视频。
 be sure to check that video again.

 8
@@ -40,12 +40,12 @@ Here will focus on tasks that classify pair of sentences.

 9
 00:00:25,620 --> 00:00:28,470
-例如,我们可能想要对两个文本进行分类
+例如,我们可能想要对两个文本是否互为释义
 For instance, we may want to classify whether two texts

 10
 00:00:28,470 --> 00:00:30,360
-是否被释义。
+进行分类。
 are paraphrased or not.

 11
@@ -90,7 +90,7 @@ a problem called natural language inference or NLI.

 19
 00:00:53,970 --> 00:00:57,000
-在这个例子中,取自 MultiNLI 数据集,
+在这个取自 MultiNLI 数据集的例子中,
 In this example, taken from the MultiNLI data set,

 20
@@ -100,7 +100,7 @@ we have a pair of sentences for each possible label.

 21
 00:00:59,880 --> 00:01:02,490
-矛盾,自然的或必然的,
+矛盾,中性或蕴涵,
 Contradiction, natural or entailment,

 22
@@ -115,12 +115,12 @@ implies the second.
 24
 00:01:06,930 --> 00:01:08,820
-所以分类成对的句子是一个问题
+所以分类成对的句子是一个
 So classifying pairs of sentences is a problem

 25
 00:01:08,820 --> 00:01:10,260
-值得研究。
+值得研究的问题。
 worth studying.

 26
@@ -165,7 +165,7 @@ they often have an objective related to sentence pairs.

 34
 00:01:31,230 --> 00:01:34,320
-例如,在预训练期间显示 BERT
+例如,在预训练期间 BERT 见到
 For instance, during pretraining BERT is shown

 35
@@ -175,12 +175,12 @@ pairs of sentences and must predict both

 36
 00:01:36,810 --> 00:01:39,930
-随机屏蔽令牌的价值,以及第二个是否
+随机掩蔽的标记值,以及第二个
 the value of randomly masked tokens, and whether the second

 37
 00:01:39,930 --> 00:01:41,830
-句子是否从头开始。
+句子是否接着第一个句子。
 sentence follow from the first or not.

 38
@@ -205,27 +205,27 @@ to the tokenizer.

 42
 00:01:53,430 --> 00:01:55,470
-在输入 ID 和注意掩码之上
+在我们已经研究过的输入 ID
 On top of the input IDs and the attention mask

 43
 00:01:55,470 --> 00:01:56,970
-我们已经研究过,
+和注意掩码之上,
 we studied already,

 44
 00:01:56,970 --> 00:01:59,910
-它返回一个名为令牌类型 ID 的新字段,
+它返回一个名为标记类型 ID 的新字段,
 it returns a new field called token type IDs,

 45
 00:01:59,910 --> 00:02:01,790
-它告诉模型哪些令牌属于
+它告诉模型哪些标记属于
 which tells the model which tokens belong

 46
 00:02:01,790 --> 00:02:03,630
-对于第一句话,
+第一句话,
 to the first sentence,

 47
 aligned with the tokens they correspond to,

 50
 00:02:12,180 --> 00:02:15,213
-它们各自的令牌类型 ID 和注意掩码。
+它们各自的标记类型 ID 和注意掩码。
 their respective token type ID and attention mask.

 51
 00:02:16,080 --> 00:02:19,260
-我们可以看到标记器还添加了特殊标记。
+我们可以看到分词器还添加了特殊标记。
 We can see the tokenizer also added special tokens.

 52
 So we have a CLS token, the tokens from the first sentence,

 53
 00:02:22,620 --> 00:02:25,770
-一个 SEP 令牌,第二句话中的令牌,
+一个 SEP 标记,第二句话中的标记,
 a SEP token, the tokens from the second sentence,

 54
 00:02:25,770 --> 00:02:27,003
-和最终的 SEP 令牌。
+和最终的 SEP 标记。
 and a final SEP token.
 55
 If we have several pairs of sentences,

 56
 00:02:30,570 --> 00:02:32,840
-我们可以通过传递列表将它们标记在一起
+我们可以通过传递第一句话的列表
 we can tokenize them together by passing the list

 57
 00:02:32,840 --> 00:02:36,630
-第一句话,然后是第二句话的列表
+将它们一起分词,然后是第二句话的列表
 of first sentences, then the list of second sentences

 58
 and all the keyword arguments we studied already

 59
 00:02:39,300 --> 00:02:40,353
-像填充 = 真。
+例如 padding=True。
 like padding=True.

 60
 Zooming in at the result,

 61
 00:02:43,140 --> 00:02:45,030
-我们还可以看到标记化添加的填充
-we can see also tokenize added padding
+我们可以看到分词器如何添加填充
+we can see how the tokenizer added padding

 62
 00:02:45,030 --> 00:02:48,090
-到第二对句子来制作两个输出
+到第二对句子使得两个输出的
 to the second pair sentences to make the two outputs

 63
 00:02:48,090 --> 00:02:51,360
-相同的长度,并正确处理令牌类型 ID
+长度相同,并正确处理标记类型 ID
 the same length, and properly dealt with token type IDs

 64