diff --git a/subtitles/zh-CN/03_what-is-transfer-learning.srt b/subtitles/zh-CN/03_what-is-transfer-learning.srt
index 8a8a0b509..4ddaf9c5d 100644
--- a/subtitles/zh-CN/03_what-is-transfer-learning.srt
+++ b/subtitles/zh-CN/03_what-is-transfer-learning.srt
@@ -5,22 +5,22 @@

 2
 00:00:05,550 --> 00:00:07,293
-- 什么是转移学习?
+- 什么是迁移学习?
 - What is transfer learning?

 3
 00:00:09,480 --> 00:00:10,920
-转移学习的思想
+迁移学习的思想
 The idea of transfer learning

 4
 00:00:10,920 --> 00:00:12,570
-是利用所获得的知识
+是利用在另一项任务上使用大量数据训练的模型
 is to leverage the knowledge acquired

 5
 00:00:12,570 --> 00:00:15,543
-通过在另一项任务上使用大量数据训练的模型。
+所获得的知识。
 by a model trained with lots of data on another task.

 6
@@ -30,12 +30,12 @@ The model A will be trained specifically for task A.

 7
 00:00:20,130 --> 00:00:22,200
-现在假设你想训练模型 B
+现在假设您想为了另一个任务
 Now let's say you want to train a model B

 8
 00:00:22,200 --> 00:00:23,970
-为了不同的任务。
+训练模型 B。
 for a different task.

 9
@@ -45,17 +45,17 @@ One option would be to train the model from scratch.

 10
 00:00:27,330 --> 00:00:30,633
-这可能需要大量的计算、时间和数据。
+但这可能需要大量的计算、时间和数据。
 This could take lots of computation, time and data.

 11
 00:00:31,470 --> 00:00:34,260
-相反,我们可以初始化模型 B
+我们还有另一种选择:初始化模型 B
 Instead, we could initialize model B

 12
 00:00:34,260 --> 00:00:36,570
-与模型 A 具有相同的权重,
+使其与模型 A 具有相同的权重,
 with the same weights as model A,

 13
@@ -75,37 +75,37 @@ all the model's weight are initialized randomly.

 16
 00:00:45,870 --> 00:00:48,870
-在这个例子中,我们正在训练一个 BERT 模型
+在这个例子中,我们正在基于识别任务
 In this example, we are training a BERT model

 17
 00:00:48,870 --> 00:00:50,220
-在识别任务上
+训练一个 BERT 模型
 on the task of recognizing

 18
 00:00:50,220 --> 00:00:52,203
-两个句子是否相似。
+来判断两个句子是否相似。
 if two sentences are similar or not.

 19
 00:00:54,116 --> 00:00:56,730
-在左边,它是从头开始训练的,
+左边的例子是从头开始训练的,
 On the left, it's trained from scratch,

 20
 00:00:56,730 --> 00:01:00,000
-在右侧,它正在微调预训练模型。
+右边的例子则是在微调预训练模型。
 and on the right it's fine-tuning a pretrained model.

 21
 00:01:00,000 --> 00:01:02,220
-正如我们所见,使用转移学习
+正如我们所见,使用迁移学习
 As we can see, using transfer learning

 22
 00:01:02,220 --> 00:01:05,160
-并且预训练模型产生了更好的结果。
+和预训练模型产生了更好的结果。
 and the pretrained model yields better results.

 23
@@ -120,7 +120,7 @@ The training from scratch is capped around 70% accuracy

 25
 00:01:10,620 --> 00:01:13,293
-而预训练模型轻松击败了 86%。
+而预训练模型轻松达到了 86%。
 while the pretrained model beats the 86% easily.

 26
@@ -130,17 +130,17 @@ This is because pretrained models

 27
 00:01:16,140 --> 00:01:18,420
-通常接受大量数据的训练
+通常基于大量数据进行训练
 are usually trained on large amounts of data

 28
 00:01:18,420 --> 00:01:21,000
-为模型提供统计理解
+这些数据为模型提供了
 that provide the model with a statistical understanding

 29
 00:01:21,000 --> 00:01:23,413
-预训练期间使用的语言。
+对预训练期间所用语言的统计理解。
 of the language used during pretraining.

 30
@@ -150,7 +150,7 @@ In computer vision,

 31
 00:01:25,950 --> 00:01:28,080
-转移学习已成功应用
+迁移学习已成功应用
 transfer learning has been applied successfully

 32
@@ -165,7 +165,7 @@ Models are frequently pretrained on ImageNet,

 34
 00:01:32,850 --> 00:01:36,153
-包含 120 万张照片图像的数据集。
+它是一个包含 120 万张图像的数据集。
 a dataset containing 1.2 millions of photo images.

 35
@@ -190,12 +190,12 @@ In Natural Language Processing,

 39
 00:01:49,140 --> 00:01:51,870
-转移学习是最近才出现的。
+迁移学习是最近才出现的。
 transfer learning is a bit more recent.
 40
 00:01:51,870 --> 00:01:54,480
-与 ImageNet 的一个关键区别是预训练
+它与 ImageNet 的一个关键区别是预训练
 A key difference with ImageNet is that the pretraining

 41
@@ -205,12 +205,12 @@ is usually self-supervised,

 42
 00:01:56,460 --> 00:01:58,770
-这意味着它不需要人工注释
+这意味着它不需要人工对标签
 which means it doesn't require humans annotations

 43
 00:01:58,770 --> 00:01:59,673
-对于标签。
+进行注释。
 for the labels.

 44
@@ -225,7 +225,7 @@ is to guess the next word in a sentence.

 46
 00:02:05,310 --> 00:02:07,710
-这只需要大量的文本。
+它只需要大量的文本。
 Which only requires lots and lots of text.

 47
@@ -235,12 +235,12 @@ GPT-2 for instance, was pretrained this way

 48
 00:02:10,710 --> 00:02:12,900
-使用 4500 万个链接的内容
+它使用了用户在 Reddit 上发布的
 using the content of 45 millions links

 49
 00:02:12,900 --> 00:02:14,673
-用户在 Reddit 上发布。
+4500 万个链接的内容。
 posted by users on Reddit.

 50
@@ -260,42 +260,42 @@ Which is similar to fill-in-the-blank tests

 53
 00:02:24,540 --> 00:02:26,760
-你可能在学校做过。
+您可能在学校做过。
 you may have done in school.

 54
 00:02:26,760 --> 00:02:29,880
-BERT 是使用英文维基百科以这种方式进行预训练的
+BERT 是以这种方式,使用英文维基百科
 BERT was pretrained this way using the English Wikipedia

 55
 00:02:29,880 --> 00:02:31,893
-和 11,000 本未出版的书籍。
+和 11,000 本未出版的书籍预训练的。
 and 11,000 unpublished books.

 56
 00:02:33,120 --> 00:02:36,450
-在实践中,转移学习应用于给定模型
+在实践中,迁移学习是通过抛弃原模型的头部
 In practice, transfer learning is applied on a given model

 57
 00:02:36,450 --> 00:02:39,090
-通过扔掉它的头,也就是说,
+即其针对预训练目标的最后几层,
 by throwing away its head, that is,

 58
 00:02:39,090 --> 00:02:42,150
-它的最后一层专注于预训练目标,
+并用一个新的、随机初始化的头部
 its last layers focused on the pretraining objective,

 59
 00:02:42,150 --> 00:02:45,360
-并用一个新的、随机初始化的头替换它
+替换它来应用的。
 and replacing it with a new, randomly initialized head

 60
 00:02:45,360 --> 00:02:46,860
-适合手头的任务。
+这个新的头部适用于当前的任务。
 suitable for the task at hand.

 61
@@ -320,37 +320,37 @@ Since our task had two labels.

 65
 00:02:59,700 --> 00:03:02,490
-为了尽可能高效,使用预训练模型
+为了尽可能高效,
 To be as efficient as possible, the pretrained model used

 66
 00:03:02,490 --> 00:03:03,770
-应该尽可能相似
+所使用的预训练模型
 should be as similar as possible

 67
 00:03:03,770 --> 00:03:06,270
-对其进行微调的任务。
+应尽可能与其微调的任务相似。
 to the task it's fine-tuned on.

 68
 00:03:06,270 --> 00:03:08,190
-例如,如果问题
+例如,如果当前需要
 For instance, if the problem

 69
 00:03:08,190 --> 00:03:10,860
-是对德语句子进行分类,
+对德语句子进行分类,
 is to classify German sentences,

 70
 00:03:10,860 --> 00:03:13,053
-最好使用德国预训练模型。
+最好使用德语预训练模型。
 it's best to use a German pretrained model.

 71
 00:03:14,370 --> 00:03:16,649
-但好事也有坏事。
+但好处也伴随着坏处。
 But with the good comes the bad.

 72
@@ -360,87 +360,87 @@ The pretrained model does not only transfer its knowledge,

 73
 00:03:19,380 --> 00:03:21,693
-以及它可能包含的任何偏见。
+同时也转移了它可能包含的任何偏见。
 but also any bias it may contain.

 74
 00:03:22,530 --> 00:03:24,300
-ImageNet 主要包含图像
+ImageNet 主要包含来自美国和西欧
 ImageNet mostly contains images

 75
 00:03:24,300 --> 00:03:26,850
-来自美国和西欧。
+的图像。
 coming from the United States and Western Europe.

 76
 00:03:26,850 --> 00:03:28,020
-所以模型用它微调
+所以基于它进行微调的模型
 So models fine-tuned with it

 77
 00:03:28,020 --> 00:03:31,710
-通常会在来自这些国家 / 地区的图像上表现更好。
+通常会在来自这些国家或地区的图像上表现更好。
 usually will perform better on images from these countries.

 78
 00:03:31,710 --> 00:03:33,690
-OpenAI 还研究了偏差
+OpenAI 还研究了
 OpenAI also studied the bias

 79
 00:03:33,690 --> 00:03:36,120
-在其 GPT-3 模型的预测中
+其使用“猜测下一个单词”目标
 in the predictions of its GPT-3 model

 80
 00:03:36,120 --> 00:03:36,953
-这是预训练的
+预训练的 GPT-3 模型中
 which was pretrained

 81
 00:03:36,953 --> 00:03:38,750
-使用猜测下一个单词目标。
+预测的偏差。
 using the guess the next word objective.
 82
 00:03:39,720 --> 00:03:41,040
-更改提示的性别
+将提示的性别
 Changing the gender of the prompt

 83
 00:03:41,040 --> 00:03:44,250
-从他非常到她非常
+从“他”改为“她”
 from he was very to she was very

 84
 00:03:44,250 --> 00:03:47,550
-改变了大多数中性形容词的预测
+会使预测从主要是中性形容词
 changed the predictions from mostly neutral adjectives

 85
 00:03:47,550 --> 00:03:49,233
-几乎只有物理的。
+变为几乎只有描述外貌的形容词。
 to almost only physical ones.

 86
 00:03:50,400 --> 00:03:52,367
-在他们的 GPT-2 模型的模型卡中,
+在他们的 GPT-2 模型卡中,
 In their model card of the GPT-2 model,

 87
 00:03:52,367 --> 00:03:54,990
-OpenAI 也承认它的偏见
+OpenAI 也承认了它的偏见
 OpenAI also acknowledges its bias

 88
 00:03:54,990 --> 00:03:56,730
-并且不鼓励使用它
+并且不鼓励在与人类交互的系统中
 and discourages its use

 89
 00:03:56,730 --> 00:03:58,803
-在与人类交互的系统中。
+使用它。
 in systems that interact with humans.

 90
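For readers who want to see the head replacement described in cues 56-60 (and the two-label head mentioned around cue 61) in code: this is what loading a pretrained checkpoint for a new task looks like in practice. A minimal sketch, assuming the Hugging Face transformers library; the checkpoint name and the two-label setup are illustrative assumptions, not something the subtitles prescribe:

```python
# Sketch of the head replacement described in cues 56-60, assuming the
# Hugging Face transformers library. "bert-base-cased" and num_labels=2
# are illustrative choices (e.g. for a two-label sentence-pair task).
from transformers import AutoModelForSequenceClassification

# The pretrained body keeps its learned weights, while the pretraining head
# is discarded and a new two-label classification head is randomly
# initialized -- transformers logs a warning that this new head still needs
# to be fine-tuned on the task at hand.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased",
    num_labels=2,
)
```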