diff --git a/subtitles/zh-CN/64_using-a-custom-loss-function.srt b/subtitles/zh-CN/64_using-a-custom-loss-function.srt
index efcbaf454..0b6beaeaf 100644
--- a/subtitles/zh-CN/64_using-a-custom-loss-function.srt
+++ b/subtitles/zh-CN/64_using-a-custom-loss-function.srt
@@ -15,12 +15,12 @@
 4
 00:00:05,550 --> 00:00:07,500
-- 在本视频中,我们将介绍如何设置
+- 在本视频中,我们将介绍
 - In this video, we take a look at setting up

 5
 00:00:07,500 --> 00:00:09,303
-用于训练的自定义损失函数。
+如何自定义用于训练的损失函数。
 a custom loss function for training.

 6
@@ -30,42 +30,42 @@
 In the default loss function, all samples,

 7
 00:00:13,260 --> 00:00:15,840
-例如这些代码片段,都被同等对待
+例如这些代码片段,无论其内容如何
 such as these code snippets, are treated the same

 8
 00:00:15,840 --> 00:00:18,960
-不管他们的内容如何,但有一些场景
+都被同等对待,但有一些场景下
 irrespective of their content but there are scenarios

 9
 00:00:18,960 --> 00:00:21,660
-对样本进行不同加权可能有意义。
+对样本进行不同加权是合理的。
 where it could make sense to weight the samples differently.

 10
 00:00:21,660 --> 00:00:24,570
-例如,如果一个样本包含很多标记
+例如,如果一个样本包含很多
 If, for example, one sample contains a lot of tokens

 11
 00:00:24,570 --> 00:00:26,160
-我们感兴趣的
+我们所感兴趣的词元
 that are of interest to us

 12
 00:00:26,160 --> 00:00:29,910
-或者样本是否具有有利的标记多样性。
+或者样本具有理想的词元多样性。
 or if a sample has a favorable diversity of tokens.

 13
 00:00:29,910 --> 00:00:31,950
-我们还可以实施其他启发式
+我们还可以通过模式匹配或者其他规则
 We can also implement other heuristics

 14
 00:00:31,950 --> 00:00:33,963
-与模式匹配或其他规则。
+实现其他启发式方法。
 with pattern matching or other rules.

 15
@@ -75,7 +75,7 @@
 For each sample, we get a loss value during training

 16
 00:00:39,150 --> 00:00:41,850
-我们可以将损失与重量结合起来。
+我们可以将损失与权重结合起来。
 and we can combine that loss with a weight.

 17
@@ -110,12 +110,12 @@
 that helps us autocomplete common data science code.

 23
 00:00:57,030 --> 00:01:01,830
-对于那个任务,我们想给样本赋予更强的权重
+对于那个任务,包含与数据科学栈相关的词元
 For that task, we would like to weight samples stronger

 24
 00:01:01,830 --> 00:01:04,110
-其中与数据科学堆栈相关的令牌,
+我们想给样本赋予更强的权重,
 where tokens related to the data science stack,

 25
@@ -125,27 +125,27 @@
 such as pd or np, occur more frequently.

 26
 00:01:10,140 --> 00:01:13,080
-在这里你看到一个损失函数正是这样做的
+在这里你看到一个损失函数是
 Here you see a loss function that does exactly that

 27
 00:01:13,080 --> 00:01:15,180
-用于因果语言建模。
+针对因果语言建模这样做的。
 for causal language modeling.

 28
 00:01:15,180 --> 00:01:18,030
-它采用模型的输入和预测的逻辑,
+它接收模型的输入和预测的 logits,
 It takes the model's input and predicted logits,

 29
 00:01:18,030 --> 00:01:20,343
-以及作为输入的密钥标记。
+以及作为输入的关键词元。
 as well as the key tokens, as input.

 30
 00:01:21,869 --> 00:01:25,113
-首先,输入和逻辑对齐。
+首先,输入和 logits 会被对齐。
 First, the inputs and logits are aligned.

 31
@@ -155,7 +155,7 @@
 Then the loss per sample is calculated,

 32
 00:01:29,310 --> 00:01:30,843
-其次是重量。
+其次是权重。
 followed by the weights.

 33
@@ -170,7 +170,7 @@
 This is a pretty big function, so let's take a closer look

 35
 00:01:39,150 --> 00:01:40,953
-在损失和重量块。
+损失和权重块。
 at the loss and the weight blocks.

 36
@@ -180,37 +180,37 @@
 During the calculation of the standard loss,

 37
 00:01:45,600 --> 00:01:48,930
-logits 和标签在批次上变平。
+logits 和标签在整批数据上进行扁平化处理。
 the logits and labels are flattened over the batch.

 38
 00:01:48,930 --> 00:01:52,590
-有了视图,我们展开张量得到矩阵
+通过 view 操作,我们将 tensor 还原成矩阵
 With the view, we unflatten the tensor to get the matrix

 39
 00:01:52,590 --> 00:01:55,320
-批次中的每个样本都有一行和一列
+其中的行代表整批数据中的每个样本,
 with a row for each sample in the batch and a column

 40
 00:01:55,320 --> 00:01:57,723
-对于样本序列中的每个位置。
+其中的列表示样本在序列中的位置。
 for each position in the sequence of the sample.
 41
 00:01:58,920 --> 00:02:00,600
-我们不需要每个头寸的损失,
+我们不需要每个位置各自的损失,
 We don't need the loss per position,

 42
 00:02:00,600 --> 00:02:04,083
-所以我们对每个样本的所有头寸的损失进行平均。
+所以我们将每个样本在所有位置上的损失进行平均。
 so we average the loss over all positions for each sample.

 43
 00:02:06,150 --> 00:02:08,970
-对于权重,我们使用布尔逻辑得到一个张量
+对于权重,我们使用 Boolean 逻辑得到一个 tensor
 For the weights, we use Boolean logic to get a tensor

 44
@@ -220,17 +220,17 @@
 with 1s where a keyword occurred and 0s where not.

 45
 00:02:13,440 --> 00:02:15,690
-这个张量有一个额外的维度
+这个 tensor 有一个额外的维度
 This tensor has an additional dimension

 46
 00:02:15,690 --> 00:02:18,540
-作为我们刚刚看到的损失张量,因为我们得到
+和我们刚刚看到的损失 tensor 一样,
 as the loss tensor we just saw because we get

 47
 00:02:18,540 --> 00:02:21,693
-单独矩阵中每个关键字的信息。
+因为我们可以在单独的矩阵中获得每个关键词的信息。
 the information for each keyword in a separate matrix.

 48
@@ -250,17 +250,17 @@
 so we can sum overall keywords and all positions per sample.

 51
 00:02:33,450 --> 00:02:35,010
-现在我们快到了。
+现在我们就快要完成了。
 Now we're almost there.

 52
 00:02:35,010 --> 00:02:38,850
-我们只需要将损失与每个样本的权重结合起来。
+我们只需要将每个样本的损失与其权重结合起来。
 We only need to combine the loss with the weight per sample.

 53
 00:02:38,850 --> 00:02:41,790
-我们用元素明智的乘法来做到这一点
+我们通过逐元素相乘来做到这一点
 We do this with element wise multiplication

 54
@@ -270,32 +270,32 @@
 and then average overall samples in the batch.

 55
 00:02:45,233 --> 00:02:46,066
-到底,
+最后,
 In the end,

 56
 00:02:46,066 --> 00:02:49,110
-我们对整批只有一个损失值
+整批数据只有一个损失值
 we have exactly one loss value for the whole batch

 57
 00:02:49,110 --> 00:02:51,330
-这是整个必要的逻辑
+这就是创建自定义加权损失
 and this is the whole necessary logic

 58
 00:02:51,330 --> 00:02:53,223
-创建自定义加权损失。
+所需要的全部逻辑。
 to create a custom weighted loss.

 59
 00:02:56,250 --> 00:02:59,010
-让我们看看如何利用自定义损失
+让我们看看如何结合 Accelerate 和 Trainer
 Let's see how we can make use of that custom loss

 60
 00:02:59,010 --> 00:03:00,753
-与 Accelerate 和 Trainer 一起。
+来利用这个自定义损失。
 with Accelerate and the Trainer.

 61
@@ -305,7 +305,7 @@
 In Accelerate, we just pass the input_ids

 62
 00:03:04,656 --> 00:03:05,730
-到模型以获得 logits
+到模型中以获得 logits
 to the model to get the logits

 63
@@ -340,17 +340,17 @@
 We just need to make sure that we return

 69
 00:03:20,970 --> 00:03:24,450
-损失和模型以相同的格式输出。
+损失和模型输出的格式相同。
 the loss and the model outputs in the same format.

 70
 00:03:24,450 --> 00:03:27,570
-这样,你就可以集成自己的出色损失函数
+这样,你就可以在 Trainer 和 Accelerate 中
 With that, you can integrate your own awesome loss function

 71
 00:03:27,570 --> 00:03:29,763
-与培训师和加速。
+集成自己的出色损失函数。
 with both the Trainer and Accelerate.

 72
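For reference while reviewing the translation: the weighted loss these subtitles walk through can be sketched in PyTorch roughly as follows. This is a minimal sketch reconstructed from the steps described in the video (align inputs and logits, per-sample loss, Boolean keyword weights, element-wise multiplication); the names `keytoken_weighted_loss`, `keytoken_ids`, and `alpha` are illustrative assumptions, not taken from this file.

```python
import torch
from torch.nn import CrossEntropyLoss


def keytoken_weighted_loss(inputs, logits, keytoken_ids, alpha=1.0):
    # Align inputs and logits: the logits at position n predict token n + 1.
    shift_labels = inputs[..., 1:].contiguous()
    shift_logits = logits[..., :-1, :].contiguous()
    # Standard loss, flattened over the batch; reduction="none" keeps one
    # loss value per token position instead of a single scalar.
    loss_fct = CrossEntropyLoss(reduction="none")
    loss = loss_fct(
        shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
    )
    # Unflatten back to (batch, sequence) with a row per sample and a column
    # per position, then average over all positions for each sample.
    loss_per_sample = loss.view(
        shift_logits.size(0), shift_logits.size(1)
    ).mean(dim=1)
    # Boolean logic: for each keytoken, a matrix with 1s where it occurs and
    # 0s where not; stacking adds the extra keyword dimension, and summing
    # over keywords and positions yields one weight per sample.
    hits = torch.stack([(inputs == kt).float() for kt in keytoken_ids])
    weights = alpha * (1.0 + hits.sum(dim=(0, 2)))
    # Element-wise multiplication of loss and weight, then average over the
    # samples in the batch: exactly one loss value for the whole batch.
    return (loss_per_sample * weights).mean()
```

With Accelerate you would call this yourself in the training loop after getting the logits from the model; with the Trainer, the usual route is to subclass it and override `compute_loss`, making sure the loss and the model outputs are returned in the expected format.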