diff --git a/subtitles/zh-CN/64_using-a-custom-loss-function.srt b/subtitles/zh-CN/64_using-a-custom-loss-function.srt
index efcbaf454..0b6beaeaf 100644
--- a/subtitles/zh-CN/64_using-a-custom-loss-function.srt
+++ b/subtitles/zh-CN/64_using-a-custom-loss-function.srt
@@ -15,12 +15,12 @@
 4
 00:00:05,550 --> 00:00:07,500
-- 在本视频中,我们将介绍如何设置
+- 在本视频中,我们将介绍
 - In this video, we take a look at setting up

 5
 00:00:07,500 --> 00:00:09,303
-用于训练的自定义损失函数。
+如何自定义用于训练的损失函数。
 a custom loss function for training.

 6
@@ -30,42 +30,42 @@
 In the default loss function, all samples,

 7
 00:00:13,260 --> 00:00:15,840
-例如这些代码片段,都被同等对待
+例如这些代码片段,无论其内容如何
 such as these code snippets, are treated the same

 8
 00:00:15,840 --> 00:00:18,960
-不管他们的内容如何,但有一些场景
+都被同等对待,但有一些场景下
 irrespective of their content but there are scenarios

 9
 00:00:18,960 --> 00:00:21,660
-对样本进行不同加权可能有意义。
+对样本进行不同加权是合理的。
 where it could make sense to weight the samples differently.

 10
 00:00:21,660 --> 00:00:24,570
-例如,如果一个样本包含很多标记
+例如,如果一个样本包含很多
 If, for example, one sample contains a lot of tokens

 11
 00:00:24,570 --> 00:00:26,160
-我们感兴趣的
+我们所感兴趣的词元
 that are of interest to us

 12
 00:00:26,160 --> 00:00:29,910
-或者样本是否具有有利的标记多样性。
+或者样本具有理想的词元多样性。
 or if a sample has a favorable diversity of tokens.

 13
 00:00:29,910 --> 00:00:31,950
-我们还可以实施其他启发式
+我们还可以通过模式匹配或者其他规则
 We can also implement other heuristics

 14
 00:00:31,950 --> 00:00:33,963
-与模式匹配或其他规则。
+实现其他启发式方法。
 with pattern matching or other rules.

 15
@@ -75,7 +75,7 @@
 For each sample, we get a loss value during training

 16
 00:00:39,150 --> 00:00:41,850
-我们可以将损失与重量结合起来。
+我们可以将损失与权重结合起来。
 and we can combine that loss with a weight.

 17
@@ -110,12 +110,12 @@
 that helps us autocomplete common data science code.

 23
 00:00:57,030 --> 00:01:01,830
-对于那个任务,我们想给样本赋予更强的权重
+对于那个任务,包含与数据科学栈相关的词元
 For that task, we would like to weight samples stronger

 24
 00:01:01,830 --> 00:01:04,110
-其中与数据科学堆栈相关的令牌,
+我们想给样本赋予更强的权重,
 where tokens related to the data science stack,

 25
@@ -125,27 +125,27 @@
 such as pd or np, occur more frequently.

 26
 00:01:10,140 --> 00:01:13,080
-在这里你看到一个损失函数正是这样做的
+在这里你看到一个损失函数是
 Here you see a loss function that does exactly that

 27
 00:01:13,080 --> 00:01:15,180
-用于因果语言建模。
+针对因果语言建模这样做的。
 for causal language modeling.

 28
 00:01:15,180 --> 00:01:18,030
-它采用模型的输入和预测的逻辑,
+它接收模型的输入和预测的 logits,
 It takes the model's input and predicted logits,

 29
 00:01:18,030 --> 00:01:20,343
-以及作为输入的密钥标记。
+以及作为输入的关键词元。
 as well as the key tokens, as input.

 30
 00:01:21,869 --> 00:01:25,113
-首先,输入和逻辑对齐。
+首先,输入和 logits 会被对齐。
 First, the inputs and logits are aligned.

 31
@@ -155,7 +155,7 @@
 Then the loss per sample is calculated,

 32
 00:01:29,310 --> 00:01:30,843
-其次是重量。
+其次是权重。
 followed by the weights.

 33
@@ -170,7 +170,7 @@
 This is a pretty big function, so let's take a closer look

 35
 00:01:39,150 --> 00:01:40,953
-在损失和重量块。
+损失和权重块。
 at the loss and the weight blocks.

 36
@@ -180,37 +180,37 @@
 During the calculation of the standard loss,

 37
 00:01:45,600 --> 00:01:48,930
-logits 和标签在批次上变平。
+logits 和标签在整批数据上进行扁平化处理。
 the logits and labels are flattened over the batch.

 38
 00:01:48,930 --> 00:01:52,590
-有了视图,我们展开张量得到矩阵
+通过 view 操作,我们将 tensor 还原成矩阵
 With the view, we unflatten the tensor to get the matrix

 39
 00:01:52,590 --> 00:01:55,320
-批次中的每个样本都有一行和一列
+其中的行代表整批数据中的每个样本,
 with a row for each sample in the batch and a column

 40
 00:01:55,320 --> 00:01:57,723
-对于样本序列中的每个位置。
+其中的列表示样本在序列中的位置。
 for each position in the sequence of the sample.
 41
 00:01:58,920 --> 00:02:00,600
-我们不需要每个头寸的损失,
+我们不需要每个位置各自的损失,
 We don't need the loss per position,

 42
 00:02:00,600 --> 00:02:04,083
-所以我们对每个样本的所有头寸的损失进行平均。
+所以我们将每个样本在所有位置上的损失进行平均。
 so we average the loss over all positions for each sample.

 43
 00:02:06,150 --> 00:02:08,970
-对于权重,我们使用布尔逻辑得到一个张量
+对于权重,我们使用 Boolean 逻辑得到一个 tensor
 For the weights, we use Boolean logic to get a tensor

 44
@@ -220,17 +220,17 @@
 with 1s where a keyword occurred and 0s where not.

 45
 00:02:13,440 --> 00:02:15,690
-这个张量有一个额外的维度
+这个 tensor 有一个额外的维度
 This tensor has an additional dimension

 46
 00:02:15,690 --> 00:02:18,540
-作为我们刚刚看到的损失张量,因为我们得到
+和我们刚刚看到的损失 tensor 一样,
 as the loss tensor we just saw because we get

 47
 00:02:18,540 --> 00:02:21,693
-单独矩阵中每个关键字的信息。
+因为我们可以在单独的矩阵中获得每个关键词的信息。
 the information for each keyword in a separate matrix.

 48
@@ -250,17 +250,17 @@
 so we can sum overall keywords and all positions per sample.

 51
 00:02:33,450 --> 00:02:35,010
-现在我们快到了。
+现在我们就快要完成了。
 Now we're almost there.

 52
 00:02:35,010 --> 00:02:38,850
-我们只需要将损失与每个样本的权重结合起来。
+我们只需要将每个样本的损失与其权重结合起来。
 We only need to combine the loss with the weight per sample.

 53
 00:02:38,850 --> 00:02:41,790
-我们用元素明智的乘法来做到这一点
+我们通过逐元素相乘来做到这一点
 We do this with element wise multiplication

 54
@@ -270,32 +270,32 @@
 and then average overall samples in the batch.

 55
 00:02:45,233 --> 00:02:46,066
-到底,
+最后,
 In the end,

 56
 00:02:46,066 --> 00:02:49,110
-我们对整批只有一个损失值
+整批数据只有一个损失值
 we have exactly one loss value for the whole batch

 57
 00:02:49,110 --> 00:02:51,330
-这是整个必要的逻辑
+这就是创建自定义加权损失
 and this is the whole necessary logic

 58
 00:02:51,330 --> 00:02:53,223
-创建自定义加权损失。
+所需要的全部逻辑。
 to create a custom weighted loss.

 59
 00:02:56,250 --> 00:02:59,010
-让我们看看如何利用自定义损失
+让我们看看如何结合 Accelerate 和 Trainer
 Let's see how we can make use of that custom loss

 60
 00:02:59,010 --> 00:03:00,753
-与 Accelerate 和 Trainer 一起。
+来利用这个自定义损失。
 with Accelerate and the Trainer.

 61
@@ -305,7 +305,7 @@
 In Accelerate, we just pass the input_ids

 62
 00:03:04,656 --> 00:03:05,730
-到模型以获得 logits
+到模型中以获得 logits
 to the model to get the logits

 63
@@ -340,17 +340,17 @@
 We just need to make sure that we return

 69
 00:03:20,970 --> 00:03:24,450
-损失和模型以相同的格式输出。
+损失和模型输出的格式相同。
 the loss and the model outputs in the same format.

 70
 00:03:24,450 --> 00:03:27,570
-这样,你就可以集成自己的出色损失函数
+这样,你就可以在 Trainer 和 Accelerate 中
 With that, you can integrate your own awesome loss function

 71
 00:03:27,570 --> 00:03:29,763
-与培训师和加速。
+集成自己的出色损失函数。
 with both the Trainer and Accelerate.

 72
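For reference while reviewing the translation: the weighted loss these subtitles walk through can be sketched in PyTorch roughly as follows. This is a minimal sketch reconstructed from the steps described in the video (align inputs and logits, per-sample loss, Boolean keyword weights, element-wise multiplication); the names `keytoken_weighted_loss`, `keytoken_ids`, and `alpha` are illustrative assumptions, not taken from this file.

```python
import torch
from torch.nn import CrossEntropyLoss


def keytoken_weighted_loss(inputs, logits, keytoken_ids, alpha=1.0):
    # Align inputs and logits: the logits at position n predict token n + 1.
    shift_labels = inputs[..., 1:].contiguous()
    shift_logits = logits[..., :-1, :].contiguous()
    # Standard loss, flattened over the batch; reduction="none" keeps one
    # loss value per token position instead of a single scalar.
    loss_fct = CrossEntropyLoss(reduction="none")
    loss = loss_fct(
        shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
    )
    # Unflatten back to (batch, sequence) with a row per sample and a column
    # per position, then average over all positions for each sample.
    loss_per_sample = loss.view(
        shift_logits.size(0), shift_logits.size(1)
    ).mean(dim=1)
    # Boolean logic: for each keytoken, a matrix with 1s where it occurs and
    # 0s where not; stacking adds the extra keyword dimension, and summing
    # over keywords and positions yields one weight per sample.
    hits = torch.stack([(inputs == kt).float() for kt in keytoken_ids])
    weights = alpha * (1.0 + hits.sum(dim=(0, 2)))
    # Element-wise multiplication of loss and weight, then average over the
    # samples in the batch: exactly one loss value for the whole batch.
    return (loss_per_sample * weights).mean()
```

With Accelerate you would call this yourself in the training loop after getting the logits from the model; with the Trainer, the usual route is to subclass it and override `compute_loss`, making sure the loss and the model outputs are returned in the expected format.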