86 changes: 43 additions & 43 deletions subtitles/zh-CN/64_using-a-custom-loss-function.srt
@@ -15,12 +15,12 @@

4
00:00:05,550 --> 00:00:07,500
- - 在本视频中,我们将介绍如何设置
+ - 在本视频中,我们将介绍
- In this video, we take a look at setting up

5
00:00:07,500 --> 00:00:09,303
- 用于训练的自定义损失函数
+ 如何自定义用于训练的损失函数
a custom loss function for training.

6
@@ -30,42 +30,42 @@ In the default loss function, all samples,

7
00:00:13,260 --> 00:00:15,840
- 例如这些代码片段,都被同等对待
+ 例如这些代码片段,无论其内容如何
such as these code snippets, are treated the same

8
00:00:15,840 --> 00:00:18,960
- 不管他们的内容如何,但有一些场景
+ 都被同等对待,但有一些场景下
irrespective of their content but there are scenarios

9
00:00:18,960 --> 00:00:21,660
- 对样本进行不同加权可能有意义
+ 对样本进行不同加权是合理的
where it could make sense to weight the samples differently.

10
00:00:21,660 --> 00:00:24,570
- 例如,如果一个样本包含很多标记
+ 例如,如果一个样本包含很多
If, for example, one sample contains a lot of tokens

11
00:00:24,570 --> 00:00:26,160
- 我们感兴趣的
+ 我们所感兴趣的词元
that are of interest to us

12
00:00:26,160 --> 00:00:29,910
- 或者样本是否具有有利的标记多样性。
+ 或者样本内包含理想的多样性词元
or if a sample has a favorable diversity of tokens.

13
00:00:29,910 --> 00:00:31,950
- 我们还可以实施其他启发式
+ 我们还可以通过模式匹配或者其他规则
We can also implement other heuristics

14
00:00:31,950 --> 00:00:33,963
- 与模式匹配或其他规则
+ 实现其他启发式
with pattern matching or other rules.

15
@@ -75,7 +75,7 @@ For each sample, we get a loss value during training

16
00:00:39,150 --> 00:00:41,850
- 我们可以将损失与重量结合起来
+ 我们可以将损失与权重结合起来
and we can combine that loss with a weight.

17
@@ -110,12 +110,12 @@ that helps us autocomplete common data science code.

23
00:00:57,030 --> 00:01:01,830
- 对于那个任务,我们想给样本赋予更强的权重
+ 对于那个任务,包含和数据科学栈相关的词元
For that task, we would like to weight samples stronger

24
00:01:01,830 --> 00:01:04,110
- 其中与数据科学堆栈相关的令牌
+ 我们想给样本赋予更强的权重
where tokens related to the data science stack,

25
@@ -125,27 +125,27 @@ such as pd or np, occur more frequently.

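For context, the keyword tokens discussed here could be collected with the tokenizer. A minimal sketch, assuming the course's huggingface-course/code-search-net-tokenizer and that each keyword (with a leading space) maps to a single token id; the keyword list is illustrative, not taken from this diff:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("huggingface-course/code-search-net-tokenizer")

# keywords we want to upweight (illustrative list)
keytokens = ["pd", "np", "plt", "fit", "predict"]
# assumes each keyword tokenizes to exactly one token, hence the [0]
keytoken_ids = [tokenizer(f" {kw}").input_ids[0] for kw in keytokens]
```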
26
00:01:10,140 --> 00:01:13,080
- 在这里你看到一个损失函数正是这样做的
+ 在这里你看到一个损失函数是
Here you see a loss function that does exactly that

27
00:01:13,080 --> 00:01:15,180
- 用于因果语言建模
+ 针对因果语言建模这样做的
for causal language modeling.

28
00:01:15,180 --> 00:01:18,030
- 它采用模型的输入和预测的逻辑
+ 它采用模型的输入和预测的对数
It takes the model's input and predicted logits,

29
00:01:18,030 --> 00:01:20,343
- 以及作为输入的密钥标记
+ 以及作为输入的关键词元
as well as the key tokens, as input.

30
00:01:21,869 --> 00:01:25,113
- 首先,输入和逻辑对齐
+ 首先,输入和对数是对齐的
First, the inputs and logits are aligned.

31
@@ -155,7 +155,7 @@ Then the loss per sample is calculated,

32
00:01:29,310 --> 00:01:30,843
- 其次是重量
+ 其次是权重
followed by the weights.

33
@@ -170,7 +170,7 @@ This is a pretty big function, so let's take a closer look

35
00:01:39,150 --> 00:01:40,953
- 在损失和重量块
+ 损失和权重块
at the loss and the weight blocks.

36
@@ -180,37 +180,37 @@ During the calculation of the standard loss,

37
00:01:45,600 --> 00:01:48,930
- logits 和标签在批次上变平
+ 对数和标签在整批数据上进行扁平化处理
the logits and labels are flattened over the batch.

38
00:01:48,930 --> 00:01:52,590
- 有了视图,我们展开张量得到矩阵
+ 有了视图,我们展开 tensor 得到矩阵
With the view, we unflatten the tensor to get the matrix

39
00:01:52,590 --> 00:01:55,320
- 批次中的每个样本都有一行和一列
+ 其中的行代表整批数据中的每个样本,
with a row for each sample in the batch and a column

40
00:01:55,320 --> 00:01:57,723
- 对于样本序列中的每个位置
+ 其中的列表示样本在序列中的位置
for each position in the sequence of the sample.

41
00:01:58,920 --> 00:02:00,600
- 我们不需要每个头寸的损失
+ 我们不需要每个位置上都计算损失
We don't need the loss per position,

42
00:02:00,600 --> 00:02:04,083
- 所以我们对每个样本的所有头寸的损失进行平均
+ 所以我们将每个样本在所有的位置的损失进行平均
so we average the loss over all positions for each sample.

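A minimal sketch of the loss block just described (variable names are assumed; inputs and logits come from the model's forward pass):

```python
import torch
from torch.nn import CrossEntropyLoss

# align inputs and logits: tokens up to position n predict the token at n+1
shift_labels = inputs[..., 1:].contiguous()
shift_logits = logits[..., :-1, :].contiguous()

# per-token loss, computed over the flattened batch
loss_fct = CrossEntropyLoss(reduction="none")
loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

# view() unflattens to a matrix with a row per sample and a column per position,
# then we average the loss over all positions for each sample
loss_per_sample = loss.view(shift_logits.size(0), shift_logits.size(1)).mean(axis=1)
```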
43
00:02:06,150 --> 00:02:08,970
- 对于权重,我们使用布尔逻辑得到一个张量
+ 对于权重,我们使用 Boolean 逻辑得到一个 tensor
For the weights, we use Boolean logic to get a tensor

44
@@ -220,17 +220,17 @@ with 1s where a keyword occurred and 0s where not.

45
00:02:13,440 --> 00:02:15,690
- 这个张量有一个额外的维度
+ 这个 tensor 有一个额外的维度
This tensor has an additional dimension

46
00:02:15,690 --> 00:02:18,540
- 作为我们刚刚看到的损失张量,因为我们得到
+ 作为我们刚刚看到的损失 tensor,
as the loss tensor we just saw because we get

47
00:02:18,540 --> 00:02:21,693
- 单独矩阵中每个关键字的信息
+ 因为我们可以获得单独矩阵中的每个关键词的信息
the information for each keyword in a separate matrix.

48
@@ -250,17 +250,17 @@ so we can sum overall keywords and all positions per sample.

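A corresponding sketch of the weight block (keytoken_ids as in the earlier snippet; the scaling with alpha is an assumption modeled on the Hugging Face course):

```python
# Boolean logic: one matrix of 1s/0s per keyword, stacked along an extra dimension
masks = torch.stack([(inputs == kt).float() for kt in keytoken_ids])

# sum over the keyword axis (0) and the position axis (2): one weight per sample
alpha = 1.0  # assumed scaling factor
weights = alpha * (1.0 + masks.sum(axis=[0, 2]))
```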
51
00:02:33,450 --> 00:02:35,010
- 现在我们快到了
+ 现在我们就快要完成了
Now we're almost there.

52
00:02:35,010 --> 00:02:38,850
- 我们只需要将损失与每个样本的权重结合起来
+ 我们只需要将每个样本的损失连同权重结合起来
We only need to combine the loss with the weight per sample.

53
00:02:38,850 --> 00:02:41,790
- 我们用元素明智的乘法来做到这一点
+ 我们通过元素积运算来做到这一点
We do this with element wise multiplication

54
@@ -270,32 +270,32 @@ and then average overall samples in the batch.

55
00:02:45,233 --> 00:02:46,066
- 到底
+ 最后
In the end,

56
00:02:46,066 --> 00:02:49,110
- 我们对整批只有一个损失值
+ 整批数据只有一个损失值
we have exactly one loss value for the whole batch

57
00:02:49,110 --> 00:02:51,330
- 这是整个必要的逻辑
+ 这是创建自定义加权损失
and this is the whole necessary logic

58
00:02:51,330 --> 00:02:53,223
- 创建自定义加权损失
+ 整个必要的逻辑
to create a custom weighted loss.

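Assembled from the pieces above, the whole weighted loss could look like this sketch (the function name and the alpha default are assumptions, modeled on the Hugging Face course notebook):

```python
import torch
from torch.nn import CrossEntropyLoss

def keytoken_weighted_loss(inputs, logits, keytoken_ids, alpha=1.0):
    # align inputs and logits: tokens < n predict token n
    shift_labels = inputs[..., 1:].contiguous()
    shift_logits = logits[..., :-1, :].contiguous()
    # per-token loss, then averaged over all positions for each sample
    loss_fct = CrossEntropyLoss(reduction="none")
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
    loss_per_sample = loss.view(shift_logits.size(0), shift_logits.size(1)).mean(axis=1)
    # per-sample weights from keytoken occurrences
    weights = torch.stack([(inputs == kt).float() for kt in keytoken_ids]).sum(axis=[0, 2])
    weights = alpha * (1.0 + weights)
    # element-wise multiplication, then average over all samples in the batch
    return (loss_per_sample * weights).mean()
```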
59
00:02:56,250 --> 00:02:59,010
- 让我们看看如何利用自定义损失
+ 让我们看看如何结合 Accelerate 和 Trainer 一起
Let's see how we can make use of that custom loss

60
00:02:59,010 --> 00:03:00,753
- 与 Accelerate 和 Trainer 一起
+ 利用自定义损失
with Accelerate and the Trainer.

61
@@ -305,7 +305,7 @@ In Accelerate, we just pass the input_ids

62
00:03:04,656 --> 00:03:05,730
- 到模型以获得 logits
+ 到模型以获得对数值
to the model to get the logits

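In an Accelerate training loop this could be used roughly as follows (the dataloader, optimizer, and accelerator objects are assumed to be set up in the usual way):

```python
model.train()
for batch in train_dataloader:
    logits = model(batch["input_ids"]).logits
    loss = keytoken_weighted_loss(batch["input_ids"], logits, keytoken_ids)
    accelerator.backward(loss)  # instead of loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```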
63
@@ -340,17 +340,17 @@ We just need to make sure that we return

69
00:03:20,970 --> 00:03:24,450
- 损失和模型以相同的格式输出
+ 损失和模型输出的格式相同
the loss and the model outputs in the same format.

70
00:03:24,450 --> 00:03:27,570
- 这样,你就可以集成自己的出色损失函数
+ 这样,你就可以结合 Trainer 和 Accelerate
With that, you can integrate your own awesome loss function

71
00:03:27,570 --> 00:03:29,763
- 与培训师和加速
+ 集成自己的出色损失函数
with both the Trainer and Accelerate.

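With the Trainer, the usual pattern is to subclass it and override compute_loss, returning the loss and the model outputs in the format the Trainer expects. A sketch (the **kwargs absorbs extra arguments, such as num_items_in_batch, that newer transformers versions pass):

```python
from transformers import Trainer

class KeytokenTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)
        loss = keytoken_weighted_loss(inputs["input_ids"], outputs.logits, keytoken_ids)
        # same return format as the default implementation
        return (loss, outputs) if return_outputs else loss
```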
72