Cross-Language Ability of Different Training Sets
I. Definition of Cross-Language
1. Reference audio and reference text = Language A; text to be synthesized = Language B, where A != B.
or
2. Training set = Language A; text to be synthesized = Language B, where A != B.
For now we do not consider reference audio whose voice lies outside the training set, so cases 1 and 2 are treated as equivalent (see the sketch after this list).
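For concreteness, here is a minimal sketch of case 1 as an inference request. It assumes the api.py service from this repository is running locally on its default port (9880) and that the request fields match that script's documented schema; the file path and texts are placeholders, not files shipped with the repo.

```python
import requests

# Cross-language request per definition 1:
# reference audio/text are Language A (Chinese), target text is Language B (English).
payload = {
    "refer_wav_path": "ref_zh.wav",            # reference audio in Language A
    "prompt_text": "这是一段中文参考文本。",      # reference text, also Language A
    "prompt_language": "zh",
    "text": "This sentence is synthesized in English.",  # target text in Language B
    "text_language": "en",                     # A != B, so this is cross-language
}

resp = requests.post("http://127.0.0.1:9880", json=payload)
with open("cross_language_output.wav", "wb") as f:
    f.write(resp.content)
```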
II. The base model carries five languages, so it can cross between any of the five. But what happens after fine-tuning?
1. If the fine-tuning set is small (e.g., 1-30 minutes): after training on any Language A, text in all five languages can still be synthesized, because the base model's cross-language ability is preserved.
2. If the fine-tuning set is large, the base model's cross-language ability is washed out by the fine-tuning data (the sketch after these examples encodes this rule). For example:
(1) If the training set contains the three languages A, B, and C, the resulting voice (and its corresponding reference audio) can cross among A, B, and C, but the ability to cross into the remaining languages D and E is lost.
(2) If the training set contains only Language A, the model has no cross-language ability at all.
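The rule above can be summarized as a small decision helper. This is a sketch, not code from the repository: the 30-minute threshold mirrors the example in item 1, and the five language codes (zh, en, ja, ko, yue) are an assumption about which five languages the base model carries.

```python
# Assumed set of the five base-model languages (not a constant from the codebase).
BASE_MODEL_LANGUAGES = {"zh", "en", "ja", "ko", "yue"}

def supported_target_languages(finetune_langs: set[str],
                               finetune_minutes: float) -> set[str]:
    """Return the languages a fine-tuned voice can still synthesize."""
    if finetune_minutes <= 30:
        # Small fine-tuning set: the base model's cross-language ability
        # survives, so all five languages remain usable.
        return set(BASE_MODEL_LANGUAGES)
    # Large fine-tuning set: the base model's ability is washed out;
    # only the languages present in the fine-tuning data remain.
    return finetune_langs & BASE_MODEL_LANGUAGES

# Example (2) above: a large fine-tuning set containing only Chinese
# leaves the voice able to synthesize Chinese only.
print(supported_target_languages({"zh"}, finetune_minutes=120))  # {'zh'}
```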