
不同训练集的跨语种能力(Cross‐Language Ability of Different Training Sets)

RVC-Boss edited this page Oct 23, 2024 · 1 revision

一、跨语种定义

I. Cross-Language Definition

1、参考音频、参考文本=语种A,待合成文本=语种B,A!=B

Reference audio and reference text = Language A; text to be synthesized = Language B, where A != B.

或者

or

2、训练集=语种A,待合成文本=语种B,A!=B

Training set = Language A; text to be synthesized = Language B, where A != B.

暂不考虑参考音频选取训练集外的音色的情况,因此1、2暂时认为等价

For now we do not consider reference audio whose voice lies outside the training set, so conditions 1 and 2 are treated as equivalent.
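The definition above reduces to a single inequality between the reference language and the target-text language. A minimal sketch (the function name and language codes are illustrative, not part of any GPT-SoVITS API):

```python
# Minimal sketch of the cross-language condition; name is hypothetical.
def is_cross_language(ref_lang: str, text_lang: str) -> bool:
    """True when the reference audio/text language (A) differs from the
    language of the text to be synthesized (B), i.e. A != B."""
    return ref_lang != text_lang


print(is_cross_language("zh", "en"))  # True: Chinese reference, English target
print(is_cross_language("zh", "zh"))  # False: same language, not cross-language
```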

二、底模携带了5种语言,因此可以5种语言互相跨语种,但是微调呢?

II. The base model is trained on five languages and can therefore cross between any of the five. But what about after fine-tuning?

1、微调训练集比较小,例如1~30分钟:任意训练A语种,则5种语言的文本均可推理,因为底模拥有跨语种能力

If the fine-tuning training set is relatively small (e.g., 1-30 minutes): training on any Language A still allows inference of text in all five languages, because the base model's cross-language ability is retained.

2、微调训练集比较大,那么底模的跨语种能力被微调训练集洗掉了,举例:

If the fine-tuning training set is relatively large, then the cross-language capabilities of the base model may be overwritten by the fine-tuning set. For example:

(1)假如训练集包含ABC三个语种,那么该音色(及其对应的参考音频)可以在ABC中跨语种,ABC外的DE的跨语种能力丧失

If the training set includes languages A, B, and C, then the voice (and its corresponding reference audio) can cross languages among A, B, and C, but the cross-language ability outside of these (D and E) is lost.

(2)假如训练集仅包含A语种,那么该模型不具备跨语种能力

If the training set includes only Language A, then the model will not have cross-language capabilities.
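The cases above can be summarized as a small decision rule. This is an illustrative sketch of the page's rules, not part of GPT-SoVITS itself: the function name is hypothetical, the 30-minute threshold comes from the example in case 1, and the five base-model languages are assumed here to be zh/en/ja/ko/yue.

```python
# Illustrative summary of this page's rules; not a GPT-SoVITS API.
# The five base-model languages are assumed to be zh/en/ja/ko/yue.
BASE_LANGS = {"zh", "en", "ja", "ko", "yue"}


def inferable_languages(train_langs, train_minutes):
    """Return the set of languages whose text the fine-tuned voice can synthesize."""
    train_langs = set(train_langs) & BASE_LANGS
    if train_minutes <= 30:
        # Case 1: small fine-tuning set, so the base model's
        # cross-language ability survives and all five languages work.
        return set(BASE_LANGS)
    # Case 2: large fine-tuning set washes out the base model's ability;
    # synthesis is limited to the training languages (2(1)), and a
    # monolingual training set leaves no cross-language ability (2(2)).
    return train_langs
```

For example, 10 minutes of Chinese-only data keeps all five languages usable, while 2 hours of Chinese-only data leaves only Chinese.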