
不同训练集的跨语种能力(Cross‐Language Ability of Different Training Sets)

RVC-Boss edited this page Oct 23, 2024 · 1 revision

一、跨语种定义

I. Cross-Language Definition

1、参考音频、参考文本=语种A,待合成文本=语种B,A!=B

Reference audio and reference text = Language A; text to be synthesized = Language B, where A != B.

或者

or

2、训练集=语种A,待合成文本=语种B,A!=B

Training set = Language A; text to be synthesized = Language B, where A != B.

暂不考虑参考音频选取训练集外的音色的情况,因此1、2暂时认为等价

For now we do not consider reference audio whose voice lies outside the training set, so conditions 1 and 2 are treated as equivalent.
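The definition above reduces to a single inequality between the reference language and the target-text language. A minimal sketch (the function name and language codes are illustrative, not part of any GPT-SoVITS API):

```python
# Minimal sketch of the cross-language condition; name is hypothetical.
def is_cross_language(ref_lang: str, text_lang: str) -> bool:
    """True when the reference audio/text language (A) differs from the
    language of the text to be synthesized (B), i.e. A != B."""
    return ref_lang != text_lang


print(is_cross_language("zh", "en"))  # True: Chinese reference, English target
print(is_cross_language("zh", "zh"))  # False: same language, not cross-language
```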

二、底模携带了5种语言,因此可以5种语言互相跨语种,但是微调呢?

II. The base model is trained on five languages and can therefore cross between any of the five. But what about after fine-tuning?

1、微调训练集比较小,例如1~30分钟:任意训练A语种,则5种语言的文本均可推理,因为底模拥有跨语种能力

If the fine-tuning training set is relatively small (e.g., 1-30 minutes): training on any Language A still allows inference of text in all five languages, because the base model's cross-language ability is retained.

2、微调训练集比较大,那么底模的跨语种能力被微调训练集洗掉了,举例:

If the fine-tuning training set is relatively large, then the cross-language capabilities of the base model may be overwritten by the fine-tuning set. For example:

(1)假如训练集包含ABC三个语种,那么该音色(及其对应的参考音频)可以在ABC中跨语种,ABC外的DE的跨语种能力丧失

If the training set includes languages A, B, and C, then the voice (and its corresponding reference audio) can cross languages among A, B, and C, but the cross-language ability outside of these (D and E) is lost.

(2)假如训练集仅包含A语种,那么该模型不具备跨语种能力

If the training set includes only Language A, then the model will not have cross-language capabilities.
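The cases above can be summarized as a small decision rule. This is an illustrative sketch of the page's rules, not part of GPT-SoVITS itself: the function name is hypothetical, the 30-minute threshold comes from the example in case 1, and the five base-model languages are assumed here to be zh/en/ja/ko/yue.

```python
# Illustrative summary of this page's rules; not a GPT-SoVITS API.
# The five base-model languages are assumed to be zh/en/ja/ko/yue.
BASE_LANGS = {"zh", "en", "ja", "ko", "yue"}


def inferable_languages(train_langs, train_minutes):
    """Return the set of languages whose text the fine-tuned voice can synthesize."""
    train_langs = set(train_langs) & BASE_LANGS
    if train_minutes <= 30:
        # Case 1: small fine-tuning set, so the base model's
        # cross-language ability survives and all five languages work.
        return set(BASE_LANGS)
    # Case 2: large fine-tuning set washes out the base model's ability;
    # synthesis is limited to the training languages (2(1)), and a
    # monolingual training set leaves no cross-language ability (2(2)).
    return train_langs
```

For example, 10 minutes of Chinese-only data keeps all five languages usable, while 2 hours of Chinese-only data leaves only Chinese.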