Developing new phonemizers
- Set up IDE (Recommended: Visual Studio or VS Code)
- Compile OpenUtau from source
- Begin learning C# from official Microsoft tutorials
- Read Phonemizer API
- For languages with multi-syllable words, read SyllableBasedPhonemizer API
Heavily commented example implementations, from simplest to most complex:
The main method to implement is:
public abstract Phoneme[] Process(Note[] notes, Note? prevNeighbour, Note? nextNeighbour);
- notes: A group of notes. The first note contains the lyric. The rest are extender notes whose lyric starts with +.
- prevNeighbour and nextNeighbour: Useful info for creating diphones, if applicable. E.g., creating a proper leading diphone in VCV.
- returns: An array of phonemes, positioned relative to the first note.
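To make the grouping concrete, here is a minimal, language-neutral sketch in Python (not OpenUtau's C# API) of how a flat run of notes splits into the groups described above, and what "relative to the first note" means for positions. The Note class and group_notes function are hypothetical names introduced only for this illustration; a phonemizer receives notes that are already grouped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    """Hypothetical stand-in for a note; not the OpenUtau Note struct."""
    lyric: str
    position: int  # start time in ticks
    duration: int  # length in ticks


def group_notes(notes: List[Note]) -> List[List[Note]]:
    """Split notes into groups: a lyric note followed by its '+' extenders."""
    groups: List[List[Note]] = []
    for note in notes:
        if note.lyric.startswith("+") and groups:
            groups[-1].append(note)  # extender note joins the current group
        else:
            groups.append([note])    # a new lyric starts a new group
    return groups


melody = [
    Note("hello", 0, 480),
    Note("+", 480, 480),
    Note("world", 960, 480),
]
groups = group_notes(melody)
# Note positions expressed relative to the first note of each group.
relative = [[n.position - g[0].position for n in g] for g in groups]
print(relative)  # [[0, 480], [0]]
```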
Tips:
- To load singer-specific resources, implement resource loading in SetSinger() and use singer.Location to look for files.
- If the phonemizer uses an expensive resource, load it lazily when the phonemizer is first created. Use your best judgement to decide its lifetime.
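The lazy-loading tip can be sketched as follows. This is a Python illustration of the pattern only, under the assumption that the resource can be shared between phonemizer instances that point at the same singer folder. SketchPhonemizer, load_expensive_resource and the singers/example path are all hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Cache shared by all phonemizer instances, keyed by the singer's folder.
_RESOURCE_CACHE: Dict[Path, List[str]] = {}
load_count = 0  # only here to show how often the slow path runs


def load_expensive_resource(location: Path) -> List[str]:
    """Hypothetical slow loader, e.g. parsing a large dictionary file."""
    global load_count
    load_count += 1
    return [f"resource loaded from {location}"]


class SketchPhonemizer:
    """Hypothetical phonemizer showing only the resource handling."""

    def __init__(self) -> None:
        self.location: Optional[Path] = None

    def set_singer(self, location: Path) -> None:
        # Cheap: only remember where the singer's files live.
        self.location = location

    @property
    def resource(self) -> List[str]:
        # Expensive: load on first use, then reuse the cached copy.
        if self.location is None:
            raise RuntimeError("set_singer() must be called first")
        if self.location not in _RESOURCE_CACHE:
            _RESOURCE_CACHE[self.location] = load_expensive_resource(self.location)
        return _RESOURCE_CACHE[self.location]


first = SketchPhonemizer()
first.set_singer(Path("singers/example"))
second = SketchPhonemizer()
second.set_singer(Path("singers/example"))
first.resource
second.resource
print(load_count)  # 1: the slow loader ran once for both instances
```

A module-level cache keeps the resource alive for the whole process. If memory matters more than reload time, a shorter lifetime (for example, clearing the entry when the singer changes) may be the better trade-off.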
The API is implemented in OpenUtau.Core/Api/Phonemizer.cs
Usually a phonemizer's name is <language> <type> for classic phonemizers, or <renderer> <language> for machine-learning phonemizers.
- Renderer is "vogen", "nnsvs" or "diffsinger".
- Language is the spoken language that the phonemizer sings in, such as "English", "Japanese".
- Type is the voicebank type supported by the phonemizer, such as "CVVC", "VCV".
A phonemizer's tag is the abbreviation of the phonemizer's name. For example, the tag of "English Arpasing Phonemizer" is "EN ARPA".
- The language should be abbreviated in programmer's style, such as EN and JA (as in EN-US and JA-JP). See Microsoft's official documentation for the language code of each language.
A complete Phonemizer should:
- Produce phonemes from the lyric, and from the previous and next notes if they exist.
- Distribute phonemes to positions relative to the first note of each group of notes.
- (For Classic phonemizers) support multi-pitch and multi-color voicebanks.
Considering the characteristics of different languages, the phonemizer doesn't necessarily have to implement all the following features. However, implementing these features can maintain a consistent user experience across various phonemizers.
These features can be quickly implemented by inheriting a phonemizer template, such as SyllableBasedPhonemizer.
Polysyllabic word support
For polysyllabic languages such as English, users should be able to enter the word on the note of its first syllable, use +~ or +* on the following notes to extend the current syllable, and use + to move on to the next syllable.
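The extender rules above can be sketched as a small function that decides which syllable each note in a group sings. This is an illustrative Python sketch, not the SyllableBasedPhonemizer implementation. The function name assign_syllables is hypothetical, and holding the last syllable when there are more + notes than syllables is an assumption made here.

```python
from typing import List


def assign_syllables(syllable_count: int, lyrics: List[str]) -> List[int]:
    """Return, for each note in a group, the index of the syllable it sings.

    lyrics[0] is the word itself; the rest are extender lyrics.
    """
    indices = [0]  # the first note always sings the first syllable
    for lyric in lyrics[1:]:
        current = indices[-1]
        if lyric in ("+~", "+*"):
            indices.append(current)  # keep extending the current syllable
        elif lyric == "+":
            # Move to the next syllable. Assumption: once the word runs out
            # of syllables, the last one is simply held.
            indices.append(min(current + 1, syllable_count - 1))
        else:
            raise ValueError(f"not an extender lyric: {lyric!r}")
    return indices


# "beautiful" has three syllables: beau-ti-ful.
print(assign_syllables(3, ["beautiful", "+~", "+", "+", "+~"]))  # [0, 0, 1, 2, 2]
```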
Phonetic hint
Users can manually enter space-separated phoneme sequences (a.k.a. phonetic hints) in square brackets, such as read, read[r iy d] and [r iy d]. When both a phonetic hint and a word are present, the phonetic hint takes precedence.
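As an illustration of the three lyric forms, the following Python sketch separates the word from its phonetic hint. The regular expression and the parse_lyric name are assumptions for this example and are not taken from OpenUtau's source. Lyrics with malformed brackets are treated as plain words here.

```python
import re
from typing import List, Optional, Tuple

# Optional word, then an optional bracketed hint, e.g. "read[r iy d]".
_LYRIC_PATTERN = re.compile(r"^(?P<word>[^\[\]]*)(?:\[(?P<hint>[^\[\]]*)\])?$")


def parse_lyric(lyric: str) -> Tuple[str, Optional[List[str]]]:
    """Split a lyric into its word and its phonetic hint (if any)."""
    match = _LYRIC_PATTERN.match(lyric.strip())
    if match is None:
        return lyric, None  # malformed brackets: treat the text as a plain word
    hint = match.group("hint")
    phonemes = hint.split() if hint is not None else None
    return match.group("word").strip(), phonemes


for lyric in ["read", "read[r iy d]", "[r iy d]"]:
    print(parse_lyric(lyric))
# ('read', None)
# ('read', ['r', 'iy', 'd'])
# ('', ['r', 'iy', 'd'])
```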
G2p
G2p (grapheme-to-phoneme) converts lyrics in natural languages to phoneme sequences. OpenUtau has built-in G2ps for multiple languages. They are implemented with machine-learning models, cover most words in each language, and can predict the pronunciation of words they have not seen before. Using a unified G2p keeps the pronunciation of the same lyrics consistent across phonemizers.
For languages with a large vocabulary whose words cannot be converted into phoneme sequences by simple rules, such as English, French, and Russian, please use OpenUtau's built-in G2p.
The following G2ps are included in OpenUtau:
- English: ArpabetG2P
- French: FrenchG2p
- German: GermanG2p
- Italian: ItalianG2p
- Portuguese: PortugueseG2p
- Russian: RussianG2p
- Spanish: SpanishG2p
Custom pronunciation dictionary
Building on G2p, and because some voicebanks have custom phonemes, a phonemizer should support custom dictionaries. Custom dictionaries can be loaded using G2pDictionary.
No matter which encoding the voicebank uses, YAML dictionaries should always use UTF-8 encoding.
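One way to picture how the phonetic hint, custom dictionary and G2p fit together is a simple lookup chain. The Python sketch below assumes the order hint, then custom dictionary, then G2p, and assumes case-insensitive lookup. Only the hint's precedence is stated in this page, so treat the rest as an illustration. resolve_phonemes and toy_g2p are hypothetical names, and toy_g2p merely stands in for a built-in G2p.

```python
from typing import Callable, Dict, List, Optional

Phonemes = List[str]


def resolve_phonemes(
    word: str,
    hint: Optional[Phonemes],
    custom_dict: Dict[str, Phonemes],
    g2p: Callable[[str], Optional[Phonemes]],
) -> Optional[Phonemes]:
    """Pick a pronunciation: hint first, then custom dictionary, then G2p."""
    if hint:
        return hint
    key = word.lower()  # assumption: lookups ignore letter case
    if key in custom_dict:
        return custom_dict[key]
    return g2p(key)


def toy_g2p(word: str) -> Optional[Phonemes]:
    """Hypothetical stand-in for a built-in G2p; knows a single word."""
    return {"read": ["r", "iy", "d"]}.get(word)


voicebank_dict = {"openutau": ["ow", "p", "eh", "n", "w", "uw", "t", "ah", "w", "uw"]}

print(resolve_phonemes("read", ["r", "eh", "d"], voicebank_dict, toy_g2p))  # hint wins
print(resolve_phonemes("OpenUtau", None, voicebank_dict, toy_g2p))          # dictionary entry
print(resolve_phonemes("read", None, voicebank_dict, toy_g2p))              # G2p fallback
```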
Here is an example of a custom dictionary:
%YAML 1.2
---
symbols:
- {symbol: aa, type: vowel}
- {symbol: ae, type: vowel}
- {symbol: ah, type: vowel}
- {symbol: ao, type: vowel}
- {symbol: aw, type: vowel}
- {symbol: ay, type: vowel}
- {symbol: b, type: stop}
- {symbol: ch, type: affricate}
- {symbol: d, type: stop}
- {symbol: dh, type: fricative}
- {symbol: eh, type: vowel}
- {symbol: er, type: vowel}
- {symbol: ey, type: vowel}
- {symbol: f, type: fricative}
- {symbol: g, type: stop}
- {symbol: hh, type: aspirate}
- {symbol: ih, type: vowel}
- {symbol: iy, type: vowel}
- {symbol: jh, type: affricate}
- {symbol: k, type: stop}
- {symbol: l, type: liquid}
- {symbol: m, type: nasal}
- {symbol: n, type: nasal}
- {symbol: ng, type: nasal}
- {symbol: ow, type: vowel}
- {symbol: oy, type: vowel}
- {symbol: p, type: stop}
- {symbol: r, type: liquid}
- {symbol: s, type: fricative}
- {symbol: sh, type: fricative}
- {symbol: t, type: stop}
- {symbol: th, type: fricative}
- {symbol: uh, type: vowel}
- {symbol: uw, type: vowel}
- {symbol: v, type: fricative}
- {symbol: w, type: semivowel}
- {symbol: y, type: semivowel}
- {symbol: z, type: fricative}
- {symbol: zh, type: fricative}
entries:
- grapheme: openutau
  phonemes: [ow, p, eh, n, w, uw, t, ah, w, uw]
When reading and writing YAML files, OpenUtau always uses YAML 1.2 syntax. If you're developing third-party tools to parse or generate these YAML files, please use the ruamel.yaml library instead of pyyaml, which implements YAML 1.1 and reads plain scalars such as no or on as booleans.
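A third-party tool might also want to sanity-check a dictionary after loading it. The Python sketch below takes the already-loaded data as plain dicts and lists (the loading step, for example with ruamel.yaml, is deliberately left out) and reports phonemes used in entries that symbols never declares. Whether OpenUtau itself rejects such entries is not stated here, so this check is an assumption about what is useful, and undeclared_phonemes is a hypothetical helper.

```python
from typing import Any, Dict, List


def undeclared_phonemes(dictionary: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each grapheme to the phonemes it uses that 'symbols' never declares."""
    declared = {item["symbol"] for item in dictionary.get("symbols", [])}
    problems: Dict[str, List[str]] = {}
    for entry in dictionary.get("entries", []):
        missing = [p for p in entry["phonemes"] if p not in declared]
        if missing:
            problems[entry["grapheme"]] = missing
    return problems


# Shape of the data after loading a dictionary like the example above
# (shortened here; the second entry is deliberately inconsistent).
loaded = {
    "symbols": [
        {"symbol": "ow", "type": "vowel"},
        {"symbol": "p", "type": "stop"},
        {"symbol": "eh", "type": "vowel"},
        {"symbol": "n", "type": "nasal"},
    ],
    "entries": [
        {"grapheme": "open", "phonemes": ["ow", "p", "eh", "n"]},
        {"grapheme": "oxen", "phonemes": ["aa", "k", "s", "eh", "n"]},
    ],
}
print(undeclared_phonemes(loaded))  # {'oxen': ['aa', 'k', 's']}
```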
English (ArpabetG2P)
- vowels: aa, ae, ah, ao, aw, ay, eh, er, ey, ih, iy, ow, oy, uh, uw
- consonants: b, ch, d, dh, f, g, hh, jh, k, l, m, n, ng, p, r, s, sh, t, th, v, w, y, z, zh
German (GermanG2p)
- vowels: aa, ae, ah, ao, aw, ax, ay, ee, eh, er, ex, ih, iy, oe, ohh, ooh, oy, ue, uh, uw, yy
- consonants: b, cc, ch, d, dh, f, g, hh, jh, k, l, m, n, ng, p, pf, q, r, rr, s, sh, t, th, ts, v, w, x, y, z, zh
Italian (ItalianG2p)
- vowels: a, a1, e, e1, EE, i, i1, o, o1, OO, u, u1
- consonants: b, d, dz, dZZ, f, g, JJ, k, l, LL, m, n, nf, ng, p, r, rr, s, SS, t, ts, tSS, v, w, y, z
Portuguese (PortugueseG2p)
- vowels: a, a~, e, e~, E, i, i~, o, o~, O, u, u~
- consonants: b, d, dZ, f, g, j, j~, J, k, l, L, m, n, p, r, R, s, S, t, tS, v, w, w~, X, z, Z
Russian (RussianG2p)
- vowels: a, aa, ay, ee, i, ii, ja, je, jo, ju, oo, u, uj, uu, y, yy
- consonants: b, bb, c, ch, d, dd, f, ff, g, gg, h, hh, j, k, kk, l, ll, m, mm, n, nn, p, pp, r, rr, s, sch, sh, ss, t, tt, v, vv, z, zh, zz
Spanish (SpanishG2p)
- vowels: a, e, i, o, u
- consonants: b, B, ch, d, D, f, g, G, gn, I, k, l, ll, m, n, p, r, rr, s, t, U, w, x, y, Y, z