Phonemize Japanese Language #146

Open
MikuAuahDark opened this issue Feb 22, 2023 · 2 comments

@MikuAuahDark

Describe the bug
phonemizer fails to phonemize Japanese text when using the espeak backend.

Phonemizer version

phonemizer-3.2.1
available backends: espeak-ng-1.52, segments-2.2.1
uninstalled backends: espeak-mbrola, festival

System
Windows 11 22H2 patch 1265

To reproduce

import phonemizer
print(phonemizer.phonemize("ほたる", language="ja", backend="espeak"))

"Could not load the mbrola.dll file." is printed on the console, followed by RuntimeError: failed to load voice "ja".

Expected behavior
Runs without problems.

Additional context
Running espeak-ng directly from the command line works.

C:\Users\MikuAuahDark>espeak-ng -q -x --ipa -v ja "ほたる"
ho̞tˈäɽɯᵝ
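
For anyone who wants to cross-check from Python without going through phonemizer, here is a minimal sketch that shells out to the same command shown above. The helper names `build_espeak_command` and `phonemize_via_cli` are hypothetical, it assumes `espeak-ng` is on PATH, and it assumes the tool writes UTF-8 to stdout, which may not hold on every Windows setup.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="ja", executable="espeak-ng"):
    """Build the argument list for the command used in this report."""
    # -q: no audio, -x: write phonemes to stdout, --ipa: use IPA, -v: voice
    return [executable, "-q", "-x", "--ipa", "-v", voice, text]


def phonemize_via_cli(text, voice="ja"):
    """Run espeak-ng directly and return its phoneme output as a string."""
    executable = shutil.which("espeak-ng")
    if executable is None:
        raise FileNotFoundError("espeak-ng was not found on PATH")
    completed = subprocess.run(
        build_espeak_command(text, voice, executable),
        capture_output=True,
        check=True,
        encoding="utf-8",  # assumption: espeak-ng emits UTF-8 on stdout
    )
    return completed.stdout.strip()
```

Calling `phonemize_via_cli("ほたる")` should give the same result as the command line above if the assumptions hold, which helps separate an eSpeak NG problem from a phonemizer problem.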
@MikuAuahDark
Author

Additional context
This LuaJIT script also returns the same output as the espeak-ng command line, so phonemizer is probably doing something unusual during initialization?

local ffi = require("ffi")
local espeak = ffi.load("C:/Program Files/eSpeak NG/libespeak-ng.dll")

ffi.cdef[[
int espeak_Initialize(int output, int buflength, const char *path, int options);
const char *espeak_TextToPhonemes(const void **textptr, int textmode, int phonememode);
int espeak_SetVoiceByName(const char *name);
]]

local text = "ほたる"
-- 3 = allow espeakEVENT_PHONEME events AND espeakEVENT_PHONEME events give IPA phoneme names
print("espeak_Initialize", espeak.espeak_Initialize(2, 0, nil, 3))
print("espeak_SetVoiceByName", espeak.espeak_SetVoiceByName("ja"))

local temp = ffi.new("const char*[1]")
temp[0] = text
while temp[0] ~= nil do
	-- 1 = UTF-8 mode, 2 = bit 1 = IPA phonetic
	local result = espeak.espeak_TextToPhonemes(ffi.cast("const void**", temp), 1, 2)
	if result == nil then
		print("espeak_TextToPhonemes failed")
		-- stop here: ffi.string() must not be called on a NULL pointer
		break
	end
	print("espeak_TextToPhonemes", ffi.string(result))
end

@MikuAuahDark
Author

MikuAuahDark commented Feb 22, 2023

Alright, found the issue.

I had to add this check

if voice.identifier.startswith('mb'):
    continue

before each voice is inserted into the list of available languages:

for voice in self.available_voices():
    if voice.language not in available:
        available[voice.language] = voice.identifier

In my eSpeak installation, MBROLA voices are listed first, so the first voice found for "ja" is an MBROLA voice, which needs mbrola.dll. Using LuaJIT, I was able to print the list of voices in the order they're returned by espeak_ListVoices:

...
mb\mb-it2       italian-mbrola-2        it
mb\mb-jp1       japanese-mbrola-1       ja
mb\mb-jp2       japanese-mbrola-2       ja
mb\mb-jp3       japanese-mbrola-3       ja
jpx\ja          Japanese                ja
art\jbo         Lojban                  jbo
...
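
To illustrate the effect of the check, here is a small self-contained sketch that replays the selection loop over the voices listed above. The `Voice` tuple and the `first_voice_per_language` function are hypothetical stand-ins, not phonemizer's actual classes, and the data is only the excerpt shown in this comment.

```python
from typing import NamedTuple


class Voice(NamedTuple):
    """Hypothetical stand-in for a voice entry: identifier, name, language."""
    identifier: str
    name: str
    language: str


# Excerpt of the voice list, in the order reported above.
VOICES = [
    Voice("mb\\mb-it2", "italian-mbrola-2", "it"),
    Voice("mb\\mb-jp1", "japanese-mbrola-1", "ja"),
    Voice("mb\\mb-jp2", "japanese-mbrola-2", "ja"),
    Voice("mb\\mb-jp3", "japanese-mbrola-3", "ja"),
    Voice("jpx\\ja", "Japanese", "ja"),
    Voice("art\\jbo", "Lojban", "jbo"),
]


def first_voice_per_language(voices, skip_mbrola):
    """Map each language code to the first matching voice identifier."""
    available = {}
    for voice in voices:
        # The proposed check: ignore MBROLA voices, whose identifiers start with 'mb'.
        if skip_mbrola and voice.identifier.startswith("mb"):
            continue
        if voice.language not in available:
            available[voice.language] = voice.identifier
    return available


without_check = first_voice_per_language(VOICES, skip_mbrola=False)
with_check = first_voice_per_language(VOICES, skip_mbrola=True)
print(without_check["ja"])
print(with_check["ja"])
```

Without the check, "ja" resolves to the first MBROLA voice in the list; with it, "ja" resolves to the plain eSpeak voice. Note that a prefix test on 'mb' is a heuristic and would also skip any non-MBROLA voice whose identifier happens to start with those letters.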
