[Feature] First class Lingua Libre support #263
Comments
Their data is already usable in a way, but not out of the box. Users can download the dataset from https://lingualibre.org/datasets/ and put it under the sound folder.
@lingua-libre French pronunciations are very comprehensive (>200,000), and the project will grow in the future. It is a good idea to add support in GoldenDict for that wonderful project!
Can I implement this in The practice of putting everything in the root folder is insane. I don't know why the original author considered /src superfluous goldendict/goldendict@ab88fa4 The project was probably much simpler at that time. I think we will reorganize the source files in the future for better maintainability. I prefer to put new code in its own place, in a modular way. Also, if we actually do this, some header changes are inevitable. We can run https://include-what-you-use.org/ over the codebase for faster build times.
yes, that's nice
I think it's because the original code was migrated from Subversion, which uses src as the default folder.
Forvo is privatizing volunteers' work, and there is no free API anymore.
Lingua Libre (@lingua-libre) is a better alternative:
https://lingualibre.org
The project is under the name of Wikimedia France.
There are already 28k English pronunciations, so I think the project is mature enough: https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-eng
Their data is stored at https://commons.wikimedia.org, so it will probably exist forever.
To get pronunciations, we just run a query against the Wikimedia Commons database.
Sample query -> "nice" in English:
The search is just a regex: files uploaded through Lingua Libre have a fixed name format of
LL-<language code>-<author>-<word>.wav
Then just grab the URL from the returned JSON.
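A minimal sketch of those two steps, assuming the decoded response has the usual MediaWiki `query` / `pages` / `imageinfo` nesting. The helper names, the sample titles, and the example.org URLs are made up for illustration. The regex also assumes that the language code and the author contain no hyphen, which may not hold for every uploader.

```python
import re

# Assumed pattern for the fixed name format LL-<language code>-<author>-<word>.wav.
# Assumption: language code and author contain no hyphen; the word may.
LL_NAME = re.compile(
    r"^(?:File:)?LL-(?P<lang>[^-]+)-(?P<author>[^-]+)-(?P<word>.+)\.wav$"
)


def parse_ll_name(title):
    """Return (lang, author, word) for a Lingua Libre file title, or None."""
    m = LL_NAME.match(title)
    return (m["lang"], m["author"], m["word"]) if m else None


def audio_urls(response, word):
    """Collect file URLs for `word` from a decoded MediaWiki query response."""
    urls = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        parsed = parse_ll_name(page.get("title", ""))
        # Skip files that are not Lingua Libre uploads or are for another word.
        if parsed is None or parsed[2] != word:
            continue
        for info in page.get("imageinfo", []):
            if "url" in info:
                urls.append(info["url"])
    return urls


# Made-up response, not real data.
sample = {
    "query": {
        "pages": {
            "1": {
                "title": "File:LL-Q1860 (eng)-SomeUser-nice.wav",
                "imageinfo": [{"url": "https://example.org/nice.wav"}],
            },
            "2": {
                "title": "File:Unrelated recording.ogg",
                "imageinfo": [{"url": "https://example.org/other.ogg"}],
            },
        }
    }
}
print(audio_urls(sample, "nice"))
```

Filtering on the parsed word a second time on the client side guards against search hits that only partially match the title.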
The API parameter is srsearch in the docs, and I have no idea why it must be used with the prefix g, i.e. gsrsearch:
https://www.mediawiki.org/wiki/API:Search
Get supported language ids -> run this query on https://commons-query.wikimedia.org/
https://lingualibre.org/wiki/Help:SPARQL#Is_Language_.28d:Q34770.29_.E2.86.92_List_existing_languages_with:_LL_Qid.2C_ISO_639-3.2C_Name
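For the pronunciation search described above, the request URL could be assembled roughly as follows. This is only a sketch: the `intitle:/.../` title regex is my guess at a suitable search expression, not the query from the original post, and `build_search_url` is a hypothetical helper. As far as I know, the `g` prefix appears because the search module is used as a generator (`generator=search`), and generator parameters take that prefix.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def build_search_url(word, lang_code, limit=10):
    """Build a Commons API URL that searches Lingua Libre files for `word`.

    Assumption: `word` and `lang_code` contain no regex metacharacters;
    no escaping is performed here.
    """
    # Title regex following LL-<language code>-<author>-<word>.wav
    search = "intitle:/LL-{}-[^-]+-{}\\.wav/".format(lang_code, word)
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",   # search used as a generator ...
        "gsrsearch": search,     # ... so srsearch becomes gsrsearch
        "gsrnamespace": "6",     # namespace 6 is the File: namespace
        "gsrlimit": str(limit),
        "prop": "imageinfo",
        "iiprop": "url",         # ask for the direct file URL
    }
    return API + "?" + urlencode(params)


print(build_search_url("nice", "eng"))
```

Passing the language code into the title pattern keeps the search narrow, which matches the remark that the query needs a language code to finish in reasonable time.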
Without a personal token, the rate limit is 500 requests per hour, which should be enough for most people.
https://api.wikimedia.org/wiki/Documentation/Getting_started/Rate_limits
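If the client wants to stay under that limit on its own, a small sliding-window counter would be enough. The sketch below is only an illustration: `HourlyBudget` is a hypothetical name, and the defaults simply mirror the 500/h figure quoted above rather than anything the API reports back.

```python
import time
from collections import deque


class HourlyBudget:
    """Allow at most `limit` requests within any `window` seconds."""

    def __init__(self, limit=500, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the class can be tested
        self.sent = deque()     # timestamps of requests still inside the window

    def try_acquire(self):
        """Return True and record a request if the budget allows one now."""
        now = self.clock()
        # Forget requests that have left the sliding window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False


budget = HourlyBudget()
if budget.try_acquire():
    pass  # safe to send one API request here
```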
The interface should be similar to Forvo's in GoldenDict's dictionary settings, and users do need to add a language code, or the API will time out.