[Feature] First class Lingua Libre support #263

shenlebantongying · 2022-12-21T05:06:39Z

Forvo is privatizing volunteer contributions, and it no longer offers a free API.

Lingua Libre (@lingua-libre) is a better alternative:

The project is under the name of Wikimedia France.

There are already 28k English pronunciations, so I think the project is mature enough: https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-eng

Their data is stored on https://commons.wikimedia.org, so it will probably exist forever.

To get pronunciations, we just run a query against the Wikimedia Commons API.

Sample query for "nice" in English:

It is just a regex search on the file title. Files uploaded through Lingua Libre have a fixed name format of LL-<language code>-<author>-<word>.wav

curl "https://commons.wikimedia.org/w/api.php?action=query&format=json&prop=imageinfo&generator=search&iiprop=url&iimetadataversion=1&iiextmetadatafilter=Categories&gsrsearch=intitle%3A%2FLL-Q1860%20%5C(eng%5C)-.*-nice%5C.wav%2F&gsrnamespace=6&gsrlimit=10&gsrwhat=text"
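The same URL can be assembled programmatically. Below is a minimal Python sketch, not the eventual GoldenDict implementation: the function name `build_search_url` and its default arguments are made up for illustration, and using `re.escape` on the word is an assumption, since the search backend's regex dialect is not identical to Python's.

```python
import re
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_search_url(word, lang_qid="Q1860", iso_code="eng", limit=10):
    """Build the Commons search URL for one word (hypothetical helper)."""
    # Titles follow "LL-<language code>-<author>-<word>.wav"; the author
    # part is unknown, so it is matched with ".*".
    # Assumption: re.escape is close enough to protect special characters.
    pattern = rf"intitle:/LL-{lang_qid} \({iso_code}\)-.*-{re.escape(word)}\.wav/"
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "generator": "search",
        "iiprop": "url",
        "iimetadataversion": 1,
        "iiextmetadatafilter": "Categories",
        "gsrsearch": pattern,
        "gsrnamespace": 6,  # namespace 6 is "File:"
        "gsrlimit": limit,
        "gsrwhat": "text",
    }
    # quote_via=quote encodes spaces as %20, as in the curl command above.
    return f"{API_ENDPOINT}?{urlencode(params, quote_via=quote)}"


print(build_search_url("nice"))
```

The printed URL percent-encodes a few more characters than the hand-written one (for example the parentheses), which should be equivalent once the server decodes it.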

Then just grab the url from the returned JSON:

{
  "batchcomplete": "",
  "query": {
    "pages": {
      "88511149": {
        "pageid": 88511149,
        "ns": 6,
        "title": "File:LL-Q1860 (eng)-Back ache-nice.wav",
        "index": 2,
        "imagerepository": "local",
        "imageinfo": [
          {
            "url": "https://upload.wikimedia.org/wikipedia/commons/6/6a/LL-Q1860_%28eng%29-Back_ache-nice.wav",
            "descriptionurl": "https://commons.wikimedia.org/wiki/File:LL-Q1860_(eng)-Back_ache-nice.wav",
            "descriptionshorturl": "https://commons.wikimedia.org/w/index.php?curid=88511149"
          }
        ]
      },
      "73937351": {
        "pageid": 73937351,
        "ns": 6,
        "title": "File:LL-Q1860 (eng)-Nattes \u00e0 chat-nice.wav",
        "index": 1,
        "imagerepository": "local",
        "imageinfo": [
          {
            "url": "https://upload.wikimedia.org/wikipedia/commons/b/b0/LL-Q1860_%28eng%29-Nattes_%C3%A0_chat-nice.wav",
            "descriptionurl": "https://commons.wikimedia.org/wiki/File:LL-Q1860_(eng)-Nattes_%C3%A0_chat-nice.wav",
            "descriptionshorturl": "https://commons.wikimedia.org/w/index.php?curid=73937351"
          }
        ]
      }
    }
  }
}
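Pulling the URLs out of that response could look roughly like this. It is a sketch in Python with a hypothetical helper name (`extract_audio_urls`); it assumes the `query` key is simply absent when nothing matches, and it sorts by the `index` field because the `pages` object is keyed by page id rather than by search rank.

```python
def extract_audio_urls(response):
    """Return audio URLs from a parsed API response, best match first."""
    # Assumption: with zero hits the response has no "query" key at all.
    pages = response.get("query", {}).get("pages", {})
    # "index" is the position in the search results.
    ranked = sorted(pages.values(), key=lambda page: page.get("index", 0))
    return [
        info["url"]
        for page in ranked
        for info in page.get("imageinfo", [])
        if "url" in info
    ]


# Trimmed copy of the response shown above.
sample = {
    "batchcomplete": "",
    "query": {
        "pages": {
            "88511149": {
                "index": 2,
                "imageinfo": [{"url": "https://upload.wikimedia.org/wikipedia/commons/6/6a/LL-Q1860_%28eng%29-Back_ache-nice.wav"}],
            },
            "73937351": {
                "index": 1,
                "imageinfo": [{"url": "https://upload.wikimedia.org/wikipedia/commons/b/b0/LL-Q1860_%28eng%29-Nattes_%C3%A0_chat-nice.wav"}],
            },
        }
    },
}

for url in extract_audio_urls(sample):
    print(url)
```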

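The parameter is documented as srsearch, and it took me a while to see why it needs the g prefix here: the search module is used as a generator (generator=search), and generator parameters are prefixed with g, hence gsrsearch. https://www.mediawiki.org/wiki/API:Search

To get the supported language IDs, run this query on https://commons-query.wikimedia.org/: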

https://lingualibre.org/wiki/Help:SPARQL#Is_Language_.28d:Q34770.29_.E2.86.92_List_existing_languages_with:_LL_Qid.2C_ISO_639-3.2C_Name

Without a personal token, the rate limit is 500 requests per hour, which should be enough for most people.

https://api.wikimedia.org/wiki/Documentation/Getting_started/Rate_limits
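If the client should stay under that limit on its own, a small sliding-window counter would do. This is only a sketch: the class name `HourlyRateLimiter` is hypothetical, and it assumes the 500 requests per hour figure applies to the endpoint being queried.

```python
import time
from collections import deque


class HourlyRateLimiter:
    """Allow at most `limit` calls within any `window` seconds."""

    def __init__(self, limit=500, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self):
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True


limiter = HourlyRateLimiter()
if limiter.try_acquire():
    print("OK to send a request")
```

A caller would check `try_acquire()` before each lookup and skip the lookup, or fall back to another sound source, when it returns `False`.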

The interface should be similar to Forvo's in GoldenDict's Dict settings. Users do need to set a language code, or the API will time out.

shenlebantongying · 2022-12-21T05:13:24Z

Their data is already usable, just not out of the box. Users can download the dataset from https://lingualibre.org/datasets/ and put it under the sound folder.

https://lingualibre.org/wiki/LinguaLibre:Chat_room/Archives/2021#.22How_to_use_Lingua_Libre_for_your_language_learning.22

Exponent4806 · 2022-12-21T06:51:15Z

@lingua-libre French pronunciations are very comprehensive (>200,000) and the project will keep growing.

It is a good idea to add support for that wonderful project to GoldenDict!

shenlebantongying · 2022-12-22T09:17:32Z

Can I implement this in GD/src/dictionary/lingualibre.cc or GD/dictionary/lingualibre.cc rather than directly under the root? @xiaoyifang

The practice of putting everything in the root folder is insane. I don't know why the original author considered /src superfluous (goldendict/goldendict@ab88fa4). The project was probably much simpler at that time.

I think we will reorganize the source files in the future for better maintainability. I prefer to put new code in its own place, in a modular way. Also, if we actually do this, some header changes are inevitable. We can run https://include-what-you-use.org/ over the codebase for faster build times.

xiaoyifang · 2022-12-22T09:43:21Z

> Can I implement this in GD/src/dictionary/lingualibre.cc or GD/dictionary/lingualibre.cc rather than directly under the root? @xiaoyifang

Yes, that's nice.

> I don't know why the original author considered /src superfluous

I think it's because the original code was migrated from Subversion, which uses src as the default folder.

shenlebantongying mentioned this issue Dec 22, 2022

Add Lingua Libre support #268

Merged

xiaoyifang closed this as completed in #268 Dec 24, 2022

shenlebantongying mentioned this issue Apr 27, 2023

List issues fixed in this repo while not in official repo #587

Closed
