The basic function of pronlex is to store and retrieve lexical entries. An entry consists of a word form, along with a phonetic transcription, a status, a database and lexicon name, and possibly additional values.
A code version of an entry is defined in lex.Entry. Documentation is available here.
An entry can be converted to and from JSON.
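As a rough illustration of the JSON conversion, the sketch below round-trips a simplified entry through encoding/json. The Entry and Transcription types, their field names, and the JSON keys are hypothetical stand-ins chosen for this example; the actual definition is the one in lex.Entry and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Transcription is a hypothetical, simplified transcription type.
type Transcription struct {
	Strn     string `json:"strn"`
	Language string `json:"language"`
}

// Entry is a hypothetical, simplified stand-in for lex.Entry.
// Field names and JSON keys are assumptions made for this sketch.
type Entry struct {
	Strn           string          `json:"strn"`
	Transcriptions []Transcription `json:"transcriptions"`
	Status         string          `json:"status"`
	DBName         string          `json:"dbName"`
	LexiconName    string          `json:"lexiconName"`
}

// roundTrip serialises an entry to JSON and parses the result back,
// returning the parsed entry together with the JSON string.
func roundTrip(e Entry) (Entry, string, error) {
	bts, err := json.Marshal(e)
	if err != nil {
		return Entry{}, "", err
	}
	var res Entry
	if err := json.Unmarshal(bts, &res); err != nil {
		return Entry{}, "", err
	}
	return res, string(bts), nil
}

func main() {
	e := Entry{
		Strn:           "hund",
		Transcriptions: []Transcription{{Strn: "h u n d", Language: "sv-se"}},
		Status:         "demo",
		DBName:         "demo_db",
		LexiconName:    "demo_lex",
	}
	res, js, err := roundTrip(e)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(js)
	fmt.Println("round trip preserved word form:", res.Strn == e.Strn)
}
```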
The pronlex package consists of a lexicon database and a lexicon server, plus some additional helper modules. The server is written in Go.
The lexicon server has an HTTP REST API. More information about the HTTP API can be found below.
The lexicon database stores entries in a relational database, SQLite3. The SQL schema (the definition of the database structure) is a string constant found in the file schema.go.
There is an HTTP server for the pronlex database. Documentation of the HTTP API can be accessed once the server is started (default address: http://localhost:8787).
Core API call for (read-only) TTS usage:
- /lexicon/lookup
The most important API URLs are listed below. For more information and a complete list of API calls, see the full documentation on a locally running lexicon server.
- /lexicon/list
- /lexicon/lookup
- /lexicon/entries_exist
- /lexicon/info/{lexicon_name}
- /lexicon/stats/{lexicon_name}
- /lexicon/updateentry
- /lexicon/addentry
- /lexicon/delete_entry/{lexicon_name}/{entry_id}
- /admin/list_dbs
- /admin/create_db/{db_name}
- /admin/define_lex/{lexicon_name}/{locale}/{symbolset_name}
- /admin/deletelexicon/{lexicon_name}
- /admin/superdeletelexicon/{lexicon_name}
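Several of the paths above contain placeholders such as {lexicon_name}. The following sketch shows one way a client could fill these in before sending a request. The helper fillPath is hypothetical and not part of pronlex, and the lexicon name demo_lex is made up; request parameters and response formats are described in the server's own documentation and are not assumed here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// fillPath replaces each {name} placeholder in an API path template with the
// path-escaped value from params. It returns an error if any placeholder is
// left unfilled. (Hypothetical helper, not part of pronlex.)
func fillPath(template string, params map[string]string) (string, error) {
	res := template
	for name, value := range params {
		res = strings.ReplaceAll(res, "{"+name+"}", url.PathEscape(value))
	}
	if strings.ContainsAny(res, "{}") {
		return "", fmt.Errorf("unfilled placeholder in %q", res)
	}
	return res, nil
}

func main() {
	base := "http://localhost:8787" // default server address
	path, err := fillPath("/lexicon/info/{lexicon_name}", map[string]string{
		"lexicon_name": "demo_lex", // made-up lexicon name
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(base + path)
}
```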
The database can be called using a set of functions defined in the database manager, dbapi.DBManager.
Internally, the database interaction is performed using functions defined in dbapi.go.
The database can be queried through the DBManager using a query struct, dbapi.DBMQuery.
The DBMQuery contains the reference to a lexicon and the actual dbapi.Query.
Such a query struct can be converted to and from JSON.
TODO: Overview of the database tables and basic constraints
A query from the dbapi is converted to an SQL query string. This happens in sql_gen.go.
The query string is then used to retrieve entries using functions in dbapi.
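The general idea of turning a query struct into an SQL string can be sketched as follows. This is not the code in sql_gen.go: the Query type, the table names, and the column names are all invented for this example and do not reflect the actual schema. The sketch builds a parameterised statement, so search values are passed as arguments and never pasted into the SQL text.

```go
package main

import (
	"fmt"
	"strings"
)

// Query is a hypothetical, much reduced search query.
type Query struct {
	Lexicon  string // lexicon name to search in (optional)
	WordLike string // SQL LIKE pattern for the word form, e.g. "hun%" (optional)
	Limit    int    // maximum number of rows; 0 means no limit
}

// toSQL builds a parameterised SQL string and the matching argument list.
// Table and column names are assumptions made for this sketch.
func toSQL(q Query) (string, []interface{}) {
	var conds []string
	var args []interface{}
	if q.Lexicon != "" {
		conds = append(conds, "lexicon.name = ?")
		args = append(args, q.Lexicon)
	}
	if q.WordLike != "" {
		conds = append(conds, "entry.strn LIKE ?")
		args = append(args, q.WordLike)
	}
	sql := "SELECT entry.id FROM entry JOIN lexicon ON entry.lexiconid = lexicon.id"
	if len(conds) > 0 {
		sql += " WHERE " + strings.Join(conds, " AND ")
	}
	if q.Limit > 0 {
		sql += " LIMIT ?"
		args = append(args, q.Limit)
	}
	return sql, args
}

func main() {
	sql, args := toSQL(Query{Lexicon: "demo_lex", WordLike: "hun%", Limit: 10})
	fmt.Println(sql)
	fmt.Println(args...)
}
```

The returned string and arguments could then be handed to a database/sql call such as db.Query(sql, args...).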
There are stand-alone commands for managing the lexicon database. These are located in the cmd directory:
- createEmptyDB - create an empty lexicon database (sqlite) file
- createEmptyLexicon - create an empty lexicon in a lexicon database
- exportLex - export a lexicon from a database file to a text file
- importLex - import a lexicon (text) file to a database
- importSql - import a lexicon SQL dump into a database file
- lexlookup - command line tool for lexicon search/lookup
- validate_lex_file - command line tool for validating a lexicon (text) file
Create an SQL dump from a database:
sqlite3 <dbFile> .dump | gzip -c > <sqlDumpFile>
Import an SQL dump into a database:
gunzip -c <sqlDumpFile> | sqlite3 <dbFile>
- symbolset - phonetic symbol sets are defined per lexicon, and can be used for validation
- symbolset mapper - component to convert between different phonetic symbol sets in the same language
- converter - component to convert transcription between different languages
- validation - validation components can be created to validate lexicon entries for various issues
  - phonetic symbols
  - transcription format
  - phonotactic rules
  - syllable boundaries
  - sanity checks
  - etc.
- lexicon format definitions
  - default Wikispeech lexicon format
- data conversion (for more information, see the wikispeech-lexdata repository)
  - CMU2WS - CMU US English
  - csCzPhword2WS - Czech dictionary
  - nbNoNST2WS - NST Norwegian bokmål
  - svSeNST2WS - NST Swedish
- admin - various admin tools apart from those listed above
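To make the symbol set validation mentioned in the list above more concrete, here is a minimal sketch of checking a transcription against a set of allowed phonetic symbols. The SymbolSet type and its methods are hypothetical and do not mirror the actual symbolset or validation packages. The sketch also assumes that the symbols in a transcription are separated by whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// SymbolSet is a hypothetical, minimal symbol set: a name plus the set of
// allowed phonetic symbols.
type SymbolSet struct {
	Name    string
	Symbols map[string]bool
}

// NewSymbolSet creates a symbol set from a list of allowed symbols.
func NewSymbolSet(name string, symbols ...string) SymbolSet {
	m := make(map[string]bool, len(symbols))
	for _, s := range symbols {
		m[s] = true
	}
	return SymbolSet{Name: name, Symbols: m}
}

// InvalidSymbols returns the symbols of a whitespace-separated transcription
// that are not part of the symbol set, in order of appearance.
func (ss SymbolSet) InvalidSymbols(trans string) []string {
	var res []string
	for _, s := range strings.Fields(trans) {
		if !ss.Symbols[s] {
			res = append(res, s)
		}
	}
	return res
}

func main() {
	ss := NewSymbolSet("demo", "h", "u", "n", "d")
	for _, trans := range []string{"h u n d", "h y n d"} {
		if bad := ss.InvalidSymbols(trans); len(bad) > 0 {
			fmt.Printf("%q: invalid symbols %v\n", trans, bad)
		} else {
			fmt.Printf("%q: ok\n", trans)
		}
	}
}
```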