kkci2pron (Kana-Kanji Conversion Input to Pronunciation)

Contribution

kkci2pron converts a Japanese yomi (reading), more precisely a Kana-Kanji conversion input, into a pronunciation.

With this program, you can generate a speaking-style corpus from a writing-style corpus annotated with word boundaries and Japanese yomis, and then build a speaking-style language model from it. Combining this language model with a large domain-independent corpus, namely CSJ, can improve the accuracy of a speech recognition system, as shown in [1].

This program was developed by Yohei Yamaguchi when he was a graduate student. If you have any questions, please contact him.

Installation

$ git clone git://github.com/gologo13/kkci2pron

You must install Kyfd (the Kyoto Fst Decoder) before running kkci2pron.

Configuration

Edit config.xml to set up Kyfd before running kkci2pron.

Usage

$ cat sample.txt
私/ワタシ は/ハ 太郎/タロウ です/デス
気温/キオン 変動/ヘンドウ
$ perl bin/kkci2pron.pl < sample.txt
私/ワタシ は/ワ 太郎/タロー です/デス
気温/キオン 変動/ヘンドー
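
In the sample above only some units change: the particle は is read ワ, and long vowels are written with ー. To see which readings were rewritten, a minimal Python sketch can compare an input line with its output line. It assumes the output keeps the same number and order of units as the input, which holds for the sample but is not confirmed for all inputs. `changed_units` is a hypothetical helper and not part of kkci2pron.

```python
def changed_units(before, after):
    """Return (word, yomi, pronunciation) for every unit whose reading changed.

    Assumption: both lines contain the same units in the same order,
    separated by single spaces, each written as word/reading.
    """
    changes = []
    for unit_in, unit_out in zip(before.split(" "), after.split(" ")):
        word, _, yomi = unit_in.partition("/")
        _, _, pron = unit_out.partition("/")
        if yomi != pron:
            changes.append((word, yomi, pron))
    return changes


if __name__ == "__main__":
    # First line of sample.txt and the corresponding output line
    src = "私/ワタシ は/ハ 太郎/タロウ です/デス"
    out = "私/ワタシ は/ワ 太郎/タロー です/デス"
    for word, yomi, pron in changed_units(src, out):
        print(f"{word}: {yomi} -> {pron}")
```

For the first sample line this reports は (ハ to ワ) and 太郎 (タロウ to タロー).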

Input Format

Input text must follow the format below.

text := sentence + '\n'(newline character) + sentence + … + sentence

sentence := unit + ' '(space) + unit + … + unit

unit := word + '/'(slash) + yomi

word := (Japanese Full-width Character)+

yomi := (Japanese Full-width Katakana Character)+

In addition, the input text must be encoded in UTF-8.
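
The grammar above can be checked before running the converter. The following Python sketch is not part of kkci2pron; `parse_line` and `parse_text` are hypothetical helpers. It assumes that "full-width katakana" means the characters U+30A1 to U+30FA plus the prolonged sound mark ー (U+30FC), and it only checks that each word is non-empty rather than verifying that it consists of full-width characters.

```python
import re

# Assumption: a yomi consists of full-width katakana (U+30A1..U+30FA)
# and the prolonged sound mark (U+30FC).
KATAKANA = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def parse_line(line):
    """Split one sentence into (word, yomi) pairs; raise ValueError if malformed."""
    pairs = []
    for unit in line.split(" "):
        # unit := word + '/' + yomi
        word, sep, yomi = unit.partition("/")
        if not sep or not word or not KATAKANA.fullmatch(yomi):
            raise ValueError(f"malformed unit: {unit!r}")
        pairs.append((word, yomi))
    return pairs


def parse_text(text):
    """Parse newline-separated sentences, skipping empty lines."""
    return [parse_line(line) for line in text.split("\n") if line]


if __name__ == "__main__":
    # Reading with an explicit encoding enforces the UTF-8 requirement
    with open("sample.txt", encoding="utf-8") as f:
        for sentence in parse_text(f.read()):
            print(sentence)
```

A malformed unit, such as one with a missing slash or a reading written in hiragana, raises ValueError that names the offending unit.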

License

MIT License. Please see the LICENSE file for details.

Reference

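[1] Yohei Yamaguchi, Shinsuke Mori, and Tatsuya Kawahara.
仮名漢字変換ログを用いた講義音声認識のための言語モデル適応 (Language Model Adaptation for Lecture Speech Recognition Using Kana-Kanji Conversion Logs).
The 18th Annual Meeting of the Association for Natural Language Processing (NLP2012), Hiroshima, March 2012.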
