GitHub - catemlitten/homophone_finder: Simple script to find homophones given a list of Japanese words

Homophone Finder

This is a simple script that takes a raw text file of Japanese words and finds the homophones among them.
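As a rough illustration of what "finding homophones" means here, words can be grouped by their reading, keeping only readings shared by more than one word. This is a minimal sketch, not the code in `convert_list_to_homophones.py`: the function name `group_homophones` and the hand-written `sample_readings` dictionary are assumptions made for the example. How the actual script obtains readings is not described in this README.

```python
from collections import defaultdict


def group_homophones(words, readings):
    """Group words by reading; keep only readings shared by 2+ words.

    `readings` is a hypothetical word -> kana reading mapping supplied
    by the caller; the real script's source of readings is not shown here.
    """
    groups = defaultdict(list)
    for word in words:
        reading = readings.get(word)
        if reading is None:
            continue  # no known reading for this word, so skip it
        if word not in groups[reading]:
            groups[reading].append(word)
    # A reading with a single word is not a homophone group.
    return {r: ws for r, ws in groups.items() if len(ws) > 1}


# Hand-written readings, for illustration only.
sample_readings = {
    "橋": "はし",
    "箸": "はし",
    "端": "はし",
    "真実": "しんじつ",
}

homophones = group_homophones(["橋", "箸", "端", "真実"], sample_readings)
print(homophones)  # {'はし': ['橋', '箸', '端']}
```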

A sample list is included with this repo: a simple frequency list of 1000 words taken from Offbeat Band here. You can use any list you like, but the format needs to match the one provided, i.e. one word per line:

さく
貰う
真実
ゲーム
団

If it does not, you will need to do some data cleaning beforehand.
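If your list is not already one word per line, a small cleaning pass along these lines may be enough. This is only a sketch under assumptions: `clean_word_list` is a hypothetical helper, not part of this repo, and it assumes that any extra columns (such as a frequency count) come after the word and are separated by whitespace.

```python
def clean_word_list(lines):
    """Return one word per entry, dropping blanks and duplicates.

    Assumes the word is the first whitespace-separated field on each
    line; anything after it (e.g. a frequency count) is discarded.
    """
    seen = set()
    words = []
    for line in lines:
        fields = line.split()  # also splits on full-width spaces
        if not fields:
            continue  # blank line
        word = fields[0]
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


raw = ["さく\n", "  貰う  \n", "\n", "真実\t120\n", "さく\n"]
cleaned = clean_word_list(raw)
print("\n".join(cleaned))
```

To clean a file on disk, the same function can be given an open file object, for example `clean_word_list(open("my_list.txt", encoding="utf-8"))`, where `my_list.txt` stands in for your own file.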

Notes

To reduce ambiguity, the script removes words composed only of kana from the list using a simple character check. Based on the listing here, the bulk of kanji in everyday use fall between U+4E00 and U+9FAF.
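A character check of this kind could look roughly like the following. It is a sketch based on the range quoted above, not a copy of the script's code; the names `contains_kanji` and `drop_kana_only` are assumptions. Note that this version drops any word with no character in the range, which includes kana-only words but would also drop, for example, words written only in Latin letters.

```python
# Range quoted in the note above (code points 4e00 to 9faf).
KANJI_START = 0x4E00
KANJI_END = 0x9FAF


def contains_kanji(word):
    """True if at least one character falls in the kanji range."""
    return any(KANJI_START <= ord(ch) <= KANJI_END for ch in word)


def drop_kana_only(words):
    """Keep only the words that contain at least one kanji."""
    return [w for w in words if contains_kanji(w)]


sample = ["さく", "貰う", "真実", "ゲーム", "団"]
filtered = drop_kana_only(sample)
print(filtered)  # ['貰う', '真実', '団']
```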

Repository files:

.gitignore
README.md
convert_list_to_homophones.py
offbeat_raw.txt
requirements.txt
