Anki plugin that helps clean up Japanese text by removing unnecessary spaces and furigana

matthayes/anki_japanese_text_cleaner

Anki Japanese Text Cleaner

This is a plugin for Anki that can help clean up Japanese text in the following ways:

Clean unnecessary spaces, while preserving those spaces that are necessary for indicating the start of a Japanese reading (as in the Japanese Support plugin).

<b> 一[いち]</b>から 始[はじ]めましょう。
=>
<b>一[いち]</b>から 始[はじ]めましょう。

彼女[かのじょ]はイタリア 語[ご]が<b> できます</b>。
=>
彼女[かのじょ]はイタリア 語[ご]が<b>できます</b>。
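
The spacing rule can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the name `clean_spaces` and both regular expressions are assumptions. It keeps a single space only when visible text precedes it on the line and the next word carries a bracketed reading, and drops every other space. It works on one line at a time and would also strip spaces from non-Japanese text, which the real plugin may treat differently.

```python
import re

TAG = re.compile(r"(<[^>]+>)")                  # an HTML tag (captured so split keeps it)
SPACE_RUN = re.compile(r"[ \u3000]+")           # ASCII or full-width spaces
READING_AHEAD = re.compile(r"[^ \u3000\[\]<>]+\[")  # a word directly followed by "["


def clean_spaces(line: str) -> str:
    """Remove spaces that do not mark the start of a reading (hypothetical helper)."""
    out = []
    seen_text = False  # has any visible, non-space text appeared on this line yet?
    for part in TAG.split(line):
        if TAG.fullmatch(part):
            out.append(part)  # tags pass through unchanged
            continue
        buf = []
        pos = 0
        for m in SPACE_RUN.finditer(part):
            chunk = part[pos:m.start()]
            buf.append(chunk)
            seen_text = seen_text or bool(chunk)
            # Keep one space only if text precedes it and a reading follows.
            if seen_text and READING_AHEAD.match(part, m.end()):
                buf.append(" ")
            pos = m.end()
        tail = part[pos:]
        buf.append(tail)
        seen_text = seen_text or bool(tail)
        out.append("".join(buf))
    return "".join(out)


print(clean_spaces("<b> 一[いち]</b>から 始[はじ]めましょう。"))
```

Under these assumptions the sketch reproduces the two examples above: the space after `<b>` is dropped because nothing precedes it, while the space before 始[はじ] is kept because it separates the reading's base from から.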

Clean unnecessary furigana from the beginning, middle, or end of individual readings in the text.

別に[べつに]
=>
別[べつ]に

世の中[よのなか]
=>
世[よ]の 中[なか]
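
One way to picture the furigana cleanup is to align the kana already present in the base word with the same kana in the reading. The sketch below is an assumption about how this could be done, not the plugin's implementation: `trim_reading`, `clean_furigana`, and the regular expressions are hypothetical names. It turns each non-kana run of the base into a lazy capture group, matches the result against the reading, and leaves the text unchanged when no alignment is found.

```python
import re

KANA = r"\u3040-\u30ff"  # hiragana and katakana blocks
RUNS = re.compile(rf"[{KANA}]+|[^{KANA}]+")       # alternating kana / non-kana runs
IS_KANA = re.compile(rf"[{KANA}]+")
FURIGANA = re.compile(r"([^ \[\]<>]+)\[([^\]]+)\]")  # base[reading]


def trim_reading(base: str, reading: str) -> str:
    """Move kana that appears in both base and reading out of the brackets."""
    runs = RUNS.findall(base)
    kana_flags = [bool(IS_KANA.fullmatch(r)) for r in runs]
    if all(kana_flags):
        return f"{base}[{reading}]"  # no kanji to annotate; leave untouched
    # Kana runs must appear literally in the reading; other runs capture lazily.
    pattern = "".join(re.escape(r) if k else "(.+?)" for r, k in zip(runs, kana_flags))
    m = re.fullmatch(pattern, reading)
    if m is None:
        return f"{base}[{reading}]"  # cannot align; leave untouched
    groups = iter(m.groups())
    out = ""
    for run, is_kana in zip(runs, kana_flags):
        if is_kana:
            out += run
        else:
            # A space marks where the base of the next reading starts.
            out += (" " if out else "") + f"{run}[{next(groups)}]"
    return out


def clean_furigana(text: str) -> str:
    return FURIGANA.sub(lambda m: trim_reading(m.group(1), m.group(2)), text)


print(clean_furigana("世の中[よのなか]"))
```

Lazy matching picks the shortest reading for each kanji run, which is enough for the examples here but can guess wrong when the same kana occurs both inside a kanji's reading and in the base word.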

Both are designed to:

  • Properly handle text with multiple lines
  • Properly handle text with HTML markup by allowing HTML tags to pass through unchanged
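
These two properties can be illustrated with a small wrapper that applies any cleanup function to plain-text segments only. This is a sketch under assumptions: `map_text_segments` is a hypothetical helper, and the tag pattern is a simplification that does not cover every HTML edge case (for example, a `>` inside an attribute value).

```python
import re
from typing import Callable

TAG = re.compile(r"(<[^>]+>)")  # captured, so re.split keeps the tags in its result


def map_text_segments(field: str, transform: Callable[[str], str]) -> str:
    """Apply `transform` to the text between tags, one line at a time."""
    cleaned = []
    for line in field.split("\n"):
        parts = TAG.split(line)
        # With one capture group, odd indices are tags and even indices are text.
        cleaned.append(
            "".join(p if i % 2 else transform(p) for i, p in enumerate(parts))
        )
    return "\n".join(cleaned)


result = map_text_segments('<b class="x y">a b</b>\nc d', lambda s: s.replace(" ", ""))
print(result)
```

In this example the space inside the `class` attribute survives because the tag is never passed to the transform, and the line break between the two lines is preserved.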

In addition, there are some features to guard against accidental changes or bugs in the plugin:

  • A Check action logs all changes that would be made without taking any action.
  • A Diff action produces a color-coded HTML diff for each note, highlighting in green what will be added and in red what will be removed.
  • A Fix action actually performs the changes.
  • Each batch of changes is recorded in the undo history within Anki.
  • A full change log is kept in a SQLite database within the plugin's local directory. Recent changes can be viewed in the UI and the full history of changes can be exported to a CSV file. This enables you to recover any previous values altered by the plugin.
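
To make the change log idea concrete, here is a minimal sketch of recording changes in SQLite and exporting them to CSV. The table name, column names, and function names are all assumptions chosen for illustration; the plugin's real schema is not documented here.

```python
import csv
import io
import sqlite3

COLUMNS = ["id", "note_id", "field", "old_value", "new_value", "changed_at"]


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a change log database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS change_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " note_id INTEGER NOT NULL,"
        " field TEXT NOT NULL,"
        " old_value TEXT NOT NULL,"
        " new_value TEXT NOT NULL,"
        " changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_change(conn, note_id: int, field: str, old: str, new: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO change_log (note_id, field, old_value, new_value)"
            " VALUES (?, ?, ?, ?)",
            (note_id, field, old, new),
        )


def export_csv(conn, out) -> None:
    """Write the full history, oldest first, to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(
        conn.execute(f"SELECT {', '.join(COLUMNS)} FROM change_log ORDER BY id")
    )


conn = open_log()
record_change(conn, 1, "Reading", "別に[べつに]", "別[べつ]に")
buf = io.StringIO()
export_csv(conn, buf)
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows[1][3])  # the original value, available for recovery
```

Storing the old value next to the new one is what makes recovery possible: any row in the export contains enough information to restore the field by hand.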

Despite these safety features, it's a good idea to back up or export your collection before using this plugin just to be safe.

You can access the dialogs by clicking Browse to open the card browser and then clicking Edit -> Japanese Text Cleaner. The fixer dialogs require you to select some cards first; only the selected cards are checked.

Screenshots

Dialog to check for unnecessary spacing:

Dialog to check for unnecessary furigana:

HTML diff of proposed spacing changes:

HTML diff of proposed furigana changes:

Viewing the log of changes:

Testing

I have tested the plugin against the following shared decks, which I found on Anki's Japanese shared decks page. The stats below are as of July 10, 2019, when I last tested the plugin against the decks.

| Deck | Notes | Field | Spacing Fixes | Furigana Fixes |
| --- | --- | --- | --- | --- |
| Japanese Core 2000 2k - Sorted w/ Audio | 2007 | Reading | 266 | 1 |
| Japanese Visual Novel, Anime, Manga, LN Vocab - V2K | 1988 | Reading | 4 | 55 |

To get a better idea about how the plugin works, I've included some examples from each deck.

Examples: Japanese Core 2000 2k

Unnecessary space removed from beginning of line:

<b> 一[いち]</b>から 始[はじ]めましょう。
=>
<b>一[いち]</b>から 始[はじ]めましょう。

<b> 月曜日[げつようび]</b>に 会[あ]いましょう。
=>
<b>月曜日[げつようび]</b>に 会[あ]いましょう。

Unnecessary space removed from within line:

彼女[かのじょ]はイタリア 語[ご]が<b> できます</b>。
=>
彼女[かのじょ]はイタリア 語[ご]が<b>できます</b>。

Redundant furigana removed from end of word:

彼女[かのじょ]はよく<b>喋る[しゃべる]</b>ね。
=>
彼女[かのじょ]はよく<b>喋[しゃべ]る</b>ね。

Examples: Japanese Visual Novel, Anime, Manga, LN Vocab

Redundant furigana removed from the end:

別に[べつに]
=>
別[べつ]に

疲れる[つかれる]
=>
疲[つか]れる

相変わらず[あいかわらず]
=>
相変[あいか]わらず

Redundant furigana removed from the middle:

当たり前[あたりまえ]
=>
当[あ]たり 前[まえ]

振り返る[ふりかえる]
=>
振[ふ]り 返[かえ]る

世の中[よのなか]
=>
世[よ]の 中[なか]

Unnecessary space removed:

杯[はい],   杯[さかずき]
=>
杯[はい], 杯[さかずき]

Version History

  • 0.1: Initial Release

License

Copyright 2019 Matthew Hayes

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
