
Working with Japanese encoded source files (macOS)

Matt Sephton edited this page Jan 7, 2020 · 12 revisions

For historic and cultural reasons, the files that make up px68k are encoded as EUC-JP, rather than the other common legacy Japanese encoding, Shift_JIS, or the current de facto standard, UTF-8.

Editing such files without any special handling will convert or lose the encoding, and the Japanese characters will become corrupted (mangled).

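For example, this header comment (roughly "PROP.C - property sheets for the various settings, and settings-value management"):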

/* -------------------------------------------------------------------------- *
 *  PROP.C - 各種設定用プロパティシートと設定値管理                           *
 * -------------------------------------------------------------------------- */

becomes this, because each two-byte EUC-JP character has been misread as two Latin-1 (ISO-8859-1) characters:

/* -------------------------------------------------------------------------- *
 *  PROP.C - ³Æ¼ïÀßÄêÍÑ¥×¥í¥Ñ¥Æ¥£¥·¡¼¥È¤ÈÀßÄêÃÍ´ÉÍý                           *
 * -------------------------------------------------------------------------- */
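The corruption above can be reproduced, and in the simplest case undone, with `iconv`. This is a sketch that assumes a UTF-8 terminal and an `iconv` that knows EUC-JP (the stock macOS and glibc versions both do). `mangle` and `unmangle` are hypothetical helper names, and `unmangle` only helps if the text really was misread as Latin-1 and then saved as UTF-8, not if some other wrong encoding was involved.

```shell
#!/bin/sh
# Reproduce the damage: encode UTF-8 text as EUC-JP, then wrongly
# decode those bytes as ISO-8859-1 (Latin-1).
mangle() {
    printf '%s' "$1" | iconv -f UTF-8 -t EUC-JP | iconv -f ISO-8859-1 -t UTF-8
}

# Reverse it: turn the Latin-1 characters back into the original bytes,
# then decode those bytes as EUC-JP. Assumes a Latin-1 round trip only.
unmangle() {
    printf '%s' "$1" | iconv -f UTF-8 -t ISO-8859-1 | iconv -f EUC-JP -t UTF-8
}

mangle '各種設定'      # prints ³Æ¼ïÀßÄê
echo
unmangle '³Æ¼ïÀßÄê'    # prints 各種設定
echo
```

The first four mangled characters of the PROP.C comment, `³Æ¼ïÀßÄê`, are exactly what `mangle` produces for `各種設定`.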

So we have a number of options:

  1. Use an editor that is clever enough to take care of the encoding automatically:

    • BBEdit
  2. Set the file's encoding in an extended attribute (xattr):

    • xattr -w com.apple.TextEncoding "EUC-JP;2336" "/path/to/file.txt"
    • xattr -w com.apple.TextEncoding "Shift_JIS;2561" "/path/to/file.txt"

    And use an editor that supports this approach:

    • TextMate
    • Sublime Text
    • (probably most native macOS editors too, but this is unconfirmed)
  3. Extend an editor so that it can handle the encoding itself:

  4. Mass convert the files if there is good reason to do so:

    • iconv -f EUC-JP -t UTF-8 "/path/to/input.txt" > "/path/to/output.txt"
    • nkf -w "/path/to/input.txt" > "/path/to/output.txt"
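Option 2 can be applied to a whole checkout rather than one file at a time. The number after the semicolon in the xattr value is the Core Foundation encoding constant (`kCFStringEncodingEUC_JP` is 0x0920 = 2336, `kCFStringEncodingShiftJIS` is 0x0A01 = 2561). This is a minimal sketch for macOS only, using the stock `xattr` tool, and it assumes the EUC-JP files are the `*.c` and `*.h` sources; `tag_eucjp` and `show_tag` are hypothetical names.

```shell
#!/bin/sh
# Tag C sources and headers as EUC-JP for editors that honour
# com.apple.TextEncoding. macOS only: relies on /usr/bin/xattr.
# The *.c / *.h patterns are an assumption about which files need it.
tag_eucjp() {
    # xattr -w accepts several files at once, so batch them with "{} +".
    find "${1:-.}" -type f \( -name '*.c' -o -name '*.h' \) \
        -exec xattr -w com.apple.TextEncoding 'EUC-JP;2336' {} +
}

# Print the encoding tag of a single file, or "(no tag)" if it has none.
show_tag() {
    xattr -p com.apple.TextEncoding "$1" 2>/dev/null || echo "(no tag)"
}

# Usage (paths are hypothetical):
#   tag_eucjp path/to/px68k
#   show_tag path/to/px68k/some_file.c    # prints EUC-JP;2336
```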
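For option 4, here is a sketch of an in-place mass conversion with `iconv`, again assuming that the `*.c` and `*.h` files are the ones to convert. It keeps each original as `FILE.eucjp`, skips files that already have a backup (so re-running it is harmless), and leaves alone any file `iconv` rejects as invalid EUC-JP. Files containing vendor-specific characters may fall into that group and need manual attention. `to_utf8` is a hypothetical name.

```shell
#!/bin/sh
# Convert C sources and headers under a directory from EUC-JP to UTF-8,
# keeping each original as FILE.eucjp. The file patterns are an assumption.
to_utf8() {
    list=$(mktemp "${TMPDIR:-/tmp}/to_utf8.XXXXXX") || return 1
    # Collect the file list first so find never sees the files we rewrite.
    find "${1:-.}" -type f \( -name '*.c' -o -name '*.h' \) > "$list"
    while IFS= read -r f; do
        # A backup means this file was already converted on an earlier run.
        [ -e "$f.eucjp" ] && continue
        # iconv exits non-zero on input that is not valid EUC-JP.
        if iconv -f EUC-JP -t UTF-8 "$f" > "$f.tmp"; then
            mv "$f" "$f.eucjp" && mv "$f.tmp" "$f"
        else
            rm -f "$f.tmp"
            echo "skipped (not valid EUC-JP): $f" >&2
        fi
    done < "$list"
    rm -f "$list"
}

# Usage (the path is hypothetical):
#   to_utf8 path/to/px68k
```

Keeping the `.eucjp` copies makes the conversion easy to check with a diff tool and to revert if something goes wrong.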

Problematic Editors