Workaround for Japanese Windows environment #108

Merged
merged 1 commit into from
Feb 2, 2023

mizar
Collaborator

@mizar mizar commented Sep 1, 2022

  • Avoid decoding failures caused by the wrong default encoding
    UnicodeDecodeError: 'cp932' codec can't decode byte 0x85 in position 2897: illegal multibyte sequence
  • Add an option to write output directly to a file
    PowerShell converts the encoding and newline characters of stdout when it is passed on

@mizar
Collaborator Author

mizar commented Sep 1, 2022

example:

mkdir -p .dist
python3 expand.py -o .dist/convolution.rs convolution
python3 expand.py -o .dist/dsu.rs dsu
python3 expand.py -o .dist/fenwicktree.rs fenwicktree
python3 expand.py -o .dist/lazysegtree.rs lazysegtree
python3 expand.py -o .dist/math.rs math
python3 expand.py -o .dist/maxflow.rs maxflow
python3 expand.py -o .dist/mincostflow.rs mincostflow
python3 expand.py -o .dist/modint.rs modint
python3 expand.py -o .dist/scc.rs scc
python3 expand.py -o .dist/segtree.rs segtree
python3 expand.py -o .dist/string.rs string
python3 expand.py -o .dist/twosat.rs twosat
python3 expand.py -o .dist/all.rs --all
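
The same batch can be driven from Python, which avoids depending on `mkdir -p` or on shell redirection. This is only a sketch: the helper names `build_commands` and `run_all` are hypothetical, and it assumes `expand.py` sits in the current directory and accepts `-o` and `--all` exactly as in the commands above.

```python
import subprocess
import sys
from pathlib import Path

# Module names taken from the example commands above.
MODULES = [
    "convolution", "dsu", "fenwicktree", "lazysegtree", "math", "maxflow",
    "mincostflow", "modint", "scc", "segtree", "string", "twosat",
]


def build_commands(out_dir=".dist", script="expand.py"):
    """Build one argv list per module, plus a final one for --all."""
    out = Path(out_dir)
    commands = [
        [sys.executable, script, "-o", str(out / f"{name}.rs"), name]
        for name in MODULES
    ]
    commands.append([sys.executable, script, "-o", str(out / "all.rs"), "--all"])
    return commands


def run_all(out_dir=".dist"):
    """Create the output directory, then run every command in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(out_dir):
        # check=True stops at the first failing invocation.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root would then produce the same files as the shell commands, under the assumptions stated above.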

@mizar
Collaborator Author

mizar commented Sep 3, 2022

  • Specify utf-8 encoding and disable conversion of newline characters

In Python on Japanese Windows, the default preferred encoding for text file input/output is often cp932 (shift_jis, windows-31j) rather than utf-8, which seems to be the cause of the trouble.
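
To see which default applies on a given machine, the standard library can report it. The function name `describe_text_defaults` is made up for this sketch; `locale.getpreferredencoding` and `sys.flags.utf8_mode` are standard Python (the latter since 3.7).

```python
import locale
import sys


def describe_text_defaults():
    """Report the settings that decide how open() decodes text by default."""
    return {
        # Encoding used by open() when the encoding argument is omitted.
        "preferred_encoding": locale.getpreferredencoding(False),
        # 1 when Python runs in UTF-8 mode (for example with -X utf8).
        "utf8_mode": sys.flags.utf8_mode,
    }
```

On an affected Japanese Windows setup one would expect `preferred_encoding` to be reported as cp932 unless UTF-8 mode is enabled.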

When no encoding is specified, Python may try to read utf-8 source code that has non-ASCII characters in its comments (such as UɴɪᴏɴFɪɴᴅ) as cp932 (shift_jis, windows-31j) and fail, so I now force utf-8 for file input/output. (Passing the -X utf8 option to python3 also seems to be a way around it.)
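
A minimal sketch of the idea, not the actual change in expand.py: the helpers `read_source`, `try_read`, and `demo` and the sample text are hypothetical. It writes UTF-8 bytes to a temporary file, reads them back with an explicit encoding and `newline=""`, and shows that reading the same bytes as cp932 either raises or yields different text.

```python
import os
import tempfile

# Sample source with small-capital letters in a comment and CRLF line endings.
SAMPLE = "// Uɴɪᴏɴ Fɪɴᴅ\r\nfn main() {}\r\n"


def read_source(path, encoding="utf-8"):
    """Read a file with an explicit encoding and no newline translation."""
    # newline="" returns line endings exactly as they are stored in the file.
    with open(path, encoding=encoding, newline="") as f:
        return f.read()


def try_read(path, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return read_source(path, encoding)
    except UnicodeDecodeError:
        return None


def demo():
    """Write SAMPLE as UTF-8, then read it back as utf-8 and as cp932."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.rs")
        with open(path, "wb") as f:
            f.write(SAMPLE.encode("utf-8"))
        return read_source(path), try_read(path, "cp932")
```

Whether the cp932 read raises `UnicodeDecodeError` or silently returns garbled text depends on the exact bytes, which is why the failure only shows up for some source files.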

  • Added an option to write output directly to a file instead of standard output

When the script runs on PowerShell, the encoding and newline characters of the text written to stdout are sometimes converted, which makes redirecting the output to a file troublesome, so I added an option for direct file output. (Running it from Command Prompt instead of PowerShell seems to cause no problem. On PowerShell version 6 or later, you can also pipe instead of redirecting, for example | Out-File -Encoding utf8 filepath.rs, which reconverts the text to UTF-8 without BOM. PowerShell 5, however, seems to have only the option -Encoding UTF-8 (UTF-8 with BOM), and the question of what to do with newline characters remains.)
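
One way such an option could be wired up, as a sketch only. The `-o` and `--all` flags come from the example commands in this thread; `parse_args`, `write_output`, `main`, and `roundtrip_check` are hypothetical names, and the generated text is a placeholder rather than what expand.py really emits.

```python
import argparse
import os
import sys
import tempfile


def parse_args(argv):
    """Parse a hypothetical command line shaped like the examples in this thread."""
    parser = argparse.ArgumentParser(description="sketch, not the real expand.py")
    parser.add_argument("modules", nargs="*", help="module names to expand")
    parser.add_argument("--all", action="store_true", help="expand every module")
    parser.add_argument("-o", dest="output", default=None, help="output file path")
    return parser.parse_args(argv)


def write_output(text, output_path=None):
    """Write to a file as UTF-8 without newline translation, or fall back to stdout."""
    if output_path is None:
        sys.stdout.write(text)
        return
    # newline="" writes "\n" as-is instead of converting it to the platform default.
    with open(output_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


def main(argv):
    args = parse_args(argv)
    # Placeholder for the real expansion step.
    text = "// Uɴɪᴏɴ Fɪɴᴅ\n" + "".join(f"// module: {m}\n" for m in args.modules)
    write_output(text, args.output)
    return text


def roundtrip_check():
    """Return True if the bytes on disk equal the UTF-8 encoding of the text."""
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "dsu.rs")
        text = main(["-o", out, "dsu"])
        with open(out, "rb") as f:
            return f.read() == text.encode("utf-8")
```

Because the file is opened by Python itself, the shell never sees the text, so PowerShell has no chance to re-encode it or rewrite its line endings.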

@qryxip
Member

qryxip commented Sep 11, 2022

#109

@mizar mizar mentioned this pull request Jan 21, 2023
@mizar
Collaborator Author

mizar commented Jan 22, 2023

Rebased the commit onto the current rust-lang-ja:master.

Member

@qryxip qryxip left a comment

👍
Merging.

@qryxip qryxip merged commit 6a2e7f6 into rust-lang-ja:master Feb 2, 2023
@mizar mizar deleted the expand_fix branch February 3, 2023 08:19