UnicodeDecodeError when running p2j on a Python script with Chinese comments #16

Open
kuihao opened this issue Mar 18, 2024 · 0 comments

Description

When running the p2j tool on a Python script (mycode.py) that contains Traditional Chinese comments, I encountered a UnicodeDecodeError. The error message indicates that the ‘cp950’ codec can’t decode a byte at a specific position due to an illegal multibyte sequence.

Steps to Reproduce

  1. Execute the following command: `p2j .\mycode.py`
  2. Observe the traceback and the specific error message.

Expected Behavior

The p2j tool should handle the encoding correctly, especially when dealing with non-ASCII characters in comments. It should allow specifying the encoding explicitly to avoid such errors.

Environment

  • OS: Windows 11
  • virtual env: miniconda3 (conda 23.5.2)
  • python: 3.11.4
  • p2j: 1.3.2

Error message

p2j .\mycode.py
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\ProgramData\miniconda3\Scripts\p2j.exe\__main__.py", line 7, in <module>
  File "C:\ProgramData\miniconda3\Lib\site-packages\p2j\p2j.py", line 244, in main
    p2j(source_filename=args.source_filename,
  File "C:\ProgramData\miniconda3\Lib\site-packages\p2j\p2j.py", line 36, in p2j
    data = [l.rstrip('\n') for l in infile]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\p2j\p2j.py", line 36, in <listcomp>
    data = [l.rstrip('\n') for l in infile]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe8 in position 21: illegal multibyte sequence
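
A likely cause, assuming `mycode.py` is saved as UTF-8: the traceback shows the file being iterated as text, and Python's `open()` falls back to the locale's preferred encoding when none is given, which would be cp950 on a Traditional Chinese Windows install. The sketch below reproduces the same class of failure without p2j. The sample text and the `try_decode` helper are illustrative, not taken from the script in this report.

```python
# Illustrative only: reproduce the decoding failure without touching p2j.
text = "# \u8000"           # a comment containing one CJK character (U+8000)
raw = text.encode("utf-8")  # the bytes as they would be stored in a UTF-8 file


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the exception class name if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


print(try_decode(raw, "utf-8"))  # decodes cleanly
print(try_decode(raw, "cp950"))  # 0xe8 followed by 0x80 is not a valid cp950 pair
```

The UTF-8 bytes for this character begin with 0xe8, the same byte reported in the traceback, and the byte that follows is not a valid cp950 trail byte, so the cp950 decoder rejects the sequence.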
kuihao pushed a commit to kuihao/python2jupyter that referenced this issue Mar 18, 2024
Fixed the `UnicodeDecodeError` issue in `p2j` when processing Python
scripts with Chinese comments. Added an encoding parameter
(default="utf-8") to handle character encoding. Resolves Issue remykarem#16.
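
A minimal sketch of the approach the commit message describes, assuming the fix amounts to passing an explicit encoding to `open()`. The function name `read_source_lines` and its signature are hypothetical and are not taken from the p2j source. Only the list comprehension mirrors the line shown in the traceback.

```python
import os
import tempfile


def read_source_lines(path, encoding="utf-8"):
    """Read a script into a list of lines, decoding with an explicit encoding.

    Hypothetical helper: passing `encoding` keeps the result independent of
    the locale default (cp950 in the environment reported above).
    """
    with open(path, "r", encoding=encoding) as infile:
        return [line.rstrip("\n") for line in infile]


# Usage: write a small UTF-8 script with a Traditional Chinese comment,
# then read it back regardless of the system locale.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mycode.py")
    with open(path, "w", encoding="utf-8") as f:
        f.write("# 這是註解\nprint('hello')\n")
    lines = read_source_lines(path)

print(lines)
```

Defaulting to UTF-8 matches Python's own default source encoding, while keeping the parameter lets callers whose files are saved in another encoding, such as cp950, pass it explicitly.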