[BUG] Unicode encode error with Gitbash #319

Closed
VinceCabs opened this issue Sep 6, 2021 · 7 comments

Comments


VinceCabs commented Sep 6, 2021

Hi,

I came across some strange behaviour when using colorama in Git Bash.

Here is a test file :

import colorama

print("\N{Heavy Check Mark} check mark")

Standard run is OK :

$ python test_colorama.py
✔ check mark

But when stdout is redirected to a file, we get an error on stderr:

$ python test_colorama.py > output.txt
Traceback (most recent call last):
  File "test_colorama.py", line 3, in <module>
    print("\N{Heavy Check Mark} check mark")
  File "C:\[...]\.venv\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2714' in position 0: character maps to <undefined> 
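
The traceback points at the cp1252 codec rather than at colorama. A minimal sketch, assuming only the standard library, that reproduces the same failure with no terminal involved (the helper name `try_encode` is hypothetical, introduced for illustration):

```python
# The check mark has no mapping in cp1252, the codec named in the traceback,
# so encoding fails regardless of which terminal started Python.
text = "\N{HEAVY CHECK MARK} check mark"


def try_encode(s, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


cp1252_result = try_encode(text, "cp1252")
utf8_result = try_encode(text, "utf-8")

print(type(cp1252_result).__name__)  # the exception class name
print(utf8_result)                   # the UTF-8 bytes
```

If this fails the same way on a given machine, the problem is the encoding chosen for the redirected stream, not the library that was imported.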
Owner

tartley commented Oct 7, 2021

It may be that this fix is required to make git bash work: #226


SSE4 commented Oct 8, 2021

I've just checked: #226 doesn't fix this particular issue for me. I can take a look and let you know my findings.


SSE4 commented Oct 8, 2021

well, if I remove import colorama it fails with the same error for me:

Traceback (most recent call last):
  File "C:\conan\colorama\test_colorama.py", line 3, in <module>
    print("\N{Heavy Check Mark} check mark")
  File "C:\Users\sse4\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2714' in position 0: character maps to <undefined>

seems like it's an issue in Python itself, e.g. it doesn't properly detect the encoding of the MSYS2 terminal?
the following command works fine:

PYTHONIOENCODING=utf8 python test_colorama.py
✔ check mark
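
When setting PYTHONIOENCODING in the environment is not convenient, a script can try to switch its own streams to UTF-8. This is only a sketch: it assumes Python 3.7 or later, where text streams have a reconfigure() method, and the helper name `force_utf8` is hypothetical. An in-memory cp1252 stream stands in for a redirected stdout here:

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream


# Stand-in for a redirected stdout that Python opened as cp1252.
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="cp1252")

force_utf8(fake_stdout)
print("\N{HEAVY CHECK MARK} check mark", file=fake_stdout)
fake_stdout.flush()

written = raw.getvalue()  # UTF-8 bytes, no UnicodeEncodeError raised
```

In a real script the call would presumably be force_utf8(sys.stdout) near the top of the entry point, before anything is printed.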


SSE4 commented Oct 8, 2021

I think the right fix to detect a proper IO encoding should be done somewhere around here:
https://github.com/python/cpython/blob/db693df3e112c5a61f2cbef63eedce3a36520ded/Python/fileutils.c#L898

may require some collaboration with MSYS and Python developers to make it happen.
right now, I don't know the proper way to programmatically detect the encoding on MSYS terminals.


SSE4 commented Oct 8, 2021

some other random notes:
in the Windows terminal (cmd.exe) it doesn't raise, but shows a question mark instead of the check mark (screenshot not included).

with output redirection, it fails:

python test_colorama.py > NUL
Traceback (most recent call last):
  File "C:\conan\colorama\test_colorama.py", line 6, in <module>
    print("\N{Heavy Check Mark} check mark")
  File "C:\Users\sse4\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2714' in position 0: character maps to <undefined>
python test_colorama.py > out
Traceback (most recent call last):
  File "C:\conan\colorama\test_colorama.py", line 6, in <module>
    print("\N{Heavy Check Mark} check mark")
  File "C:\Users\sse4\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2714' in position 0: character maps to <undefined>

setting the environment variable PYTHONLEGACYWINDOWSSTDIO=1 also makes it fail with the same error in cmd.exe, even without output redirection.
more explanation in the sys module documentation:

On Windows, UTF-8 is used for the console device. 
Non-character devices such as disk files and pipes use the 
system locale encoding (i.e. the ANSI codepage). 
Non-console character devices such as NUL (i.e. where isatty() returns True) 
use the value of the console input and output codepages at startup, 
respectively for stdin and stdout/stderr. 
This defaults to the system locale encoding 
if the process is not initially attached to a console.

so I think the issue is that mintty is not detected as a console device.
therefore, the proper fix might be implemented around this code:
https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Modules/_io/winconsoleio.c#L64
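
To check which of the cases from the quoted documentation applies in a given shell, it can help to print what Python decided for stdout. A small diagnostic sketch using only the standard library (the helper name `describe_stream` is hypothetical):

```python
import locale
import sys


def describe_stream(stream):
    """Collect the facts that decide which codec print() will use."""
    return {
        "encoding": getattr(stream, "encoding", None),
        "errors": getattr(stream, "errors", None),
        "isatty": stream.isatty() if hasattr(stream, "isatty") else None,
    }


info = describe_stream(sys.stdout)
# The locale encoding is what non-console streams fall back to on Windows.
info["locale_encoding"] = locale.getpreferredencoding(False)

# Report on stderr so the result is still visible with `> output.txt`.
print(info, file=sys.stderr)
```

Running it once directly and once with stdout redirected, in both cmd.exe and Git Bash, should show whether the stream is treated as a console and which encoding was picked.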

Owner

tartley commented Oct 25, 2022

I think the conclusion here is that this is not a colorama bug, so I'm closing. Please shout if you think I'm wrong. Thank you!

@tartley tartley closed this as completed Oct 25, 2022
@prusswan

Related: python/cpython#86873

This is happening for conda-tree, which prints out a tree using '\u251c' (“├”).

Even if this is correct behavior with stdout redirection on Windows, we still need a 'correct' place to set the encoding to utf8 (setting PYTHONIOENCODING=utf-8 seems to be the workaround for now)
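
For tools that cannot control the environment they run in, one possible fallback is to test whether the output encoding can represent the glyph and substitute an ASCII character when it cannot. This is a hypothetical sketch, not how conda-tree actually works; `can_encode` and `pick_branch_glyph` are illustrative names:

```python
def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding or "ascii")  # treat a missing encoding as ASCII
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def pick_branch_glyph(encoding):
    """Use the box-drawing character when possible, else an ASCII stand-in."""
    fancy = "\u251c"
    return fancy if can_encode(fancy, encoding) else "|"


glyph_utf8 = pick_branch_glyph("utf-8")    # box-drawing character
glyph_cp1252 = pick_branch_glyph("cp1252") # ASCII fallback
```

A caller would pass the encoding of the stream it is about to write to, for example sys.stdout.encoding.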
