UnicodeDecodeError in stdout/err thread #556
Hi @r4lv. If you run `chardet.detect` on your input, what encoding does it say it is?
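A minimal sketch of such a detection run, assuming the raw stdout bytes have been captured to a file (the filename below is only an illustration):

```python
# Sketch: guess the encoding of captured output bytes with chardet.
# "latexmk-output.bin" is a hypothetical file holding the raw stdout bytes.
import chardet

with open("latexmk-output.bin", "rb") as f:
    raw = f.read()

print(chardet.detect(raw))
# e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
```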
Thank you for the idea!
Hmm, sh should respect your encoding. This is where it gets its default value, which would be your system encoding (but you are overriding it). The encoding value is passed into the two StreamReaders, here and here.

The StreamReaders create a chunk processor, using that passed-in encoding directly, here.

A "file chunk consumer" is created because your output (stdout) has a `write` method.

In the file chunk consumer, the encoding is picked up off of the StreamReader handler here. So if you are specifying "latin1", I am not seeing how "utf8" makes it to line 2918. I don't see anywhere where the "encoding" parameter is overwritten by anything. 🤔
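A highly simplified sketch of the flow described above, with assumed names rather than sh's actual code, may make the failure point easier to see:

```python
# Simplified illustration (assumed names, not sh's real implementation):
# the encoding configured for the command is what the file chunk consumer
# uses to decode each raw chunk before writing to the output handler.

def make_file_chunk_consumer(handler, encoding):
    def process(chunk: bytes):
        # Strict decoding: any byte sequence invalid for `encoding` raises
        # UnicodeDecodeError here, which is the error reported in this issue.
        handler.write(chunk.decode(encoding))
    return process
```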
Can you capture your stdout and attach it here? That would make it easier to reproduce the error and narrow down the problem.
We'll need a reproducible example to be able to debug this. I'm closing the issue, but feel free to reopen if you get back to this.
I hit this recently too. It seems that when the descriptor passed to `_out` is a file or file descriptor, the resulting chunk consumer does not respect `_decode_errors`. With `_decode_errors="ignore"` set, I still get the `UnicodeDecodeError`.
There must be something more affecting this. I cannot reproduce the above on OS X and Python 3.10.1:

```python
>>> import sh, sys
>>> from io import StringIO
>>> sh.env("LC_CTYPE=POSIX", "gpg", "--fingerprint", "E6B0152CE5614F6680EEFDC25D11C3B7A2E59699", _out=StringIO(), _tee=True, _decode_errors="ignore")
pub ed25519 2022-10-19 [SC] [expires: 2024-10-18]
E6B0 152C E561 4F66 80EE FDC2 5D11 C3B7 A2E5 9699
uid [ ultim. ] John Doe <john@example.com>
sub cv25519 2022-10-19 [E] [expires: 2024-10-18]
>>> sh.env("LC_CTYPE=POSIX", "gpg", "--fingerprint", "E6B0152CE5614F6680EEFDC25D11C3B7A2E59699", _out=sys.stdout, _tee=True, _decode_errors="ignore")
pub ed25519 2022-10-19 [SC] [expires: 2024-10-18]
E6B0 152C E561 4F66 80EE FDC2 5D11 C3B7 A2E5 9699
uid [ ultim. ] John Doe <john@example.com>
sub cv25519 2022-10-19 [E] [expires: 2024-10-18]
pub ed25519 2022-10-19 [SC] [expires: 2024-10-18]
E6B0 152C E561 4F66 80EE FDC2 5D11 C3B7 A2E5 9699
uid [ ultim. ] John Doe <john@example.com>
sub cv25519 2022-10-19 [E] [expires: 2024-10-18]
```
The output of gpg in my example actually contains a special character.
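A way to see which raw byte is involved, sketched here with the standard library rather than sh (the fingerprint is reused from the reproduction above; adjust as needed):

```python
# Sketch: capture gpg's raw stdout bytes and list anything outside ASCII,
# so the offending byte (e.g. 0xf6) can be located before any decoding happens.
import os
import subprocess

result = subprocess.run(
    ["gpg", "--fingerprint", "E6B0152CE5614F6680EEFDC25D11C3B7A2E59699"],
    capture_output=True,
    env={**os.environ, "LC_CTYPE": "POSIX"},
)

for offset, byte in enumerate(result.stdout):
    if byte > 0x7F:  # a byte like 0xf6 that strict UTF-8 decoding may reject
        print(f"offset {offset}: 0x{byte:02x}")
```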
This fixes the example I provided at #556 (comment) by making fd/file_chunk_consumer respect decode_errors just like the other consumers.
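Conceptually, the change threads the configured error policy into the decode call that the file/fd consumer performs. Extending the earlier sketch (again with assumed names, not the actual sh.py diff):

```python
# Illustrative sketch only: the file chunk consumer now honours decode_errors,
# so with "ignore" or "replace" an invalid byte such as 0xf6 in a UTF-8 stream
# no longer raises UnicodeDecodeError.

def make_file_chunk_consumer(handler, encoding, decode_errors="strict"):
    def process(chunk: bytes):
        handler.write(chunk.decode(encoding, errors=decode_errors))
    return process
```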
Dear amoffat,
First, thank you for such an amazing project!
Lately, I've been using sh to generate LaTeX documents with `latexmk`, and sh gets stuck with a `UnicodeDecodeError`. I think this is related to `_tee` and some strange stdout output. I'm using sh like this: … and it never finishes the `wait()`. This is the error I obtain: … (I'm getting the same error on Python 3.7.)
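The original snippet is not shown above; a hypothetical reconstruction of the kind of call being described (the flags, file name, and callback are assumptions) would be roughly:

```python
import sh

def handle_line(line):
    # hypothetical callback; the real handler is not shown in the issue
    print(line, end="")

# _bg=True returns immediately; wait() then blocks until latexmk exits,
# which is the call that reportedly never finishes once the decode error
# occurs in the stdout/err thread.
proc = sh.latexmk("-pdf", "document.tex", _out=handle_line, _tee=True, _bg=True)
proc.wait()
```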
I "solved" that problem by adding an
errors="ignore"
to thechunk.decode
in line 2918, but I'm not sure if there are any side-effects:sh/sh.py
Lines 2916 to 2920 in 12420ed
`0xf6` seems to be the German ö (o with diaeresis); I don't know why utf-8 couldn't decode that. Would you have an idea?
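For context on why the strict decode fails: 0xf6 is 'ö' in Latin-1, but UTF-8 encodes 'ö' as the two-byte sequence 0xc3 0xb6, so a lone 0xf6 is an invalid start byte for UTF-8. A short demonstration, which also shows the side effect of the errors="ignore" workaround (the byte is silently dropped):

```python
raw = b"sch\xf6n"  # Latin-1 bytes for "schön"

print(raw.decode("latin-1"))                 # schön -> 0xf6 is ö in Latin-1
print(raw.decode("utf-8", errors="ignore"))  # schn  -> the invalid byte is dropped
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print(e)  # 'utf-8' codec can't decode byte 0xf6 in position 3: invalid start byte
```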