[BUG] Script referenced by shellpipe containing UTF-8 multibyte characters is not applied correctly #25
Comments
Please let me know if I can provide more information.
Hi Knut, I love your bug reports for their detail; thanks. Are you running Python in UTF-8 mode as per here?
Thanks for the pointer. I wasn't aware of the [...]. I will do some more testing, but otherwise I think this may already be a good enough solution. It would probably make sense to point this out in the docs (for all us poor Windows users), especially since this project is perceived as a drop-in replacement for [...]. But maybe you come to a different conclusion and decide to also change how your application communicates with sub-processes. It would certainly make sense to have some tests, to make sure that both the input and output of UTF-8 characters work properly (also on Windows...). I will report back my test results.
Yep, already written up in the filters doc for the next release. Nice catch about adding it to the migration documentation too.
Philosophically, I prefer to adhere to Python standards as opposed to making the app behave in "unexpected" ways. So I am leaning towards using the standard Python UTF-8 mode and the standard subprocess module, and just documenting it. The tests do have UTF-8 characters in them on purpose, but they run in Python UTF-8 mode on my Windows machine (and succeed).
Thanks, appreciate it!
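For readers following along, here is a minimal sketch of how one might check whether the interpreter is actually running in UTF-8 mode (enabled with `-X utf8` or `PYTHONUTF8=1`). The function name `describe_text_encoding` is made up for illustration and is not part of webchanges.

```python
import locale
import sys


def describe_text_encoding() -> dict:
    """Report the settings that decide how text-mode streams are encoded."""
    return {
        # True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used for text streams when no encoding is passed explicitly;
        # on Windows without UTF-8 mode this is typically a code page such as cp1252.
        "default_text_encoding": locale.getpreferredencoding(False),
        # Encoding of this interpreter's own standard output, if it has one.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in describe_text_encoding().items():
        print(f"{key}: {value}")
```

If `utf8_mode` is reported as `False` on Windows, text-mode pipes opened without an explicit encoding will most likely fall back to the platform code page, which matches the behaviour described in this issue.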
Makes sense. I was just thinking that it may make sense to not set [...]. I still have quite a few discrepancies compared to [...]. I will close this issue for now.
Regarding this comment: I now notice that it is just that the default for the method is different ([...]
Describe the bug
Since I have a somewhat complex setup I will have to try to distill it to a simple reproducer first, but it looks like there is a problem with the encoding of input and/or output when using the shellpipe filter. It appears that the platform default encoding is applied (CP1252 on Windows) and I am not aware of any way to [...]
To Reproduce
I will try to work something out. But a simple script like this already causes failures on Windows:
Example error:
Expected behavior
I don't know the internals of webchanges but I think it may make sense to pass input and output streams in binary mode with shellpipe and then where required encode and decode strings using a default of UTF-8, which can possibly be configured somehow.
Screen scrape/screenshots
If applicable, add screen scrape or screenshots to help explain your problem.
Version info
Additional context
See also relevant code from urlwatch: https://github.com/thp/urlwatch/blob/b1d4bc8526dc8ec680c50d3e3476797d18915635/lib/urlwatch/filters.py#L925-L926
webchanges appears to use text=True while not specifying any encoding. I have set the variable PYTHONIOENCODING=UTF-8 in my environment, but I am relatively new to Python so I don't exactly know when and where this has any effect.
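To illustrate the two options discussed in this issue, here is a minimal sketch using only the standard `subprocess` module. It is not the webchanges or urlwatch implementation: `ECHO_COMMAND`, `pipe_text` and `pipe_bytes` are hypothetical names, and the child process is a stand-in for whatever command a shellpipe filter would run. As far as the Python documentation describes it, `PYTHONIOENCODING` only overrides the encoding of the interpreter's own stdin/stdout/stderr, so it would not be expected to change how pipes opened with `text=True` and no `encoding` are decoded.

```python
import subprocess
import sys

# Hypothetical stand-in for a shellpipe command: a child Python process that
# copies its stdin to its stdout byte for byte.
ECHO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]


def pipe_text(data: str, encoding: str = "utf-8") -> str:
    """Text mode with an explicit encoding instead of the platform default."""
    result = subprocess.run(
        ECHO_COMMAND,
        input=data,
        capture_output=True,
        text=True,
        encoding=encoding,  # without this, the locale's preferred encoding is used
        check=True,
    )
    return result.stdout


def pipe_bytes(data: str, encoding: str = "utf-8") -> str:
    """Binary mode: the caller encodes the input and decodes the output."""
    result = subprocess.run(
        ECHO_COMMAND,
        input=data.encode(encoding),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode(encoding)


if __name__ == "__main__":
    sample = "Grüße, 世界"
    print(pipe_text(sample) == sample, pipe_bytes(sample) == sample)
```

Both variants keep the encoding independent of the platform code page. The binary variant also avoids newline translation, at the cost of handling encoding and decoding in the calling code.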