UTF8 encoding problems in minimal Ubuntu for CI #11
Comments
Thanks a lot for the report. Do you have a complete backtrace? (You can get one by passing …)

I think the solution is here: https://stackoverflow.com/questions/52065842/python-docker-ascii-codec-cant-encode-character (ignore the incorrect duplicate banner).

I wonder if this is the same problem as the one that forced @jfehrle to catch encoding exceptions in https://github.com/coq/coq/pull/13564/files#diff-99858e5d76716d34bcaf9ad38b8d67f05a7a8849e7969faa8b2318805d94f223R219.
I think that's the right solution, precisely because of your point on non-ASCII characters in Coq files. Fortunately it looks easy (…)
Complete command and backtrace from inside the container:

user@eaac613822d7:~/casper-cbc-proofs$ ~/alectryon/alectryon.py --frontend coqdoc --webpage-style windowed --traceback -Q . CasperCBC --output-directory tmp Lib/Classes.v
Traceback (most recent call last):
  File "/home/user/alectryon/alectryon.py", line 26, in <module>
    main()
  File "/home/user/alectryon/alectryon/cli.py", line 631, in main
    process_pipelines(args)
  File "/home/user/alectryon/alectryon/cli.py", line 623, in process_pipelines
    raise e
  File "/home/user/alectryon/alectryon/cli.py", line 620, in process_pipelines
    state = call_pipeline_step(step, state, ctx)
  File "/home/user/alectryon/alectryon/cli.py", line 589, in call_pipeline_step
    return step(state, **{p: ctx[p] for p in params})
  File "/home/user/alectryon/alectryon/cli.py", line 326, in <lambda>
    write_output(ext, contents, fname, output, output_directory)
  File "/home/user/alectryon/alectryon/cli.py", line 322, in write_output
    f.write(contents)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2191' in position 6441: ordinal not in range(128)
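For anyone who wants to reproduce this outside the container: the sketch below forces the ASCII codec explicitly instead of relying on a C locale, which should trigger the same kind of error as the backtrace above. The helper name `try_write` and the sample string are made up for illustration and are not part of Alectryon.

```python
import os
import tempfile


def try_write(contents, encoding):
    """Write `contents` to a temporary file using `encoding`.

    Returns None on success, or the UnicodeEncodeError message on failure.
    (Hypothetical helper, only meant to mimic what write_output does.)
    """
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        with open(path, mode="w", encoding=encoding) as f:
            f.write(contents)
        return None
    except UnicodeEncodeError as e:
        return str(e)
    finally:
        os.remove(path)


snippet = "hover \u2191 here"  # contains the same "uparrow" character
ascii_error = try_write(snippet, "ascii")   # simulates a C/POSIX locale
utf8_error = try_write(snippet, "utf-8")    # explicit UTF-8

print("ascii:", ascii_error)
print("utf-8:", utf8_error)
```

With the ASCII codec the write fails, while the UTF-8 write succeeds, which matches the idea that the file's encoding, not the content, is the problem.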
@cpitclaudel it actually seems as though the following diff for cli.py solves the issue completely, even with LANG=C:

@@ -318,7 +318,7 @@ def write_output(ext, contents, fname, output, output_directory):
     else:
         if not output:
             output = os.path.join(output_directory, strip_extension(fname) + ext)
-        with open(output, mode="w") as f:
+        with open(output, mode="w", encoding="utf-8") as f:
             f.write(contents)

 def write_file(ext):

Since the whole project is supposed to be UTF8 anyway, would a PR with this change be welcome? To me, this would be a better fix than remembering to change LANG everywhere.
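To illustrate why passing the encoding explicitly should be independent of the locale, here is a minimal sketch. `write_output_utf8` is a hypothetical stand-in for the patched branch of `write_output`, not the actual Alectryon function, and the file name is only an example.

```python
import os
import tempfile


def write_output_utf8(contents, output):
    # Hypothetical stand-in for the patched code path: the encoding is
    # fixed in the call, so LANG / LC_ALL no longer influence the result.
    with open(output, mode="w", encoding="utf-8") as f:
        f.write(contents)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Classes.html")
    write_output_utf8("\u2191", path)
    # Read the file back as raw bytes to see what was actually written.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)  # the three-byte UTF-8 encoding of U+2191
```

The bytes on disk are the UTF-8 encoding of the arrow regardless of which locale the process runs under.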
I was looking at *export PYTHONIOENCODING=utf8*
which is described here:
https://stackoverflow.com/questions/2276200/changing-default-encoding-of-python.
That could be added to the makefile and Dune. (I just did the bandaid fix
of catching the encoding exception because my change is temporary.)
I recall thinking that maybe it should be "utf-8" but didn't figure out if
that's correct.
Jim
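One caveat, as far as I understand the Python documentation: PYTHONIOENCODING only overrides the encoding of stdin, stdout, and stderr, while `open()` without an `encoding` argument falls back to the locale's preferred encoding (or UTF-8 when Python's UTF-8 mode is active). So it may not help with the `f.write(contents)` failure in cli.py. A small sketch to check both values in a given environment (the function name `report_encodings` is made up):

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python would use in the current environment."""
    return {
        # Affected by PYTHONIOENCODING.
        "stdout": sys.stdout.encoding,
        # What open(path, "w") uses when no encoding is given.
        "open_default": locale.getpreferredencoding(False),
    }


info = report_encodings()
print(info)
```

Running this inside the container with and without `export PYTHONIOENCODING=utf8` should show whether only the `stdout` entry changes. The exact values depend on the Python version and locale, so none are listed here.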
> would a PR with this change be welcome?

I think that would be a good idea.
I set up a custom Docker container with Ubuntu (Dockerfile) to be able to run Alectryon with coqdoc on every push to the master branch of a Coq project. However, I quickly ran into UTF8 encoding issues: Python's 'ascii' codec failed with a UnicodeEncodeError on the character \u2191.

Note that \u2191 is the "uparrow" Unicode symbol, so the problem came from the use of HEADER in alectryon/html.py.

Even after reading up on Python3 encoding issues, I couldn't figure out exactly where there might be a .encode("utf-8") missing, so I opted to simply remove all UTF8 from all output by Alectryon and coqdoc. However, since the --utf8 option to coqdoc is hardcoded, I had to use a fork of Alectryon (commit). Also, I believe this means the build will break anytime anyone uses a UTF8 character in a Coq file.

Is there a better way to solve this issue? I suspect that a more complete workaround would be to set up a locale (e.g., en_US.UTF8) in the Docker container, but this seems like a cumbersome thing to do in every Docker image where one wants to run Alectryon.