Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Make "disable i18n" logic optional #899

Closed
wants to merge 3 commits into from
Closed

RFC: Make "disable i18n" logic optional #899

wants to merge 3 commits into from

Conversation

arvidma
Copy link
Contributor

@arvidma arvidma commented Sep 13, 2023

I've had issues with Mercurial refusing to run (OSError hg not found) when called by setuptools-scm. After some digging, it turns out that this is due to an ASCII-decode exception when Python loads site modules.

The run() function in _run_cmd.py overrides the LC_ALL environment variable, removing this override made hg/python happy enough to run again.

This pull request contains a first stab at making the LC_ALL override configurable. The changes are enough to solve the specific problem I am experiencing, but it is not complete. I would like to have your feedback on the general approach, before investing the time to roll out disable_i18n config to all of setuptools-scm.

Copy link
Contributor

@RonnyPfannschmidt RonnyPfannschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based in the initial input i'm under the impression, that rather than using "C", a encoding name that allows for utf8 or a system encoding should be used

what are the values of the LC_*/LANG variables on your system, and whats the concrete encoding error you observe?

@@ -101,6 +101,10 @@ Callables or other Python objects have to be passed in `setup.py` (via the `use_
Setuptools will still normalize it to create the final distribution,
so as to stay compliant with the python packaging standards.

`disable_i18n`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having it a config options is fundamentally wrong as everyone affected would have to enable it individually

@arvidma
Copy link
Contributor Author

arvidma commented Sep 13, 2023

based in the initial input i'm under the impression, that rather than using "C", a encoding name that allows for utf8 or a system encoding should be used

what are the values of the LC_*/LANG variables on your system, and whats the concrete encoding error you observe?

The original error was:

   [...]
    DEBUG setuptools_scm.run_cmd err:
          Fatal Python error: init_import_site: Failed to import the site module
          Python runtime state: initialized
          Traceback (most recent call last):
            File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
            File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
            File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
            File "<frozen importlib._bootstrap>", line 980, in exec_module
            File "<frozen site>", line 623, in <module>
            File "<frozen site>", line 606, in main
            File "<frozen site>", line 538, in venv
            File "<frozen site>", line 391, in addsitepackages
            File "<frozen site>", line 226, in addsitedir
            File "<frozen site>", line 179, in addpackage
            File "/usr/lib64/python3.11/encodings/ascii.py", line 26, in decode
              return codecs.ascii_decode(input, self.errors)[0]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)
      DEBUG setuptools_scm.run_cmd ret: 1
   [...]

I don't have any LC_ variables set in my own environment, but the output of locale is:

LANG=sv_SE.UTF-8
LC_CTYPE="sv_SE.UTF-8"
LC_NUMERIC="sv_SE.UTF-8"
LC_TIME="sv_SE.UTF-8"
LC_COLLATE="sv_SE.UTF-8"
LC_MONETARY="sv_SE.UTF-8"
LC_MESSAGES="sv_SE.UTF-8"
LC_PAPER="sv_SE.UTF-8"
LC_NAME="sv_SE.UTF-8"
LC_ADDRESS="sv_SE.UTF-8"
LC_TELEPHONE="sv_SE.UTF-8"
LC_MEASUREMENT="sv_SE.UTF-8"
LC_IDENTIFICATION="sv_SE.UTF-8"
LC_ALL=

For the project I am building, there are non-ascii characters in both package and module names, so pretty much every path will have one or more of them.

@arvidma
Copy link
Contributor Author

arvidma commented Sep 13, 2023

@RonnyPfannschmidt, following your suggestion I tried just changing LC_ALL="C" to LC_ALL="C.UTF-8".

This minimal change also solved my problem, so it would be a perfectly good solution for me.

Do you think this change needs to be possible to opt out from (via e.g. environment variable), or could it be accepted as a one-liner?

@RonnyPfannschmidt
Copy link
Contributor

the one liner is much better than the option

it would be nice to have a unit-test for this tho

@arvidma
Copy link
Contributor Author

arvidma commented Sep 13, 2023

the one liner is much better than the option

it would be nice to have a unit-test for this tho

I think I can arrange that. :-) I'll drop this PR and make a new one in a bit.

@arvidma arvidma closed this Sep 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants