gh-126742: allow to use non-UTF8 exception messages #126746
picnixz commented Nov 12, 2024
- Issue: Allow to set non-UTF8 exception messages #126742
Technically, since this is a bugfix, let's wait until #126555 has landed before we merge this, and then address all the uses of
I actually thought of doing it in two PRs. Technically, this specific PR can be merged without waiting for the other, and then you can use this interface in the other (rather than the other way around).
Eh, I would rather not. Let's not overwhelm someone with thread states and the private API on their first PR :)
I've removed backport labels, per what Petr said in the issue. I think he's right--let's just focus on main with this.
Hmm, just had an idea: if we add a “
I like that idea! But we're exposing this as a public API--how often might users need this? (I've never needed to mess with locale in any of my C API usage, but to be fair I'm typically not calling APIs that return strings.)
I'm not very fond of adding a case to check when there aren't many use cases (unless proven otherwise). It shouldn't affect performance much (though this needs verification), but I think it would complicate an already complicated function (and people would need to remember this new format as well).
No no, I don't think he meant a variation of
And I understood. I don't think a new code should be added unless there is a real need.
Ah! I couldn't tell, sorry. A code search for
It could also show that we're mainly handling errors from the OS at one point -- the OSError constructor. That would be a good thing :) Anyway, for this PR: @picnixz, do you want to add calls to the new helper(s)?
FWIW, Bénédikt is on vacation right now. I don't think he'll be able to implement anything until next week.
I don't see the purpose of these internal functions since they are not used. This PR only adds dead code. I would prefer to see how these functions are used.
If only dlerror() uses it, why not apply it in the code that calls dlerror()?
The internal fix makes a lot of sense, thank you! Decoding with Also, since Like Serhiy, I don't think a combination of two public-API functions needs to be added to the public API, let alone the Stable API. But that's something for the C API WG to decide, if you bring it there.
I'll make it private for now, and we can decide later whether to make it public.
Then, what should we do with the code that currently ignores them?
Is there a practical way to get non-ASCII error messages that are not encoded in UTF-8 from the following functions?
- dlerror()
- gdbm_strerror()
How can I manually test the modified code paths?
I don't know :( Maybe changing the locale env. var. would be sufficient (assuming you have the translated messages), but I'm really bad at setting up locales so I don't really know =/
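I confirm that the dlerror() message is encoded in the current locale encoding:

import ctypes
import locale

RTLD_NOW = 2
# Switch from the default "C" locale to the user's locale.
locale.setlocale(locale.LC_ALL, "")
libdl = ctypes.cdll.LoadLibrary('libdl.so.2')
# dlerror() returns a char*; expose it as bytes rather than an int.
libdl.dlerror.restype = ctypes.c_char_p
# Try to load a library that does not exist, to trigger an error.
libdl.dlopen(b'donotexist.so', RTLD_NOW)
errmsg = libdl.dlerror()
print(errmsg)

Output:
The character "é" (U+00E9) is encoded as \xe9 in ISO-8859-1 or as \xc3\xa9 in UTF-8.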
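To illustrate the two byte sequences mentioned above, here is a minimal sketch (not part of the PR). It shows that the same character yields different bytes under each encoding, and that the ISO-8859-1 byte on its own is not valid UTF-8, which is why decoding a locale-encoded message as UTF-8 can fail.

```python
ch = "\u00e9"  # the character "é"

latin1_bytes = ch.encode("iso-8859-1")
utf8_bytes = ch.encode("utf-8")
print(latin1_bytes)  # b'\xe9'
print(utf8_bytes)    # b'\xc3\xa9'

# The single ISO-8859-1 byte is an incomplete sequence in UTF-8,
# so a strict UTF-8 decode raises UnicodeDecodeError.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # False
```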
We will add internal functions later in one go instead of one by one.
The change LGTM, but I didn't test it (manually).
Translated error messages most likely depend on the LC_MESSAGES environment variable. It is a separate, difficult issue if its encoding differs from LC_CTYPE's encoding.
But if it is non-ASCII because it includes file paths, we should use the FS encoding.
What should we do about that? Should we fix this in a follow-up PR?
Co-authored-by: Victor Stinner <vstinner@python.org>
IMO it's reasonable to expect the LC_MESSAGES and LC_CTYPE locales to use the same encoding in this API. It's already an enhancement compared to the current code, which always expects UTF-8.
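To make that assumption concrete, here is a rough Python-level sketch of the idea: decode a C error message with the current locale encoding rather than assuming UTF-8. The helper name `decode_error_message`, its `encoding` parameter, and the sample message are hypothetical and chosen only for illustration. The PR itself works at the C level, and this is not its implementation. File paths are a separate case, for which `os.fsdecode()` applies the filesystem encoding.

```python
import locale
import os


def decode_error_message(raw, encoding=None):
    """Hypothetical helper: decode a C-level error message.

    Uses the current locale encoding unless an explicit encoding is
    given. Undecodable bytes become lone surrogates (surrogateescape)
    instead of raising UnicodeDecodeError.
    """
    if encoding is None:
        # Passing False avoids calling setlocale() as a side effect.
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="surrogateescape")


# A made-up message, encoded as an ISO-8859-1 locale would produce it.
raw = "fichier non trouv\u00e9".encode("iso-8859-1")

msg_latin1 = decode_error_message(raw, "iso-8859-1")
msg_utf8 = decode_error_message(raw, "utf-8")  # wrong guess, but no exception

# surrogateescape keeps the original bytes recoverable.
roundtrip = msg_utf8.encode("utf-8", errors="surrogateescape")

# File paths go through the filesystem encoding instead.
path = os.fsdecode(b"/tmp/example")
```

With a wrong encoding guess the message is still returned, only with an escaped surrogate in place of the undecodable byte, so an error message is never lost to a secondary decoding error.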
Raise the error from
Oh ok! I'll change it in a few hours (coming back home right now)