Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to set non-UTF8 exception messages #126742

Open
picnixz opened this issue Nov 12, 2024 · 12 comments
Open

Allow to set non-UTF8 exception messages #126742

picnixz opened this issue Nov 12, 2024 · 12 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-feature A feature request or enhancement

Comments

@picnixz
Copy link
Contributor

picnixz commented Nov 12, 2024

Feature or enhancement

Proposal:

This is a follow-up to the discussion in #126555 (comment).

dlerror() may return non-UTF-8 messages, possibly translated. We should be able to set exception messages according to the current locale. To that end, we'll expose some internal helper:

extern void 
_PyErr_SetLocaleStringTstate(PyThreadState *tstate, PyObject *exception, const char *string);

extern void
_PyErr_SetLocaleString(PyObject *exception, const char *string);

cc @ZeroIntensity @encukou

For now, both functions would be only declared as extern and not part of the public C API.

Linked PRs

@picnixz picnixz added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 12, 2024
@picnixz picnixz self-assigned this Nov 12, 2024
@picnixz
Copy link
Contributor Author

picnixz commented Nov 12, 2024

Let's actually mark it as a bugfix since we'll need to backport it in order to use it later.

@picnixz picnixz added type-bug An unexpected behavior, bug, or error 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes and removed type-feature A feature request or enhancement labels Nov 12, 2024
@ZeroIntensity
Copy link
Member

ZeroIntensity commented Nov 12, 2024

Hmm, so is PyErr_SetLocaleString supposed to be implicitly private? I'm not the biggest fan of that--if something is supposed to be private, then we need to prefix it with _, not only so users don't touch it, but so people reading the source don't mistake it as public. If we want a variation of it that takes a thread state, we could have something like _PyErr_SetLocaleStringTstate.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 12, 2024

Hmm, so is PyErr_SetLocaleString supposed to be implicitly private

For now, but maybe we can directly expose to the C API? I want to be consistent with the naming of each function.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 12, 2024

Nevermind, there is a precedent: _PyErr_FormatFromCauseTstate. Let's have
_PyErr_SetLocaleStringTstate and _PyErr_SetLocaleString and both private.

@encukou
Copy link
Member

encukou commented Nov 12, 2024

How many places in the code would use this helper? I don't see the advantage over calling PyUnicode_DecodeLocale and PyErr_SetObject.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 12, 2024

There are a few occurrences of dlerror() in _ctypes.c and callproc.c and dynload_shlib as well.

NVM: the dynload_shlib uses its own logic.

@encukou
Copy link
Member

encukou commented Nov 12, 2024

errors.c uses PyUnicode_DecodeLocale (with surrogateescape) on strerror()'s result. That looks like a good precedent.

I don't think it's worth it to add a project-wide helper here. Calling the 2 functions should be fine.

@ZeroIntensity
Copy link
Member

My main arguments for adding a new internal function would be that a) it helps remove the risk of reference leaks (which in my experience, are all too common when dealing with string mutations) and b) makes it easier to deal with locale in the future (we have incorrect uses of dlerror() internally that skip the locale check).

@encukou
Copy link
Member

encukou commented Nov 13, 2024

As far as I know, this is not a bug on any supported platforms, and so can't be tested in our CI. It's also not been reported/patched on any other platform I know of.

I don't know if it should be called a “bug” (that's more a question about the definition of the term), but I don't think a fix for this issue should be backported. Let's just improve main.

As for the helper: IMO, it's better to first fix the code, and then, if it makes sense, refactor to add a helper. In clear-cut cases you'd combine the steps in one commit, but let's not do that here.

@ZeroIntensity ZeroIntensity added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Nov 13, 2024
@ZeroIntensity
Copy link
Member

As far as I know, this is not a bug on any supported platforms, and so can't be tested in our CI. It's also not been reported/patched on any other platform I know of.

FWIW, I did find BPO-41894 when looking through the places where we use PyUnicode_DecodeLocale.

@picnixz picnixz removed 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 17, 2024
@serhiy-storchaka
Copy link
Member

I do not mind to add a private helper if it can be used in multiple files, but I am not sure about a public API. It is just a combination of two functions, and this pattern may be not so common.

Some examples in the PR looks wrong. Set an error with locale-decoded error message, then clear it and set an error with utf8-decoded error message -- this pattern looks very wrong.

@picnixz
Copy link
Contributor Author

picnixz commented Nov 29, 2024

Set an error with locale-decoded error message, then clear it and set an error with utf8-decoded error message -- this pattern looks very wrong

This pattern is actually a way to fallback to a generic message. Otherwise, there will be no error message at all, which can be more confusing IMO. We only clear the error if the decoding fails (for whatever reason). Such pattern already existed when handling dlerror().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants