gh-126742: allow to use non-UTF8 exception messages #126746

picnixz · 2024-11-12T14:21:46Z

Issue: Allow to set non-UTF8 exception messages #126742

Python/errors.c

ZeroIntensity · 2024-11-12T14:31:16Z

Technically, since this is a bugfix, let's wait until #126555 has landed before we merge this, and then address all the uses of dlerror() in this PR.

picnixz · 2024-11-12T14:34:13Z

> then address all the uses of dlerror() in this PR.

I actually thought of doing it in two PRs. Technically, this specific PR can be merged without waiting for the other one, and then you can use this interface in the other PR (rather than the other way around).

ZeroIntensity · 2024-11-12T14:52:20Z

Eh, I would rather not. Let's not overwhelm someone with thread states and the private API on their first PR :)

ZeroIntensity · 2024-11-13T11:53:04Z

I've removed backport labels, per what Petr said in the issue. I think he's right--let's just focus on main with this.

encukou · 2024-11-15T13:17:46Z

Hmm, just had an idea: if we add a “char* in locale encoding” format to PyUnicode_FromFormat, it'll be possible to use it with PyErr_Format -- but also for non-error messages.

ZeroIntensity · 2024-11-15T13:28:56Z

I like that idea! But we're exposing this as a public API--how often might users need this? (I've never needed to mess with locale in any of my C API usage, but to be fair I'm typically not calling APIs that return strings.)

picnixz · 2024-11-15T13:42:16Z

> Hmm, just had an idea: if we add a “char* in locale encoding” format to

I'm not very fond of adding yet another case to check when there aren't many use cases (unless proven otherwise). It shouldn't affect performance much (though this needs verification), but I think it would complicate an already complicated function (and people would need to remember this new format as well).

ZeroIntensity · 2024-11-15T13:59:32Z

No no, I don't think he meant a variation of PyUnicode_FromFormat that takes a locale, he meant a format value %whatever that automatically calls PyUnicode_DecodeLocale or something like that.
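To make the proposal concrete, here is a rough Python analogue of such a format code. It is only a sketch: `format_with_locale()` is a hypothetical helper (the format code discussed here is just a proposal in this thread), and it takes the locale encoding as an explicit parameter instead of reading the current LC_CTYPE, so the behaviour is deterministic. Bytes arguments stand in for C `char*` strings and are decoded with the "surrogateescape" handler, mirroring a PyUnicode_DecodeLocale() call with that handler, before %-formatting.

```python
def format_with_locale(fmt, *args, encoding):
    """Hypothetical helper: %-format *fmt*, decoding bytes arguments first.

    Bytes arguments model C ``char*`` strings in the locale encoding. They are
    decoded with *encoding* and the "surrogateescape" handler, so undecodable
    bytes become lone surrogates instead of raising UnicodeDecodeError.
    """
    decoded = tuple(
        arg.decode(encoding, "surrogateescape") if isinstance(arg, bytes) else arg
        for arg in args
    )
    return fmt % decoded


# Illustrative bytes: a Latin-1 encoded French message.
raw = b"Base de donn\xe9es vide"
print(format_with_locale("gdbm: %s", raw, encoding="latin-1"))
```

The "gdbm: " prefix is illustrative only; the point is that the caller never has to decode the locale bytes by hand.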

picnixz · 2024-11-15T14:03:29Z

> No no, I don't think he meant a variation of PyUnicode_FromFormat that takes a locale, he meant a format value %whatever that automatically calls PyUnicode_DecodeLocale or something like that.

And I understood. I don't think a new format code should be added unless there is a real need.

ZeroIntensity · 2024-11-15T14:13:48Z

> And I understood.

Ah! I couldn't tell, sorry.

A code search for PyUnicode_DecodeLocale shows little real-world usage--maybe that's a bad thing? Depending on how common it is for a non-UTF-8 locale to break things, it might be a good idea to add a format character just so people start handling it.

encukou · 2024-11-19T12:09:21Z

It could also show that we're mainly handling errors from the OS at one point -- the OSError constructor. That would be a good thing :)

Anyway, for this PR: @picnixz, do you want to add calls to the new helper(s)?

ZeroIntensity · 2024-11-19T12:29:06Z

FWIW, Bénédikt is on vacation right now. I don't think he'll be able to implement anything until next week.

vstinner · 2024-11-19

I don't see the purpose of these internal functions since they are not used. This PR only adds dead code. I would prefer to see how these functions are used.

If only dlerror() needs it, why not update the code that uses dlerror() in this PR?

Include/internal/pycore_pyerrors.h

serhiy-storchaka · 2024-12-02

Translated error messages most likely depend on the LC_MESSAGES environment variable. It is a separate, difficult issue if its encoding differs from LC_CTYPE's encoding.

But if it is non-ASCII because it includes file paths, we should use the FS encoding.
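For messages that are non-ASCII only because they embed a file path, decoding with the filesystem encoding keeps the path round-trippable. In Python terms that is os.fsdecode()/os.fsencode(). The sketch below assumes a POSIX system, where the filesystem error handler is "surrogateescape", and uses an illustrative dlerror()-style message rather than real dlerror() output.

```python
import os
import sys

# Illustrative bytes: a dlerror()-style message embedding a non-UTF-8 path.
raw_message = b"/tmp/lib\xff.so: cannot open shared object file"

# On POSIX, os.fsdecode() uses the filesystem encoding with the
# "surrogateescape" handler, so the undecodable byte 0xff becomes U+DCFF.
message = os.fsdecode(raw_message)
print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
print(ascii(message))

# Encoding back yields the original bytes, so the path is not lost.
assert os.fsencode(message) == raw_message
```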

picnixz · 2024-12-02T17:18:39Z

> It is a separate difficult issue if its encoding differs from LC_CTYPE's encoding.

What should we do about that? Should we fix it in a follow-up PR?

Co-authored-by: Victor Stinner <vstinner@python.org>

Include/internal/pycore_pyerrors.h

vstinner · 2024-12-03T14:26:12Z

> Translated error messages most likely depend on the LC_MESSAGES environment variable. It is a separate difficult issue if its encoding differs from LC_CTYPE's encoding.

IMO it's reasonable to expect the LC_MESSAGES and LC_CTYPE locales to use the same encoding in this API. It's already an enhancement compared to the current code, which always expects UTF-8.
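The difference is easy to see from Python. The bytes below are an illustrative Latin-1 encoding of the French gdbm message that appears later in this thread. Decoding them strictly as UTF-8 (roughly what PyErr_SetString() does with its `char*` argument, via PyUnicode_FromString()) fails, while decoding with the locale's encoding succeeds. The encoding is passed explicitly instead of being read from LC_CTYPE; that is an assumption made to keep the sketch deterministic.

```python
# Illustrative bytes: "Base de données vide" encoded as ISO-8859-1 (Latin-1),
# as a message catalog for a fr_FR.ISO-8859-1 locale could produce it.
raw = b"Base de donn\xe9es vide"

# Strict UTF-8 decoding, roughly what PyUnicode_FromString() does.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc
else:
    utf8_error = None

# Decoding with the locale encoding (assumed to be Latin-1 here).
message = raw.decode("iso-8859-1")

print(type(utf8_error).__name__)
print(message)
```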

encukou · 2024-12-04T13:25:45Z

Then, what should we do with the code that currently ignores them?

Raise the error from _PyErr_SetLocaleString.
I should have caught that when they were switched to "surrogateescape". Hopefully it's not too late now.
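A Python analogue of that behaviour: the helper builds the exception from the decoded message, and if decoding fails, the decoding error itself is what the caller sees instead of being cleared. `exception_from_locale_bytes()` is a hypothetical name and does not reproduce the signature of _PyErr_SetLocaleString(); the encoding and error handler are explicit parameters for illustration.

```python
def exception_from_locale_bytes(exc_type, raw, encoding, errors="surrogateescape"):
    """Hypothetical helper: build ``exc_type(message)`` from locale bytes.

    Any UnicodeDecodeError raised while decoding is deliberately not caught,
    so it propagates to the caller instead of being silently ignored.
    """
    message = raw.decode(encoding, errors)
    return exc_type(message)


# With "surrogateescape", undecodable bytes survive as lone surrogates.
err = exception_from_locale_bytes(OSError, b"lib\xff.so: error", "utf-8")
print(ascii(str(err)))

# With "strict", the decoding error reaches the caller unchanged.
try:
    exception_from_locale_bytes(OSError, b"lib\xff.so", "utf-8", errors="strict")
except UnicodeDecodeError as exc:
    print("decoding error propagated:", exc.reason)
```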

picnixz · 2024-12-05T12:58:38Z

Oh ok! I'll change it in a few hours (coming back home right now)

serhiy-storchaka · 2024-12-08

Could you please add tests?

picnixz · 2024-12-08T18:00:40Z

> Could you please add tests?

Do you want tests for the additional messages or tests for the new function itself? For the former, I don't really know how to do it since we would need to check for the presence of the translation files, but I can do it for the latter.

serhiy-storchaka · 2024-12-08T18:28:24Z

Try to use the Python APIs that call the corresponding functions (like dlopen() and gdbm_open()) in a non-UTF-8 locale. The current code fails if translations are installed.

Try to use it with a non-ASCII name. It seems that the dlopen() error message contains the file name -- the current code fails trying to decode it with UTF-8. This will work even if no translations are installed.

picnixz · 2024-12-09T14:39:04Z

I won't be able to work for the rest of the week, so I'll fix the CI failures this weekend.

vstinner · 2024-12-09T15:12:03Z

Lib/test/test_ctypes/test_dlerror.py

+    def test_localized_error(self):
+        with self.assertRaisesRegex(
+            OSError,
+            re.escape("foo.so: Ne peut ouvrir le fichier d'objet partagé"),


@serhiy-storchaka: We don't control the dlerror() error message or its translation. The test fails if translations are not installed or if the error message changes.

I'm not sure if it's a good idea to rely on the exact dlopen() translated error message.

serhiy-storchaka · 2024-12-09

I do not think that we need a C API test. It is just a private helper. In any case, the C API test is not working.

serhiy-storchaka · 2024-12-09T20:01:50Z

Lib/test/test_ctypes/test_dlerror.py

@@ -119,5 +123,17 @@ def test_null_dlsym(self):
            self.assertEqual(os.read(pipe_r, 2), b'OK')


+class TestCAPI(unittest.TestCase):


This is not a C API test.

serhiy-storchaka · 2024-12-09T20:05:43Z

Lib/test/test_ctypes/test_dlerror.py

+class TestCAPI(unittest.TestCase):
+
+    @unittest.skipUnless(hasattr(_ctypes, 'dlopen'), 'require ctypes.dlopen()')
+    @test.support.run_with_locales('LC_ALL', 'fr_FR.utf8', 'fr_FR.iso88591')


I think run_with_locale() may be enough instead of run_with_locales(). But try non-UTF-8 locales first. Try several locales, not only French. Add '' at the end to use the current locale as a fallback.
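The selection order suggested here (non-UTF-8 locales first, then '' for the current locale as a fallback) can be sketched as a small helper. `first_available_locale()` is hypothetical and only approximates what test.support.run_with_locale() does; it relies on locale.setlocale() raising locale.Error for locales that are not installed, and it restores the original setting before returning. The candidate locale names are illustrative.

```python
import locale


def first_available_locale(category, *candidates):
    """Hypothetical helper: return the first candidate setlocale() accepts.

    Candidates are tried in order, so non-UTF-8 locales should come first
    and '' (the user's current locale) last as a fallback. The original
    locale is restored before returning; None means nothing was accepted.
    """
    original = locale.setlocale(category)  # query without changing
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue  # not installed on this machine
            return name
        return None
    finally:
        locale.setlocale(category, original)


print(first_available_locale(
    locale.LC_ALL, "fr_FR.ISO8859-1", "de_DE.ISO8859-1", "ja_JP.eucJP", ""))
```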

serhiy-storchaka · 2024-12-09T20:12:29Z

Lib/test/test_ctypes/test_dlerror.py

+    def test_localized_error(self):
+        with self.assertRaisesRegex(
+            OSError,
+            re.escape("foo.so: Ne peut ouvrir le fichier d'objet partagé"),


We should not rely on the dlopen() error message. We only need to check that the error is not a UnicodeDecodeError.

Try to use a UTF8-undecodable bytes name, e.g. b'\xff' (bytes are accepted, right?).
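One way to express "any failure is fine, as long as it is not a UnicodeDecodeError" is a small helper like the hypothetical `call_expecting_no_decode_error()` below. It is only a sketch: the two stand-in callables replace a real call such as `_ctypes.dlopen(b'\xff')`, whose outcome depends on the platform and on whether the fix is present.

```python
def call_expecting_no_decode_error(func, *args):
    """Hypothetical helper: call func(*args) and return the exception raised.

    Returns None if the call succeeds. A UnicodeDecodeError is turned into an
    AssertionError, because it means the error message itself could not be
    decoded; any other exception (e.g. OSError) is an acceptable outcome.
    """
    try:
        func(*args)
    except UnicodeDecodeError as exc:
        raise AssertionError(f"error message was not decodable: {exc}") from exc
    except Exception as exc:
        return exc
    return None


def fake_dlopen_fixed(name):
    # Stand-in for the fixed behaviour: OSError with a surrogateescaped name.
    raise OSError(name.decode("utf-8", "surrogateescape") + ": cannot open")


def fake_dlopen_buggy(name):
    # Stand-in for the old behaviour: strictly decoding the message fails.
    (name + b": cannot open").decode("utf-8")
```

In a unittest test, `with self.assertRaises(OSError):` around the real call gives the same guarantee, because assertRaises lets any exception that is not an OSError (such as UnicodeDecodeError) propagate as a test error.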

serhiy-storchaka · 2024-12-09T20:14:31Z

Lib/test/test_dbm_gnu.py

@@ -205,6 +207,11 @@ def test_clear(self):
                self.assertNotIn(k, db)
            self.assertEqual(len(db), 0)

+    @support.run_with_locales('LC_ALL', 'fr_FR.UTF-8', 'fr_FR.iso88591')
+    def test_localized_error(self):
+        expect = re.escape('Base de données vide')


We should not rely on the error message. We only need to check that the error is not a UnicodeDecodeError.

serhiy-storchaka · 2024-12-09T20:19:11Z

Lib/test/test_dbm_gnu.py

+    def test_localized_error(self):
+        expect = re.escape('Base de données vide')
+        empty = create_empty_file(filename)
+        self.assertRaisesRegex(gdbm.error, expect, gdbm.open, filename, 'r')


filename is TESTFN, which may not be encodable in the tested locale (we should try non-UTF-8 locales first). Try an ASCII-only name like "nonexisting". Try a UTF-8-undecodable bytes name if bytes are accepted; otherwise, use a name containing a surrogateescaped byte.

picnixz · 2024-12-10

Unfortunately, a non-existing ASCII name wouldn't work because the error would be an errno error instead of a gdbm.error. I really need a specific kind of error to be triggered in order to hit the translated messages. But I don't think I can create an empty file with a UTF-8-undecodable byte name (os.open() fails with a ValueError), so I'm not entirely sure how to do it =/ I'll try to find other ways offline (I will be offline until Friday).

serhiy-storchaka · 2024-12-10

open(b'\xff')

picnixz · 2024-12-10

How come create_empty_file failed when I tried? I'll need to check it later. But thanks! (I remember trying the FN_NONDECODABLE file, but I didn't try with just '\xff'.)
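The open(b'\xff') suggestion works on typical POSIX filesystems because bytes paths are passed to the OS unchanged. A minimal sketch, assuming a POSIX system whose filesystem accepts the byte 0xff in file names (true for common Linux filesystems, not guaranteed elsewhere); it uses a temporary directory rather than the test suite's own helpers.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Bytes path: the 0xff byte reaches the OS as-is, no decoding involved.
    path = os.path.join(os.fsencode(tmpdir), b"\xff")
    open(path, "wb").close()  # create an empty file

    size = os.path.getsize(path)
    # Listing with a str path decodes names with surrogateescape,
    # so the undecodable byte shows up as the lone surrogate U+DCFF.
    names = os.listdir(tmpdir)

print(size, ascii(names))
```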

vstinner · 2024-12-10

Why not test with an existing empty file with an ASCII filename? Something like:

    filename = "empty"
    open(filename, "wb").close()
    self.addCleanup(os_helper.unlink, filename)
    # ... use filename ...
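Following that suggestion, a self-contained version might look like the sketch below. Assumptions: dbm.gnu is available (the test is skipped otherwise), gdbm refuses to open an empty file in read mode (as the PR's gdbm test relied on), and tempfile.TemporaryDirectory replaces os_helper.unlink() for cleanup. Rather than matching the translated text, it only checks that the failure is a gdbm.error, so a UnicodeDecodeError would surface as a test error.

```python
import os
import tempfile
import unittest

try:
    import dbm.gnu as gdbm
except ImportError:
    gdbm = None


@unittest.skipIf(gdbm is None, "requires dbm.gnu")
class EmptyFileErrorTest(unittest.TestCase):
    def test_error_is_not_decode_error(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        filename = os.path.join(tmp.name, "empty")  # ASCII-only base name
        open(filename, "wb").close()
        # assertRaises only catches gdbm.error; a UnicodeDecodeError raised
        # while building the message would propagate and fail the test.
        with self.assertRaises(gdbm.error):
            gdbm.open(filename, "r")


if __name__ == "__main__":
    unittest.main()
```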

allow to use translated exception messages

24eb521

bedevere-app bot added the awaiting review label Nov 12, 2024

bedevere-app bot mentioned this pull request Nov 12, 2024

Allow to set non-UTF8 exception messages #126742 (open)

picnixz added the skip news label Nov 12, 2024

picnixz changed the title from "gh-126742: allow to use translated exception messages" to "gh-126742: allow to use non-UTF8 exception messages" Nov 12, 2024

picnixz requested review from encukou and ZeroIntensity November 12, 2024 14:25

ZeroIntensity reviewed Nov 12, 2024

Python/errors.c (outdated)

Python/errors.c (outdated)

ZeroIntensity added the DO-NOT-MERGE, needs backport to 3.12, and needs backport to 3.13 labels Nov 12, 2024

Use 'surrogateescape' handler instead of 'strict' handler.

4c0f85b

ZeroIntensity removed the needs backport to 3.12, needs backport to 3.13, and DO-NOT-MERGE labels Nov 13, 2024

vstinner reviewed Nov 19, 2024

Include/internal/pycore_pyerrors.h (outdated)

encukou marked this pull request as draft November 20, 2024 09:46

bedevere-app bot removed the awaiting review label Nov 20, 2024

bedevere-app bot added the awaiting merge label Dec 2, 2024

serhiy-storchaka reviewed Dec 2, 2024

Update Include/internal/pycore_pyerrors.h

2633d18

Co-authored-by: Victor Stinner <vstinner@python.org>

picnixz commented Dec 2, 2024

Include/internal/pycore_pyerrors.h (outdated)

picnixz added 2 commits December 2, 2024 18:20

fix grammar

45442c3

Merge branch 'main' into fix/locale-set-object-exception-126742

6bedc4b

picnixz marked this pull request as draft December 8, 2024 13:31

bedevere-app bot removed the awaiting merge label Dec 8, 2024

Unconditionally re-raise decoding errors

6f9ee0b

picnixz force-pushed the fix/locale-set-object-exception-126742 branch from ad8ad6f to 6f9ee0b December 8, 2024 13:40

picnixz marked this pull request as ready for review December 8, 2024 13:40

bedevere-app bot added the awaiting review label Dec 8, 2024

picnixz requested review from serhiy-storchaka, vstinner and ZeroIntensity December 8, 2024 13:40

serhiy-storchaka reviewed Dec 8, 2024

picnixz added 2 commits December 9, 2024 13:33

add tests for _PyErr_SetLocaleString

b02a715

add tests for dlerror and gdbm_* functions

47d50b8

picnixz requested a review from serhiy-storchaka December 9, 2024 14:03

picnixz marked this pull request as draft December 9, 2024 14:38

bedevere-app bot removed the awaiting review label Dec 9, 2024

vstinner reviewed Dec 9, 2024

serhiy-storchaka reviewed Dec 9, 2024
