gh-126554: ctypes: Correctly handle NULL dlsym values #126555

grgalex · 2024-11-07T20:20:48Z

For dlsym(), a return value of NULL does not necessarily indicate an error [1].

Therefore, to avoid using stale (or NULL) dlerror() values, we must:

clear the previous error state by calling dlerror()
call dlsym()
call dlerror()

If the return value of dlerror() is not NULL, an error occured.

In our case of ctypes, I understand that we subjectively treat a NULL return value from dlsym() as an error, so we must format a special string for that.

[1] https://man7.org/linux/man-pages/man3/dlsym.3.html

Issue: ctypes: We do not correctly handle NULL dlsym() return values #126554

ghost · 2024-11-07T20:20:52Z

All commit authors signed the Contributor License Agreement.

bedevere-app · 2024-11-07T20:20:52Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In our case, we also subjectively treat a NULL return value from dlsym() as an error, so we format a special string for that. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com>

efimov-mikhail · 2024-11-07T20:43:58Z

On success, these functions return the address associated with
symbol. On failure, they return NULL; the cause of the error can
be diagnosed using dlerror(3).

This is direct quotation from the man page you linked.
Why you think this is correct?

dlsym(), a return value of NULL does not necessarily indicate an error [1].

grgalex · 2024-11-07T20:47:47Z

@efimov-mikhail

Indeed, I failed to mention that at some point the man page contradicts itself, and I believe that the part you quoted is something the maintainer forgot to remove.

[edit: added this part]

I say this because further up, in the DESCRIPTION, the man page states:

In unusual cases (see NOTES) the value of the symbol could
actually be NULL.  Therefore, a NULL return from dlsym() need not
indicate an error.  The correct way to distinguish an error from
a symbol whose value is NULL is to call dlerror(3) to clear any
old error conditions, then call dlsym(), and then call dlerror(3)
again, saving its return value into a variable, and check whether
this saved value is not NULL.

[end of edit]

Additionally, further down, in the NOTES segment, the man page states:

There are several scenarios when the address of a global symbol
is NULL.  For example, a symbol can be placed at zero address by
the linker, via a linker script or with --defsym command-line
option.  Undefined weak symbols also have NULL value.  Finally,
the symbol value may be the result of a GNU indirect function
(IFUNC) resolver function that returns NULL as the resolved
value.  In the latter case, dlsym() also returns NULL without
error.  However, in the former two cases, the behavior of GNU
dynamic linker is inconsistent: relocation processing succeeds
and the symbol can be observed to have NULL value, but dlsym()
fails and dlerror() indicates a lookup error.

efimov-mikhail · 2024-11-07T20:52:03Z

Understood. IMO, it would be better if we can make an actual reproducible test case, and put this case into regular testing procedure.

grgalex · 2024-11-07T21:46:00Z

See reproducer here: #126554 (comment)

ZeroIntensity

It's good to see new contributors :)

Unfortunately, this change does need a test. If it's really, really difficult to add one, then I guess we'll figure things out, but in general, it's difficult for us to accept even tiny bugfixes without proper tests.

For something like this, as long as the test works, it's probably OK. You could do something as hacky as hard-coding a byte string into a test file and unpacking that into a .so at runtime, and just disable all the systems that don't work. But it would be preferable if you could find a library that actually triggers this problem in the wild.

Misc/NEWS.d/next/C_API/2024-11-07-20-24-58.gh-issue-126554.ri12eb.rst

Modules/_ctypes/_ctypes.c

…2eb.rst Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

grgalex · 2024-11-08T13:32:33Z

@ZeroIntensity

I understand that I can just click "Commit" on your proposed changes, and then we can squash before merging
With some grepping I found out that we probably also need to change

cpython/Modules/_ctypes/callproc.c

Line 1626 in b19d12f

ptr = dlsym((void*)handle, name);

using the same approach.
Regarding the test case, I'm thinking something similar to https://github.com/python/cpython/blob/b19d12f44735583e339ee6ee560588fcf22633b4/Lib/test/test_ctypes/test_find.py, where we will:
1. Write a fixed string (the content of foo.c from the reproducer),
2. Compile (as does the test_find testcase) into a shared lib
3. Load, and access the foo object.

grgalex · 2024-11-08T13:50:44Z

(2) is done, pushing a commit. I also added a comment that explicitly states why we call dlerror() before dlsym().

From a project maintaner's POV, if we have 3 instances of this phenomenon, do we add a comment disclaimer at every instance? (This is the approach I took, but I'm unsure what the best / most common practice is.)

grgalex · 2024-11-08T13:51:18Z

Proceeding with the test case (3), will let you know when ready.

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

ZeroIntensity · 2024-11-08T13:56:17Z

I understand that I can just click "Commit" on your proposed changes, and then we can squash before merging

Yup. In fact, that's the preferred workflow for us. Force pushing onto one commit is very much discouraged.

2. With some grepping I found out that we probably also need to change

That looks like a problem too, yeah. Thanks!

I'll create a separate issue if I can come up with a repro for the locale issue. (I'll CC you on it, if you want to author that PR as well.)

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

Lib/test/test_ctypes/test_dlerror.py

grgalex · 2024-11-08T14:30:47Z

Added a commit with the test case under test_ctypes/test_dlerror. PTAL.

Running on the PR branch, it passes :)

On HEAD of main, it explodes (segfault).

Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?), or is it the correct way to go?

efimov-mikhail · 2024-11-08T14:36:43Z

Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?)

Could you please explain, what exactly are you speaking about?
Gracefully handling what?

grgalex · 2024-11-08T14:37:59Z

If we want to be pedantic, we must check all 3 code paths that invoke dlsym().

In this case, we only check one (PyCFuncPtr_FromDll).

On one hand, since our amendment is the same across all 3 funcs, we should theoretically be OK.

grgalex · 2024-11-08T14:40:20Z

Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?)

Could you please explain, what exactly are you speaking about? Gracefully handling what?

@efimov-mikhail

If I run the test on the main branch, the Python interpreter gets a segmentation fault.

What I'm saying is that in a scenario where someone runs all tests, when the ./python -m test process encounters this test, it will exit with a segmentation fault and terminate the testing procedure. It won't mark the test as failing. It will explode.

Maybe I have misunderstood something regarding the philosophy of testing.

ZeroIntensity · 2024-11-08T14:42:48Z

Run the test in a subprocess and check the return code.

Lib/test/test_ctypes/test_dlerror.py

Co-authored-by: Petr Viktorin <encukou@gmail.com>

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

ZeroIntensity · 2024-11-14T14:10:47Z

I'm not sure if the ASan failures are related.

picnixz

One PEP-7 nitpick and two de-chop suggestions. I didn't write suggestions for the PyErr_SetString usages since I wasn't sure it fits on 80 chars.

Modules/_ctypes/callproc.c

Modules/_ctypes/_ctypes.c

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

bedevere-bot · 2024-11-14T17:12:23Z

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit ab22349 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

ZeroIntensity · 2024-11-14T17:12:42Z

Let's try the build bots again and see if it holds up now.

ZeroIntensity · 2024-11-14T23:47:42Z

Buildbots are all good, the one failure is a problem with the DRC API that got merged the other day, which is unrelated. I'll submit a (separate) patch to fix that. I think we're good to merge!

miss-islington-app · 2024-11-15T10:05:54Z

Thanks @grgalex for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Co-authored-by: George Alexopoulos <giorgosalexo0@gmail.com> Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com> Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr> Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>

miss-islington-app · 2024-11-15T10:06:14Z

Sorry, @grgalex and @encukou, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 8717f792f7cc343599dc60daf80630cceb66145b 3.12

bedevere-app · 2024-11-15T10:06:14Z

GH-126861 is a backport of this pull request to the 3.13 branch.

efimov-mikhail · 2024-11-15T10:16:55Z

@grgalex, you've finished it! Congrats with your first contribution to CPython!

grgalex · 2024-11-15T13:42:27Z

Thanks for the guidance @efimov-mikhail, @ZeroIntensity, @encukou.

Looking forward to collaborating again in the future!

…) (#126861) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr> Co-authored-by: George Alexopoulos <giorgosalexo0@gmail.com> Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>

…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com> Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr> Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>

picnixz · 2024-12-08T13:44:05Z

3.13 backports were done but not 3.12; @encukou Do you want the 3.12 backport in the end or can we close the issue with only 3.13 and 3.14 being affected?

encukou · 2024-12-09T12:36:17Z

As I said, I'm fine not backporting. If anyone disagrees, open the PR :)

bedevere-app · 2024-12-09T14:16:09Z

GH-127764 is a backport of this pull request to the 3.12 branch.

…) (GH-127764) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com> Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr> Co-authored-by: George Alexopoulos <giorgosalexo0@gmail.com> Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>

…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com> Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr> Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>

bedevere-app bot added the awaiting review label Nov 7, 2024

bedevere-app bot mentioned this pull request Nov 7, 2024

ctypes: We do not correctly handle NULL dlsym() return values #126554

Closed

grgalex force-pushed the fix-issue-126554 branch from b436c5f to bc3ce76 Compare November 7, 2024 20:31

grgalex force-pushed the fix-issue-126554 branch from bc3ce76 to 3c3c29d Compare November 7, 2024 20:41

ZeroIntensity reviewed Nov 7, 2024

View reviewed changes

ZeroIntensity added topic-ctypes needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Nov 7, 2024

grgalex and others added 3 commits November 8, 2024 15:15

Update Misc/NEWS.d/next/C_API/2024-11-07-20-24-58.gh-issue-126554.ri1…

f71abe8

…2eb.rst Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

Update Modules/_ctypes/_ctypes.c

64a7394

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

Update Modules/_ctypes/_ctypes.c

512a084

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>

Add dlerror() comment, handle callproc.c

cd7d5d7

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

test: Add ctypes null dlsym case

cf39b5b

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

ZeroIntensity reviewed Nov 8, 2024

View reviewed changes

Lib/test/test_ctypes/test_dlerror.py Show resolved Hide resolved

encukou reviewed Nov 14, 2024

View reviewed changes

Lib/test/test_ctypes/test_dlerror.py Outdated Show resolved Hide resolved

grgalex and others added 2 commits November 14, 2024 15:43

Update Lib/test/test_ctypes/test_dlerror.py

cda7771

Co-authored-by: Petr Viktorin <encukou@gmail.com>

test: Handle different architectures, where IFUNC not supported

4b1bb47

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

picnixz approved these changes Nov 14, 2024

View reviewed changes

Modules/_ctypes/callproc.c Outdated Show resolved Hide resolved

Modules/_ctypes/callproc.c Outdated Show resolved Hide resolved

Modules/_ctypes/_ctypes.c Outdated Show resolved Hide resolved

grgalex and others added 2 commits November 14, 2024 18:42

Update Modules/_ctypes/callproc.c

2290aa4

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

Don't break when under 80 chars

ab22349

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>

ZeroIntensity added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 14, 2024

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 14, 2024

encukou merged commit 8717f79 into python:main Nov 15, 2024
114 of 115 checks passed

bedevere-app bot removed the awaiting merge label Nov 15, 2024

miss-islington-app bot assigned encukou Nov 15, 2024

bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 15, 2024

bedevere-app bot removed the needs backport to 3.12 only security fixes label Dec 9, 2024

Uh oh!

gh-126554: ctypes: Correctly handle NULL dlsym values #126555

gh-126554: ctypes: Correctly handle NULL dlsym values #126555

Uh oh!

Conversation

grgalex commented Nov 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Nov 7, 2024 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bedevere-app bot commented Nov 7, 2024

Uh oh!

efimov-mikhail commented Nov 7, 2024

Uh oh!

grgalex commented Nov 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

efimov-mikhail commented Nov 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

grgalex commented Nov 7, 2024

Uh oh!

ZeroIntensity left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

grgalex commented Nov 8, 2024

Uh oh!

grgalex commented Nov 8, 2024

Uh oh!

grgalex commented Nov 8, 2024

Uh oh!

ZeroIntensity commented Nov 8, 2024

Uh oh!

Uh oh!

grgalex commented Nov 8, 2024

Uh oh!

efimov-mikhail commented Nov 8, 2024

Uh oh!

grgalex commented Nov 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

grgalex commented Nov 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ZeroIntensity commented Nov 8, 2024

Uh oh!

Uh oh!

ZeroIntensity commented Nov 14, 2024

Uh oh!

picnixz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bedevere-bot commented Nov 14, 2024

Uh oh!

ZeroIntensity commented Nov 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ZeroIntensity commented Nov 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

miss-islington-app bot commented Nov 15, 2024

Uh oh!

miss-islington-app bot commented Nov 15, 2024

Uh oh!

bedevere-app bot commented Nov 15, 2024

Uh oh!

efimov-mikhail commented Nov 15, 2024

Uh oh!

grgalex commented Nov 15, 2024

Uh oh!

grgalex commented Nov 7, 2024 •

edited

Loading

ghost commented Nov 7, 2024 •

edited by ghost

Loading

grgalex commented Nov 7, 2024 •

edited

Loading

efimov-mikhail commented Nov 7, 2024 •

edited

Loading

grgalex commented Nov 8, 2024 •

edited

Loading

grgalex commented Nov 8, 2024 •

edited

Loading

ZeroIntensity commented Nov 14, 2024 •

edited

Loading

ZeroIntensity commented Nov 14, 2024 •

edited

Loading