Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-126554: ctypes: Correctly handle NULL dlsym values #126555

Merged
merged 54 commits into from
Nov 15, 2024

Conversation

grgalex
Copy link
Contributor

@grgalex grgalex commented Nov 7, 2024

For dlsym(), a return value of NULL does not necessarily indicate an error [1].

Therefore, to avoid using stale (or NULL) dlerror() values, we must:

  1. clear the previous error state by calling dlerror()
  2. call dlsym()
  3. call dlerror()

If the return value of dlerror() is not NULL, an error occured.

In our case of ctypes, I understand that we subjectively treat a NULL return value from dlsym() as an error, so we must format a special string for that.

[1] https://man7.org/linux/man-pages/man3/dlsym.3.html

Copy link

cpython-cla-bot bot commented Nov 7, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Nov 7, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

For dlsym(), a return value of NULL does not necessarily indicate
an error [1].

Therefore, to avoid using stale (or NULL) dlerror() values, we must:

 1. clear the previous error state by calling dlerror()
 2. call dlsym()
 3. call dlerror()

If the return value of dlerror() is not NULL, an error occured.

In our case, we also subjectively treat a NULL return value
from dlsym() as an error, so we format a special string for that.

[1]: https://man7.org/linux/man-pages/man3/dlsym.3.html

Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com>
@efimov-mikhail
Copy link
Contributor

On success, these functions return the address associated with
symbol. On failure, they return NULL; the cause of the error can
be diagnosed using dlerror(3).

This is direct quotation from the man page you linked.
Why you think this is correct?

dlsym(), a return value of NULL does not necessarily indicate an error [1].

@grgalex
Copy link
Contributor Author

grgalex commented Nov 7, 2024

@efimov-mikhail

Indeed, I failed to mention that at some point the man page contradicts itself, and I believe that the part you quoted is something the maintainer forgot to remove.

[edit: added this part]

I say this because further up, in the DESCRIPTION, the man page states:

In unusual cases (see NOTES) the value of the symbol could
actually be NULL.  Therefore, a NULL return from dlsym() need not
indicate an error.  The correct way to distinguish an error from
a symbol whose value is NULL is to call dlerror(3) to clear any
old error conditions, then call dlsym(), and then call dlerror(3)
again, saving its return value into a variable, and check whether
this saved value is not NULL.

[end of edit]

Additionally, further down, in the NOTES segment, the man page states:

There are several scenarios when the address of a global symbol
is NULL.  For example, a symbol can be placed at zero address by
the linker, via a linker script or with --defsym command-line
option.  Undefined weak symbols also have NULL value.  Finally,
the symbol value may be the result of a GNU indirect function
(IFUNC) resolver function that returns NULL as the resolved
value.  In the latter case, dlsym() also returns NULL without
error.  However, in the former two cases, the behavior of GNU
dynamic linker is inconsistent: relocation processing succeeds
and the symbol can be observed to have NULL value, but dlsym()
fails and dlerror() indicates a lookup error.

@efimov-mikhail
Copy link
Contributor

efimov-mikhail commented Nov 7, 2024

Understood. IMO, it would be better if we can make an actual reproducible test case, and put this case into regular testing procedure.

@grgalex
Copy link
Contributor Author

grgalex commented Nov 7, 2024

See reproducer here: #126554 (comment)

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to see new contributors :)

Unfortunately, this change does need a test. If it's really, really difficult to add one, then I guess we'll figure things out, but in general, it's difficult for us to accept even tiny bugfixes without proper tests.

For something like this, as long as the test works, it's probably OK. You could do something as hacky as hard-coding a byte string into a test file and unpacking that into a .so at runtime, and just disable all the systems that don't work. But it would be preferable if you could find a library that actually triggers this problem in the wild.

Modules/_ctypes/_ctypes.c Outdated Show resolved Hide resolved
Modules/_ctypes/_ctypes.c Outdated Show resolved Hide resolved
Modules/_ctypes/_ctypes.c Outdated Show resolved Hide resolved
Modules/_ctypes/_ctypes.c Outdated Show resolved Hide resolved
@ZeroIntensity ZeroIntensity added topic-ctypes needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Nov 7, 2024
grgalex and others added 3 commits November 8, 2024 15:15
…2eb.rst

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
@grgalex
Copy link
Contributor Author

grgalex commented Nov 8, 2024

@ZeroIntensity

  1. I understand that I can just click "Commit" on your proposed changes, and then we can squash before merging
  2. With some grepping I found out that we probably also need to change
    ptr = dlsym((void*)handle, name);

    using the same approach.
  3. Regarding the test case, I'm thinking something similar to https://github.com/python/cpython/blob/b19d12f44735583e339ee6ee560588fcf22633b4/Lib/test/test_ctypes/test_find.py, where we will:
    1. Write a fixed string (the content of foo.c from the reproducer),
    2. Compile (as does the test_find testcase) into a shared lib
    3. Load, and access the foo object.

@grgalex
Copy link
Contributor Author

grgalex commented Nov 8, 2024

(2) is done, pushing a commit. I also added a comment that explicitly states why we call dlerror() before dlsym().

From a project maintaner's POV, if we have 3 instances of this phenomenon, do we add a comment disclaimer at every instance? (This is the approach I took, but I'm unsure what the best / most common practice is.)

@grgalex
Copy link
Contributor Author

grgalex commented Nov 8, 2024

Proceeding with the test case (3), will let you know when ready.

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
@ZeroIntensity
Copy link
Member

  • I understand that I can just click "Commit" on your proposed changes, and then we can squash before merging

Yup. In fact, that's the preferred workflow for us. Force pushing onto one commit is very much discouraged.

2. With some grepping I found out that we probably also need to change

That looks like a problem too, yeah. Thanks!

I'll create a separate issue if I can come up with a repro for the locale issue. (I'll CC you on it, if you want to author that PR as well.)

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
@grgalex
Copy link
Contributor Author

grgalex commented Nov 8, 2024

Added a commit with the test case under test_ctypes/test_dlerror. PTAL.

Running on the PR branch, it passes :)

On HEAD of main, it explodes (segfault).

Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?), or is it the correct way to go?

@efimov-mikhail
Copy link
Contributor

Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?)

Could you please explain, what exactly are you speaking about?
Gracefully handling what?

@grgalex
Copy link
Contributor Author

grgalex commented Nov 8, 2024

If we want to be pedantic, we must check all 3 code paths that invoke dlsym().

In this case, we only check one (PyCFuncPtr_FromDll).

On one hand, since our amendment is the same across all 3 funcs, we should theoretically be OK.

@grgalex
Copy link
Contributor Author

grgalex commented Nov 8, 2024

Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?)

Could you please explain, what exactly are you speaking about? Gracefully handling what?

@efimov-mikhail

If I run the test on the main branch, the Python interpreter gets a segmentation fault.

What I'm saying is that in a scenario where someone runs all tests, when the ./python -m test process encounters this test, it will exit with a segmentation fault and terminate the testing procedure. It won't mark the test as failing. It will explode.

Maybe I have misunderstood something regarding the philosophy of testing.

@ZeroIntensity
Copy link
Member

Run the test in a subprocess and check the return code.

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @encukou for commit 33cf8b7 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 13, 2024
Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the buildbots like it, let's merge!
It's end of day for me, I'll get to it tomorrow. I pushed a refcounting fix directly to save time.

Thank you for the fix, and your patience, @grgalex

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not all buildbots liked it :(

Lib/test/test_ctypes/test_dlerror.py Outdated Show resolved Hide resolved
grgalex and others added 2 commits November 14, 2024 15:43
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
@ZeroIntensity
Copy link
Member

I'm not sure if the ASan failures are related.

Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One PEP-7 nitpick and two de-chop suggestions. I didn't write suggestions for the PyErr_SetString usages since I wasn't sure it fits on 80 chars.

Modules/_ctypes/callproc.c Outdated Show resolved Hide resolved
Modules/_ctypes/callproc.c Outdated Show resolved Hide resolved
Modules/_ctypes/_ctypes.c Outdated Show resolved Hide resolved
grgalex and others added 2 commits November 14, 2024 18:42
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
@ZeroIntensity ZeroIntensity added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 14, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit ab22349 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 14, 2024
@ZeroIntensity
Copy link
Member

ZeroIntensity commented Nov 14, 2024

Let's try the build bots again and see if it holds up now.

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Nov 14, 2024

Buildbots are all good, the one failure is a problem with the DRC API that got merged the other day, which is unrelated. I'll submit a (separate) patch to fix that. I think we're good to merge!

@encukou encukou merged commit 8717f79 into python:main Nov 15, 2024
114 of 115 checks passed
@miss-islington-app
Copy link

Thanks @grgalex for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 15, 2024
…-126555)

For dlsym(), a return value of NULL does not necessarily indicate
an error [1].

Therefore, to avoid using stale (or NULL) dlerror() values, we must:

 1. clear the previous error state by calling dlerror()
 2. call dlsym()
 3. call dlerror()

If the return value of dlerror() is not NULL, an error occured.

In ctypes we choose to treat a NULL return value from dlsym()
as a "not found" error. This is the same as the fallback
message we use on Windows, Cygwin or when getting/formatting
the error reason fails.

[1]: https://man7.org/linux/man-pages/man3/dlsym.3.html

(cherry picked from commit 8717f79)

Co-authored-by: George Alexopoulos <giorgosalexo0@gmail.com>
Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com>
Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
@miss-islington-app
Copy link

Sorry, @grgalex and @encukou, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 8717f792f7cc343599dc60daf80630cceb66145b 3.12

@bedevere-app
Copy link

bedevere-app bot commented Nov 15, 2024

GH-126861 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Nov 15, 2024
@efimov-mikhail
Copy link
Contributor

@grgalex, you've finished it! Congrats with your first contribution to CPython!

@grgalex
Copy link
Contributor Author

grgalex commented Nov 15, 2024

Thanks for the guidance @efimov-mikhail, @ZeroIntensity, @encukou.

Looking forward to collaborating again in the future!

ambv pushed a commit that referenced this pull request Nov 17, 2024
…) (#126861)

For dlsym(), a return value of NULL does not necessarily indicate
an error [1].

Therefore, to avoid using stale (or NULL) dlerror() values, we must:

 1. clear the previous error state by calling dlerror()
 2. call dlsym()
 3. call dlerror()

If the return value of dlerror() is not NULL, an error occured.

In ctypes we choose to treat a NULL return value from dlsym()
as a "not found" error. This is the same as the fallback
message we use on Windows, Cygwin or when getting/formatting
the error reason fails.

[1]: https://man7.org/linux/man-pages/man3/dlsym.3.html

(cherry picked from commit 8717f79)

Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
Co-authored-by: George Alexopoulos <giorgosalexo0@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants