-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-126554: ctypes: Correctly handle NULL dlsym values #126555
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
b436c5f
to
bc3ce76
Compare
For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In our case, we also subjectively treat a NULL return value from dlsym() as an error, so we format a special string for that. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com>
bc3ce76
to
3c3c29d
Compare
This is direct quotation from the man page you linked.
|
Indeed, I failed to mention that at some point the man page contradicts itself, and I believe that the part you quoted is something the maintainer forgot to remove.
I say this because further up, in the DESCRIPTION, the man page states:
Additionally, further down, in the NOTES segment, the man page states:
|
Understood. IMO, it would be better if we can make an actual reproducible test case, and put this case into regular testing procedure. |
See reproducer here: #126554 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to see new contributors :)
Unfortunately, this change does need a test. If it's really, really difficult to add one, then I guess we'll figure things out, but in general, it's difficult for us to accept even tiny bugfixes without proper tests.
For something like this, as long as the test works, it's probably OK. You could do something as hacky as hard-coding a byte string into a test file and unpacking that into a .so
at runtime, and just disable all the systems that don't work. But it would be preferable if you could find a library that actually triggers this problem in the wild.
Misc/NEWS.d/next/C_API/2024-11-07-20-24-58.gh-issue-126554.ri12eb.rst
Outdated
Show resolved
Hide resolved
…2eb.rst Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
|
(2) is done, pushing a commit. I also added a comment that explicitly states why we call From a project maintaner's POV, if we have 3 instances of this phenomenon, do we add a comment disclaimer at every instance? (This is the approach I took, but I'm unsure what the best / most common practice is.) |
Proceeding with the test case (3), will let you know when ready. |
Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
Yup. In fact, that's the preferred workflow for us. Force pushing onto one commit is very much discouraged.
That looks like a problem too, yeah. Thanks! I'll create a separate issue if I can come up with a repro for the locale issue. (I'll CC you on it, if you want to author that PR as well.) |
Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
Added a commit with the test case under Running on the PR branch, it passes :) On HEAD of main, it explodes (segfault). Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?), or is it the correct way to go? |
Could you please explain, what exactly are you speaking about? |
If we want to be pedantic, we must check all 3 code paths that invoke In this case, we only check one ( On one hand, since our amendment is the same across all 3 funcs, we should theoretically be OK. |
If I run the test on the main branch, the Python interpreter gets a segmentation fault. What I'm saying is that in a scenario where someone runs all tests, when the Maybe I have misunderstood something regarding the philosophy of testing. |
Run the test in a subprocess and check the return code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the buildbots like it, let's merge!
It's end of day for me, I'll get to it tomorrow. I pushed a refcounting fix directly to save time.
Thank you for the fix, and your patience, @grgalex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not all buildbots liked it :(
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
I'm not sure if the ASan failures are related. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One PEP-7 nitpick and two de-chop suggestions. I didn't write suggestions for the PyErr_SetString
usages since I wasn't sure it fits on 80 chars.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr>
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit ab22349 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
Let's try the build bots again and see if it holds up now. |
Buildbots are all good, the one failure is a problem with the DRC API that got merged the other day, which is unrelated. I'll submit a (separate) patch to fix that. I think we're good to merge! |
…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Co-authored-by: George Alexopoulos <giorgosalexo0@gmail.com> Signed-off-by: Georgios Alexopoulos <grgalex42@gmail.com> Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr> Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
Sorry, @grgalex and @encukou, I could not cleanly backport this to
|
GH-126861 is a backport of this pull request to the 3.13 branch. |
@grgalex, you've finished it! Congrats with your first contribution to CPython! |
Thanks for the guidance @efimov-mikhail, @ZeroIntensity, @encukou. Looking forward to collaborating again in the future! |
…) (#126861) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Signed-off-by: Georgios Alexopoulos <grgalex@ba.uoa.gr> Co-authored-by: George Alexopoulos <giorgosalexo0@gmail.com> Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Petr Viktorin <encukou@gmail.com>
For dlsym(), a return value of NULL does not necessarily indicate an error [1].
Therefore, to avoid using stale (or NULL) dlerror() values, we must:
If the return value of dlerror() is not NULL, an error occured.
In our case of ctypes, I understand that we subjectively treat a NULL return value from dlsym() as an error, so we must format a special string for that.
[1] https://man7.org/linux/man-pages/man3/dlsym.3.html