gh-99108: Release the GIL around hashlib built-in computation #104675

gpshead · 2023-05-20T00:04:15Z

This matches the GIL releasing behavior of our existing _hashopenssl
module, extending it to the HACL* built-ins.

Issue: Replace built-in hashlib with verified implementations from HACL* #99108

gpshead · 2023-05-20T00:09:02Z

@msprotz - does this make sense to you?

msprotz · 2023-05-20T03:53:02Z

Do you have a reference for how the Python GC works? I'm reading https://wiki.python.org/moin/GlobalInterpreterLock, https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock, and https://docs.python.org/3/c-api/memory.html -- is there any other good resource?

Basically I'd like to understand if:

- a compaction can be triggered while the C code no longer has the GIL (I assume not, but it'd be good to confirm), and
- the C code needs to increase the refcount of e.g. the input data (I also assume not: since the callstack owns the object somewhere, its refcount must remain > 0, but it would also be good to confirm).

I'd like to read up a little bit before giving you a thumbs-up. Thanks!

gpshead · 2023-05-20T17:12:33Z

CPython is purely reference counted, with no compaction or moving of objects in memory. The GC exists solely to deal with reference cycles. It never rearranges memory and never frees memory behind any object with a non-zero reference count. We've got a writeup on that at https://devguide.python.org/internals/garbage-collector/. It has been this way since Python 2.0, when the cyclic GC was introduced. (Before that, reference cycles were memory leaks.)

No memory returned from Python C APIs will ever be moved or freed so long as it belongs to referenced objects. The PyBytes objects the hash functions receive always have positive refcounts by definition, so the pointer returned by PyBytes_AsStringAndSize() can safely be passed synchronously to other C code with the GIL released, because our thread owns a reference to that immutable object.

This PR applies the same logic that _hashopenssl has long used to release the GIL. The per-hash-object lock is added to prevent code from calling into the C hash state mutation APIs on a given instance from multiple threads at once. (It would clearly be buggy code design if anyone ever did; our goal is just to avoid undefined behavior from C API misuse should anyone ever try.)
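
As a rough illustration of the behavior described above, here is a minimal Python sketch (not part of the PR). It hashes large buffers from several threads, first with one hash object per thread and then with a single shared object. The names `CHUNK`, `THREADS`, `ROUNDS`, and the helper functions are made up for this example, and whether `hashlib.sha256` is backed by OpenSSL or by a built-in implementation depends on the build. Because every chunk is identical, the shared digest does not depend on the order of updates, only on each update being applied as a whole.

```python
import hashlib
import threading

CHUNK = b"\x5a" * (1 << 20)  # 1 MiB; assumed large enough for the GIL to be released
THREADS = 4
ROUNDS = 8


def reference(n_chunks):
    """Single-threaded digest of CHUNK repeated n_chunks times."""
    h = hashlib.sha256()
    for _ in range(n_chunks):
        h.update(CHUNK)
    return h.hexdigest()


def run_threads(target, args_list):
    """Start one thread per argument tuple and wait for all of them."""
    threads = [threading.Thread(target=target, args=a) for a in args_list]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def independent_digests():
    """Each thread owns its hash object; the threads can hash concurrently."""
    results = [None] * THREADS

    def worker(i):
        h = hashlib.sha256()
        for _ in range(ROUNDS):
            h.update(CHUNK)
        results[i] = h.hexdigest()

    run_threads(worker, [(i,) for i in range(THREADS)])
    return results


def shared_digest():
    """All threads update one object (poor design, but it must not corrupt state)."""
    shared = hashlib.sha256()

    def worker():
        for _ in range(ROUNDS):
            shared.update(CHUNK)

    run_threads(worker, [()] * THREADS)
    return shared.hexdigest()


if __name__ == "__main__":
    print(independent_digests() == [reference(ROUNDS)] * THREADS)
    print(shared_digest() == reference(THREADS * ROUNDS))
```

The shared-object case is the one the per-object lock guards against: updates are serialized, so the state stays consistent even though sharing a hash object this way is not recommended.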

msprotz

OK, that makes a lot more sense, and now I understand the purpose of the lock. Thanks for the pointers!

Overall I think it would be helpful to document the intent behind this locking behavior, notably:

- what you said about avoiding locking up the CPU in case there's lots of data to hash
- the defensive lock that isn't strictly required but helps guard against rogue C API clients
- the lazy initialization and the fact that the module can't assume obj->lock is non-NULL (may be uninitialized or, as I understand it, lock creation may have failed).
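
The three points above can be pictured with a small Python-level analogy. This is only a sketch of the control flow, not the C code in the PR: `LazyLockedHasher`, `GIL_MINSIZE`, and `lock_factory` are hypothetical names, and the threshold value is an assumption of this example. In the C modules the check and the lock creation happen while the GIL is held, which is what makes the lazy creation safe there; the sketch does not try to reproduce that guarantee.

```python
import hashlib
import threading

GIL_MINSIZE = 2048  # illustrative threshold; an assumption of this sketch


class LazyLockedHasher:
    """Analogy for a hash object whose lock is created only when needed."""

    def __init__(self, name="sha256", lock_factory=threading.Lock):
        self._state = hashlib.new(name)
        self._lock = None  # stays None until a large update arrives
        self._lock_factory = lock_factory  # may return None to model a failed creation

    def update(self, data):
        # Lazy initialization: only bother with a lock for large inputs.
        if self._lock is None and len(data) >= GIL_MINSIZE:
            self._lock = self._lock_factory()
        lock = self._lock
        if lock is None:
            # Small input, or lock creation failed: take the plain path.
            self._state.update(data)
        else:
            # Large input: serialize access to the hash state.
            with lock:
                self._state.update(data)

    def hexdigest(self):
        lock = self._lock
        if lock is None:
            return self._state.hexdigest()
        with lock:
            return self._state.hexdigest()


if __name__ == "__main__":
    h = LazyLockedHasher()
    h.update(b"abc")          # small: no lock is created
    h.update(b"x" * 4096)     # large: the lock is created on demand
    print(h.hexdigest())
```

Every path has to cope with the lock being absent, which mirrors the point that the module cannot assume obj->lock is non-NULL.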

Other than that, as far as I can tell, this looks fine. Thanks!

Modules/md5module.c

gpshead · 2023-05-22T22:08:03Z

Agreed, I'll work on some common code comments to apply to everywhere relevant about these patterns.

Modules/_hashopenssl.c

miss-islington · 2023-05-23T00:06:45Z

Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

…ythonGH-104675) This matches the GIL releasing behavior of our existing `_hashopenssl` module, extending it to the HACL* built-ins. Includes adding comments to better describe the ENTER/LEAVE macros purpose and explain the lock strategy in both existing and new code. (cherry picked from commit 2e5d8a9) Co-authored-by: Gregory P. Smith <greg@krypto.org>

bedevere-bot · 2023-05-23T00:07:17Z

GH-104776 is a backport of this pull request to the 3.12 branch.

…H-104675) (#104776) gh-99108: Release the GIL around hashlib built-in computation (GH-104675) This matches the GIL releasing behavior of our existing `_hashopenssl` module, extending it to the HACL* built-ins. Includes adding comments to better describe the ENTER/LEAVE macros purpose and explain the lock strategy in both existing and new code. (cherry picked from commit 2e5d8a9) Co-authored-by: Gregory P. Smith [Google] <greg@krypto.org>

gpshead added 2 commits May 19, 2023 19:46

pythongh-99108: Release the GIL around hashlib built-in hash updates.

73094a1

This matches the GIL releasing behavior of our existing `_hashopenssl` module, extending it to the HACL* built-ins.

Do the same for sha2, sha1, & md5.

621f9a6

gpshead self-assigned this May 20, 2023

bedevere-bot mentioned this pull request May 20, 2023

Replace built-in hashlib with verified implementations from HACL* #99108

Open

gpshead changed the title ~~gh-99108: Release the GIL around hashlib built-in hash updates.~~ gh-99108: Release the GIL around hashlib built-in computation May 20, 2023

reword NEWS.

4cf2a19

msprotz approved these changes May 22, 2023

Modules/md5module.c (3 resolved review comments)

bedevere-bot added the awaiting core review label May 22, 2023

Better describe the ENTER/LEAVE macros.

88f190f

Explain the locking strategy in comments.

c8e0b01

msprotz reviewed May 22, 2023

Modules/_hashopenssl.c (1 resolved review comment)

gpshead marked this pull request as ready for review May 22, 2023 23:03

gpshead requested a review from tiran as a code owner May 22, 2023 23:03

gpshead added the needs backport to 3.12 bug and security fixes label May 22, 2023

gpshead enabled auto-merge (squash) May 22, 2023 23:05

gpshead added 2 commits May 22, 2023 16:05

Merge branch 'main' into hashlib_hacl_dropGIL

2830028

Merge branch 'main' into hashlib_hacl_dropGIL

0e60cec

gpshead merged commit 2e5d8a9 into python:main May 23, 2023

bedevere-bot removed the awaiting core review label May 23, 2023

bedevere-bot removed the needs backport to 3.12 bug and security fixes label May 23, 2023

gpshead deleted the hashlib_hacl_dropGIL branch May 23, 2023 00:22

erlend-aasland mentioned this pull request Jun 6, 2023

3.12 backport gh 105236 #105358

Closed
