-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API pattern for “setdefault”-like operations #40
Comments
This does not make sense to me. This case is actually like The use case of |
Yes, |
But that’s already not thread-safe in most cases (the GIL is likely to be releasable during the further customization, if it is at all significant. And in general import is not atomic. |
Indeed. |
This is not a C API question, as it also affects the Python dict API. |
The import API here isn't a My immediate proposal for designing these APIs (I'll expand below):
Key design points I considered:
One thing I've liked from other languages is where the "get or create" API takes a callable to do the creation, so you can lazily create the value to be inserted, but still under whatever locks are required by the collection. I don't think that's something we would eagerly design for though - double-checked locking ought to be enough for our API. |
The difference is that Python will give you TypeError if you get the wrong type. Or SystemError if it's really bad. C will segfault.
A I would get rid of the
IMO, it's easy enough to use a default value if you get the 0.
Yeah. Also, I'd rather avoid calling user code while a lock is held. But to enable double-checked locking, we need to provide:
|
I'd imply
Fair, it was the last parameter I added, and only really because it felt inconsistent compared to
That's going to be impossible. We already have borderline security issues due to this (which nogil ought to fix, by holding the lock while calling user code 😉 ). But I am okay with not calling back into user code. It's just going to be very tough due to how the interpreter has been designed (e.g. overridable |
To the extent that this is related to nogil, I'm fairly confused about what the concrete use cases are. There a few comments here that suggest that the original two examples are not good example use cases. Is it possible to provide one or two concrete examples for each of the proposed patterns? @zooba wrote:
I have an open PR for an internal-only one-time initialization API that follows this pattern (but it's for "call_once", not "get_or_create"). python/cpython#111956 @encukou wrote:
In cases where we are using the nogil per-object locks, which are acquired/released through the critical section API, this is safe and does not risk deadlock. If the user code is simple, plain C code then the everything is atomic. If the user code makes arbitrary calls to the Python API then the callback may not be atomic (we might temporarily release the lock, just like we might temporarily release the GIL), but it wont' deadlock. I can't think of public C-APIs that follow this pattern, but from an implementation perspective we do this for things like It's not really appropriate for |
I think this discussion is mostly about general "setdefault"-like operations, but for the |
That sounds like a good approach -- hammer out the primitives in internal API, and as the next step, decide what to expose and how :)
So the “double-checked locking” still sounds like a good default pattern. If the operation ends up not being atomic, possibly due to something the user doesn't control, then losing thread might do extra work (and it should cleanly discard the result). |
IMO, we need an API pattern for “setdefault”-like operations, which would be safer under nogil.
I was thinking about these proposed new APIs:
PyDict_Pop
PyImport_ImportOrAddModule
The first case is like
dict.get
. IMO, it's good to return 0 and set*result
toNULL
if the item is not found. If the caller needs a default value, they can create it when needed. However, if they then want to insert the default back into the dict, in nogil they run into a race condition: the item might have already been inserted.The second case is like
dict.setdefault
, doing some non-trivial work if the item is not found (in this case, creating a new module and putting it insys.modules
). That should be atomic, so it makes sense to me to use a default -- either take it as an argument, or us an implied empty value. Taking a default argument is not free, though: the caller needs to create it even if it's unused.The trouble with
PyImport_ImportOrAddModule
(and other “setdefault”-style API with implied defaults) is that the newly created default sometimes needs further customization, and that will not be atomic under nogil: other threads have a chance to see a uninitialized module.IMO, we need an API pattern for “setdefault”-like operations that allows creating a custom new value only if needed, and either
Note that this
get
+setdefault
is different fromget
+set
(which we use in a bunch of places, for example in https://github.com/python/cpython/blob/d2f305dfd183025a95592319b280fcf4b20c8694/Python/pylifecycle.c#L2276-L2285 -- didn't check if that instance is safe).With
get
+set
, if there's a race, the first thread's value is discarded -- but that value might already have been retrieved by other code.cc @colesbury
The text was updated successfully, but these errors were encountered: