Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python::allow_threads is unsound in the presence of scoped-tls. #3640

Open
steffahn opened this issue Dec 11, 2023 · 43 comments · May be fixed by #3646
Open

Python::allow_threads is unsound in the presence of scoped-tls. #3640

steffahn opened this issue Dec 11, 2023 · 43 comments · May be fixed by #3646

Comments

@steffahn
Copy link

In analogy to

we can smuggle Ungil data across Python::allow_threads using scoped-tls, too.

use pyo3::prelude::*;
use pyo3::types::PyString;
use scoped_tls::scoped_thread_local;

fn main() {
    Python::with_gil(|py| {
        let string = PyString::new(py, "foo");

        scoped_thread_local!(static WRAPPED: PyString);

        WRAPPED.set(string, || {
            py.allow_threads(|| {
                WRAPPED.with(|smuggled: &PyString| {
                    println!("{:?}", smuggled);
                });
            });
        });
    });
}

(results in segfault in my test)

Unlike #2141, this issue is virtually unsolvable, i.e. even the auto trait approach with the feature="nightly" enabled cannot catch this at all. There’s no property in the callback to allow_threads that can catch this. Really, the callback doesn’t even capture anything at all:

use pyo3::prelude::*;
use pyo3::types::PyString;
use scoped_tls::scoped_thread_local;

scoped_thread_local!(static WRAPPED: PyString);

fn callback() {
    WRAPPED.with(|smuggled: &PyString| {
        println!("{:?}", smuggled);
    });
}

fn main() {
    Python::with_gil(|py| {
        let string = PyString::new(py, "foo");

        WRAPPED.set(string, || {
            py.allow_threads(callback); // callback is an ordinary `fn() -> ()` item.
        });
    });
}

Again, there’s a “whose fault” question to be asked, whether the fact that the standard library’s thread-locals require T: 'static means there’s any guarantees that non-'static data isn’t allowed to be “smuggled” through thread-local storage. In my view, if you look at what kind of things it offers, scoped-tls seems even less unreasonable to be called “sound”, compared to sync_wrapper. Yet the consequences for Ungil are more detrimental.

@adamreichold
Copy link
Member

From a first glance, I fear that saving the current approach might mean to never ever expose any GIL-bound references no matter how short-lived and have all operations explicitly require a GIL token. This would be highly unergonomic and imply that our upcoming GIL-bound smart pointers Py2<'py, T> would provide no benefit over their existing unbound counterpart Py<T> at all.

If we somehow could avoid at least storing our GIL-bound smart pointers within scoped-tls, but that library has no traits bounds at all. So I suspect this is coming down to either abandoning a safe allow_threads or socially establish that

whether the fact that the standard library’s thread-locals require T: 'static means there’s any guarantees that non-'static data isn’t allowed to be “smuggled” through thread-local storage.

should be upheld. Or at least require an extra unsafe trait bound for usage of types with scoped-tls.

In my view, if you look at what kind of things it offers, scoped-tls seems even less unreasonable to be called “sound”, compared to sync_wrapper. Yet the consequences for Ungil are more detrimental.

Better yet, there would be a case of unsoundness implied by usage of scoped-tls that does not involve the admittedly somewhat baroque architecture we built to accommodate Python in its GIL.

@steffahn Since you are obviously experienced in the finding these holes, did you ever look into scoped-tls from a blank slate perspective?

@adamreichold
Copy link
Member

From a first glance, I fear that saving the current approach might mean to never ever expose any GIL-bound references no matter how short-lived and have all operations explicitly require a GIL token. This would be highly unergonomic and imply that our upcoming GIL-bound smart pointers Py2<'py, T> would provide no benefit over their existing unbound counterpart Py at all.

But this is not true, right? Because I could also use scoped-tls to smuggle the GIL token itself even if it has a generative lifetime attached?

@steffahn
Copy link
Author

But this is not true, right? Because I could also use scoped-tls to smuggle the GIL token itself even if it has a generative lifetime attached?

Yes. At least in the extended version provided in the crate scoped-tls-hkt.

@steffahn
Copy link
Author

use pyo3::prelude::*;
use scoped_tls_hkt::scoped_thread_local;

fn main() {
    Python::with_gil(|py| {
        scoped_thread_local!(static GIL: for<'a> Python<'a>);
        GIL.set(py, || {
            py.allow_threads(|| {
                GIL.with(|smuggled_py: Python<'_>| {
                    // profit
                });
            });
        });
    });
}

@davidhewitt
Copy link
Member

Ouch, this is painful. Thanks @steffahn for finding such an interesting case!

My feeling here is that this is a much more severe mechanism to break allow_threads than the interaction with send_wrapper. At least with send_wrapper any smuggled data is part of the closure capture, so careful inspection around the crash could in theory identify the issue. This scoped-tls approach can enable you to have the smuggling be located in a completely separate part of the codebase to the allow_threads call, I think it would be much harder to resolve this.

It seems like the natural reaction is to declare allow_threads an unsafe function, but I think this invariant "you must be sure that scoped-tls and send-wrapper are not used to smuggle a Python marker into code invoked inside the closure" becomes extremely hard to uphold. Still, at this point it's likely a necessary change unless we can find a way to resolve the API.

What would a "safe" solution look like? Can we somehow express Python as a borrow and py.allow_threads takes an exclusive borrow, thus preventing all gil-attached data and smuggling? Maybe, but I suspect that at least some of PyO3's existing APIs need changing to achieve this.

Alternatively, a future approach might involve contexts, where allow_threads removes the context, but this is obviously not available to us at present.

@adamreichold
Copy link
Member

What would a "safe" solution look like? Can we somehow express Python as a borrow and py.allow_threads takes an exclusive borrow, thus preventing all gil-attached data and smuggling? Maybe, but I suspect that at least some of PyO3's existing APIs need changing to achieve this.

I think this would basically be equivalent to removing the Clone and Copy impls for Python and making it invariant over the lifetime? But I suspect especially the invariance part (requiring a "reborrowing" operation) would be quite painful from an ergonomics point of view?

@davidhewitt
Copy link
Member

davidhewitt commented Dec 12, 2023

At least that I think, yes. I think it might also have implications for .py() methods to get a copy of Python, unless they are also expressed as borrows on &Python (or similar), but I haven't thought this through.

My feeling is that coming up with a workable proposal for that is going to need someone to dive deep into this for at least a few days.

How do you feel about accepting Python::allow_threads as unsafe in the short-mid term? Hopefully we can come up with a long-term solution to make it safe again.

Do we think this has implications for the proposed Py2<'_, PyAny> smart pointer? I think if we make allow_threads unsafe then there's no safety concerns about that API, and we are already aware that being able to assume the GIL is held is a significant ergonomics win, especially in trait implementations, so I believe that API still makes sense.

@davidhewitt
Copy link
Member

(Given the Py2<'_, PyAny> change is mostly motivated by getting rid of the pool, and thus leading users to have proper treatment of ownership in their code, I think that it's a correct evolution of the API even if this issue might imply that it will eventually need to change again.)

@adamreichold
Copy link
Member

How do you feel about accepting Python::allow_threads as unsafe in the short-mid term? Hopefully we can come up with a long-term solution to make it safe again.

My gut feeling is to not rush this (after all, we having been living with the send_wrapper soundness hole for quite some time). The reasons I see for keeping the status quo for now are:

  • I am not sure whether scoped-tls poses a problem in itself, not just for PyO3. But I have not yet had time to look into constructing an issue without involving PyO3.
  • As you wrote, it is basically impossible to check the required invariants if scoped-tls is involved due to the non-local reasoning required. (This non-locality is basically where my gut feeling that scoped-tls is generally a bad idea comes from.)
  • I fear that marking allow_threads unsafe will "infect" a lot of code due to the closure-based API by effectively making the closure body an unsafe block as well, c.f. this playground. Hence, I think the resulting API will lead to more actual errors than the existing soundness issues.

@davidhewitt
Copy link
Member

Sure thing. The last point in particular that unsafe { } will infect the closure is a very good point which I had not considered, agreed that can also create problems.

@davidhewitt
Copy link
Member

Just for the sake of throwing out another middle-of-the-road solution, what if the method was renamed to unsafe_allow_threads for now? That would avoid the infective problem while also establishing some sort of social contract via the use of the term "unsafe".

I can see a very mixed reaction from the community to doing that (e.g. we might be accused of shipping code which doesn't trip #![forbid(unsafe_code)]), but it might be a practical choice.

@steffahn
Copy link
Author

FYI, I came across this issue from a context of discussing the role of Send/Sync on IRLO. Asynchronous tasks have a similar problem as Ungil, as they would love to behave more like threads in putting up Send/Sync-like barriers, in order to allow non-thread-safe API (like Cell or Rc) to be used task-internally. But thread-locals (in that case thread-locals in general, not just scoped ones) are a major problem for that.

In my reply in that thread I argued that a Send-like trait for non-physical-thread-based memory boundaries, i.e. one that cannot be circumvented thread-internally with any form of thread locals, would be sufficient to offer a sound allow_threads, too. Arguably, one that has the same issue of overly cautiously requiring non-thread-safe data not to pass into the closure at all, not just non-python-memory-involving data, but at least it could be sound.

Of course that’s not any form of short-term solution; landing such a language feature would be a huge undertaking, unless we find a simpler way of keeping it backwards compatible.


Regarding invariance as a solution, look at this comment and my answer, which is an essentially-invariant situation. Invariant lifetimes can always be hidden behind a shorter covariant one using trait objects.


Regarding the unsafe scoping, I think that’s an unfortunate effect of current unsafe block syntax. One can employ workarounds such as requiring an unsafe-to-construct token to be passed to allow_threads like

py.allow_threads(unsafe { marker_name_tbd() }, || {})

@davidhewitt
Copy link
Member

davidhewitt commented Dec 12, 2023

Thank you, I'll try to digest that thread in the near future!

The unsafe-to-construct token is probably the better workaround than my unsafe_allow_threads suggestion 😂

@adamreichold
Copy link
Member

adamreichold commented Dec 12, 2023

Ok, one most likely idiotic idea. Thread-locals in whatever form currently break the isolation we require by allow_threads. Could we try to side step this by always executing the closure passed to allow_threads on a separate runtime thread (sufficiently long-lived to amortize the startup cost)?

Since we (at least on stable) already require the closure to be Send, shouldn't this plug both soundness holes? Of course, most likely at prohibitive performance cost...

EDIT: Maybe we would need a completely need thread for soundness? Or we would need to track whether the GIL was ever acquired on these runtime threads?

@steffahn
Copy link
Author

steffahn commented Dec 12, 2023

EDIT: Maybe we would need a completely need thread for soundness? Or we would need to track whether the GIL was ever acquired on these runtime threads?

Probably enought to track whether GIL is currently acquired on such a runtime thread, not ever acquired. However, for this notion of “currently required” we would need to avoid unlocking via allow_threads. One could keep the required thread-pool for this fairly small, assuming a limited number of nestings of with_gil and allow_threads calls. Hopefully with a small constant, in most practical code? – well, or is this an entirely unreasonable assumptions and there are good use-cases that make use of allow_threads and with_gil nested in each other in deep recursions?

If with_gil required Send, too, one could just make two threads play ping-pong with callbacks between with_gil and allow_threads, so that we’d just need at most a single helper thread per user-created thread. Of course, making with_gil more expensive might be way worse than making allow_threads more expensive, as for the latter, at least, one could argue that it only ever is called when there’s a benefit to be expected from context switching, i.e. it shouldn’t happen for trivially short interactions, anyways, limiting the impact of further overhead.

@davidhewitt
Copy link
Member

I think it's actually quite a nice idea as a way to add a safe allow_threads mechanism which really does rely on Send. The performance cost I think we could measure and decide how much it is.

As moving work to a different thread seems like a semantic change, one option is to introduce this as a new safe API e.g. py.without_gil (better name welcome) and keep the old allow_threads around with some sort of unsafe hint to let users decide what's appropriate for their own code.

@adamreichold
Copy link
Member

As moving work to a different thread seems like a semantic change, one option is to introduce this as a new safe API e.g. py.without_gil (better name welcome) and keep the old allow_threads around with some sort of unsafe hint to let users decide what's appropriate for their own code.

I think we should turn this around, make Python::allow_threads unconditionally safe by using another thread (as we already require Send, this must be sound even it is a behavioural change) and provide a new unsafe API as an escape hatch where the overhead of the thread change is too high.

@adamreichold
Copy link
Member

well, or is this an entirely unreasonable assumptions and there are good use-cases that make use of allow_threads and with_gil nested in each other in deep recursions?

I think deep nesting is pathological but a certain degree of nesting can be expected due to callback-style API and things like that.

If with_gil required Send, too, one could just make two threads play ping-pong with callbacks between with_gil and allow_threads, so that we’d just need at most a single helper thread per user-created thread. Of course, making with_gil more expensive might be way worse than making allow_threads more expensive, as for the latter, at least, one could argue that it only ever is called when there’s a benefit to be expected from context switching, i.e. it shouldn’t happen for trivially short interactions, anyways, limiting the impact of further overhead.

I think I would want to avoid changing with_gil at all especially due to the last point, i.e. when allow_threads is called, some context switching appears to be expected in any case and making this slightly slower seems reasonable.

@adamreichold
Copy link
Member

One thing I just noticed is that Python::with_pool is not amenable to this solution because there we actually do want to stay on the thread. But I guess the fix is much more palatable to us: Get rid of the GIL pool and hence with_pool altogether.

@Diggsey
Copy link

Diggsey commented Dec 14, 2023

I'm the author of scoped-tls-hkt (but not scoped-tls or the other mentioned crates).

I'm fairly familiar with the Python GIL, but not that familiar with how PyO3 currently solves this problem.

If I were implementing from scratch, I would model it something like this:

  • When you lock the GIL you get a mutable reference whose lifetime only extends as long as the GIL is locked.
  • When you temporarily release the GIL you have to return that mutable reference for the time it is released via reborrowing.
  • Types which require the GIL to be held logically contain an immutable borrow of the GIL.
  • Types can be "downgraded" to non-GIL versions, and then later upgraded with a new GIL reference in order to retain references across GIL release points.

Comparing this to what I see PyO3 currently implements, it looks like the main difference is that the lifetime on Python<'_> is covariant, making the type itself Copy, and this is what leads to the unsoundness. Presumably this is to make it easier to use, since an invariant lifetime would mean that you could only use the Python<'_> object once without explicit reborrow methods.

However, if you change Python<'_> to be &mut Python then you get to take advantage of Rust's implicit reborrowing, and the object can be used multiple times.

@adamreichold
Copy link
Member

Comparing this to what I see PyO3 currently implements, it looks like the main difference is that the lifetime on Python<'_> is covariant, making the type itself Copy, and this is what leads to the unsoundness.

If I understood things correctly, making the lifetime invariant was already ruled out as a solution in the middle section of #3640 (comment), i.e. via the linked technique used to create an exploit for objc2 via scoped-tls-hkt.

@steffahn
Copy link
Author

  • When you temporarily release the GIL you have to return that mutable reference for the time it is released via reborrowing.

That sounds, I think, like it would force users to get rid of all GIL-dependent python object references created with that GIL reference before the temporary release.

I’m not actually a user of pyo3, so I can’t determine how much that would or wouldn’t go against common use cases.

@steffahn
Copy link
Author

Comparing this to what I see PyO3 currently implements, it looks like the main difference is that the lifetime on Python<'_> is covariant, making the type itself Copy, and this is what leads to the unsoundness.

This should be independent of variance concerns, and all about Copy / the ability to duplicate the Python<'_> token. After all, &'a mut T is covariant in the lifetime 'a as well.

@alex
Copy link
Contributor

alex commented Dec 14, 2023

A pretty typical use case for releasing the GIL is to process a &[u8] that was extracted from a &PyBytes. &PyBytes are immutable so it's sound, as long as someone has a reference. Not being able to operate on these types of objects without the GIL would be disruptive I think.

That said, you could work around this via https://docs.rs/pyo3/latest/pyo3/prelude/struct.Py.html#method.as_bytes for this specific case.

@Diggsey
Copy link

Diggsey commented Dec 14, 2023

If I understood things correctly, making the lifetime invariant was already ruled out as a solution in the middle section of #3640 (comment), i.e. via the linked technique used to create an exploit for objc2 via scoped-tls-hkt.

I don't agree - releasing the GIL would require a mutable borrow, so it's impossible to "store" the GIL somewhere for later and also call the release function.

That sounds, I think, like it would force users to get rid of all GIL-dependent python object references created with that GIL reference before the temporary release.

It would force them to be downgraded (to non-GIL objects) or no longer used, yes.

This should be independent of variance concerns, and all about Copy / the ability to duplicate the Python<'_> token. After all, &'a mut T is covariant in the lifetime 'a as well.

Yeah sorry you're correct - it's the T that's invariant in &'a mut T, not the 'a. The important thing is that &mut guarantees uniqueness regardless of anything else, which is what is wanted with the GIL.

A pretty typical use case for releasing the GIL is to process a &[u8] that was extracted from a &PyBytes. &PyBytes are immutable so it's sound, as long as someone has a reference. Not being able to operate on these types of objects without the GIL would be disruptive I think.

I mentioned "downgraded" types - I would expect that safe functionality (such as accessing immutable objects) would be available directly on the downgraded versions of types, so you wouldn't need the GIL in order to access them.

@adamreichold
Copy link
Member

adamreichold commented Dec 14, 2023

While I agree that letting references live throughout the scope of the closure passed to allow_threads during which the GIL is actually not held by the current thread is likely close to the root of this, I also think that forcing borrows into GIL-bound data to be released for these sections is too big an ergonomics hit as one usually releases the GIL to continue working with plain Rust data based on those GIL-bound references, e.g. processing a stream of bytes as mentioned by @alex or working on a NumPy array via an ndarray::ArrayView using rust-numpy which is what I am most familiar with.

In my opinion, forcing the whole ecosystem to implement downgraded wrapper types which continue to provide access while the GIL is temporarily released seems too big an ergonomics hit compared to having the type system work this out as it is now.

EDIT: Also c.f. #3610 (comment) for another situation where the existing design seems to work out well automatically. Of course, the solution by starting threads might still be prohibitively expensive over there...

@Diggsey
Copy link

Diggsey commented Dec 14, 2023

In my opinion, forcing the whole ecosystem to implement downgraded wrapper types which continue to provide access while the GIL is temporarily released seems too big an ergonomics hit compared to having the type system work this out as it is now.

Fair enough. Another option would be to always use the "downgraded" versions of the types everywhere, and require a GIL reference to be passed in to all methods that require it (including getters that access mutable data).

@RalfJung
Copy link

RalfJung commented Dec 14, 2023

the fact that the standard library’s thread-locals require T: 'static means there’s any guarantees that non-'static data isn’t allowed to be “smuggled” through thread-local storage.

Just chiming in here (without having read the entire thread, sorry) to note that as far as I am concerned, the fact that static are restricted to Sync + 'static does not mean that only Sync + 'static data can be smuggled from a caller to its callee via external means. Any way of dynamically checking that the data is only used in the same thread and in a suitable scope is equally appropriate. There's no fundamental relationship between "contains a non-'static lifetime" and "cannot be smuggled around"; that relationship is entirely coincidental. Non-'static types are harder to smuggle around, but (unless and until the lang team decides otherwise) you can't assume that they are impossible to smuggle around.

Looking at some of the prior discussion here around an Ungil auto trait, I was wondering what the soundness contract of Ungil is that would make this work. There's no Rust type that represents "absence of smuggling". The fact that the type name is negated already is a big red flag indicating it probably can't work; the counterexample in this issue is the nail in the coffin IMO. In general it's hard, if not impossible, to express negative information in type systems. If you want the user to prove something, you need to ask them to produce a value of a certain type that serves as a witness to show that this is indeed proven. "Data you can't smuggle" is a concept that I have no idea how to possibly formalize. It might have to be some notion of "this is a pure function"? Neither 'static nor Send achieve that, that is not their intention.

@davidhewitt
Copy link
Member

Fair enough. Another option would be to always use the "downgraded" versions of the types everywhere, and require a GIL reference to be passed in to all methods that require it (including getters that access mutable data).

My feeling is that this might be a viable long-term solution, but there are very compelling performance and ergonomics reasons to not require the "GIL reference to passed in to all methods that require it". The main reason is for interactions with traits: for example, what about Clone, Drop, Iterator, or even Display, for the downgraded references? At the moment they would incur at minimum a TLS lookup per trait method call to acquire the GIL reference. By having the "attached" types we avoid these overheads. Contexts are a long way off and the design for them is not yet clear, but if there's a possibility to conditionally implement traits based on contexts or even specialize based on context, that might be the way to remove attached types but keep the same ergonomics and performance.

[@Diggsey] If I were implementing from scratch, I would model it something like this:

This is a much better expressed form of what I originally wrote at #3640 (comment)

In my opinion, forcing the whole ecosystem to implement downgraded wrapper types which continue to provide access while the GIL is temporarily released seems too big an ergonomics hit compared to having the type system work this out as it is now.

I definitely agree that it's an ergonomics hit, but maybe it's possible to get there with a macro. Maybe for example we can one day have a macro that takes a list of idents to temporarily downgrade, and reupgrade once the allow_threads call is over.

    let list = PyList::new(...);
    let data = "1, 2, 3";
    allow_threads!(py, [list], { /* do something with `data`, `list` is only accessible as a downgraded reference */ });

I think this line of exploration of is likely impossible until we've moved &'py PyAny to Py2<'py, PyAny> because we first desperately need to decouple reference lifetimes from the GIL lifetime.

@adamreichold
Copy link
Member

I definitely agree that it's an ergonomics hit, but maybe it's possible to get there with a macro. Maybe for example we can one day have a macro that takes a list of idents to temporarily downgrade, and reupgrade once the allow_threads call is over.

I am less concerned with the syntactical side of things, but rather that every creator of such a downgraded type would have to worry what is still allowed and what is not, i.e. figuring out a sound API for these types all over the ecosystem.

I think being able to rely on the existing Send trait is a bit of a strength in disguise here, because it allows us to re-use the isolation properties (or lack of) that existing types already have without doing anything PyO3-specific (ndarray::ArrayView working out of the box being an example of that from rust-numpy). The "only problem" we have is that while we try to establish additional boundaries, the lack of actual thread boundaries bites us via these soundness loopholes.

@mejrs
Copy link
Member

mejrs commented Dec 16, 2023

I think we shouldn't do anything:

  • both the send_wrapper and scoped_tls crates are fairly obscure and not commonly used
  • you have to be trying pretty hard to misuse them and pyo3 in this way
  • it seems to reliably segfault (as far as I've seen), so there is little risk of someone doing this and having it appear to work.

A solution like running it on a task pool has a significant runtime and code complexity cost. Making allow_threads unsafe will just result in people mindlessly slapping an unsafe block around it.

My personal preference would be to, in debug mode only, insert checks that assert that the gil really is held.

@adamreichold
Copy link
Member

I think we shouldn't do anything:

I agree that this could be a reasonable approach. But I think that we should discuss such a choice with the community at large as soundness is quite important to many and I would rather be proactive than reactive in handling possible conflicts.

I also think that if we do this, we should also update our documentation to fully explain these loopholes and why we do not try to close them. We should also drop the Ungil trait because it has no chance of closing them in any case.

My personal preference would be to, in debug mode only, insert checks that assert that the gil really is held.

Could you expand on where you would add these checks? If I understand things correctly, we would need to add them to all functions which require the GIL to be held, e.g. all methods on &'py T and Py2<'py, T>?

@steffahn
Copy link
Author

It's not all that obscure in my opinion: scoped-tls is even a dependency of rustc.

@GoldsteinE
Copy link
Contributor

We should also drop the Ungil trait because it has no chance of closing them in any case.

Doesn’t Ungil trait at least remove false positives (i.e. failing to compile Rc<_> in allow_threads()) and also solve the send_wrapper issue? Sure, it doesn’t solve all the problems, but it’s still has less problems than Send.

@adamreichold
Copy link
Member

Doesn’t Ungil trait at least remove false positives (i.e. failing to compile Rc<_> in allow_threads()) and also solve the send_wrapper issue? Sure, it doesn’t solve all the problems, but it’s still has less problems than Send.

Yes and no. Ungil on nightly has fewer problems than Send. But even if we decided to leave the loopholes open, I would prefer to fall back to Send to hold the door open for a full resolution as I think the underlying issue of controlling thread-local storage does not affect only PyO3 and maybe there will be an ecosystem-wide solution that we could also apply then.

@mejrs
Copy link
Member

mejrs commented Dec 16, 2023

It's not all that obscure in my opinion: scoped-tls is even a dependency of rustc.

But it's fairly obscure at the user-level, right? Most of its dependents seem to be some key ecosystem crates (like tokio for example) and things like testing frameworks.

@mejrs
Copy link
Member

mejrs commented Dec 16, 2023

Could you expand on where you would add these checks? If I understand things correctly, we would need to add them to all functions which require the GIL to be held, e.g. all methods on &'py T and Py2<'py, T>?

Something like that, yes.

@davidhewitt
Copy link
Member

But I think that we should discuss such a choice with the community at large as soundness is quite important to many and I would rather be proactive than reactive in handling possible conflicts.

I think that if we choose to do nothing we definitely need a big community consultation. I also think that the community (rightly) expects that safe == sound, and to ask for forgiveness on an exception will require lots of community goodwill. I also think that as we're now aware of three ways that allow_threads API is problematic, and can produce both segfaults and data races from it, the situation is quite different to when we had a small edge case with send_wrapper which we believed Ungil solved cleanly.

If the community perception is that PyO3 has soundness gaps, my fear is that we become an obstacle rather than a tool for adoption of Rust as a memory safe companion to Python. We may weaken the argument for many as to why they should consider moving their Python extension from C/C++ to Rust, thereby failing to prevent the creation of other vulnerabilities.

Personally, I would much prefer we act to keep PyO3 sound and find a way to modify these APIs, even if the end result is clunky. I'd hope that long term research would find new APIs or language features to improve ergonomics.

My personal preference would be to, in debug mode only, insert checks that assert that the gil really is held.

If these checks don't make debug mode unusably slow then I am always in favour of having extra checks irrespective of the rest of the discussion.

@Diggsey
Copy link

Diggsey commented Dec 17, 2023

If ergonomics are the most important, then perhaps the best option is to get rid of the GIL reference entirely from user code and have it live in a scoped-tls thread local. Have all uses of python types access the GIL in this way. Accessing a TLS slot is not nearly as expensive as has been implied by some of the comments, and it should be possible for the compiler to optimize it away in many cases, since there should be good locality between the locking code and the code accessing those python objects.

(And if the compiler's not smart enough yet, maybe that's something that could be worked on - eg. a MIR level optimization that converts scoped TLS accesses into extra function arguments)

@davidhewitt
Copy link
Member

That's an interesting idea. I see two snags of an "auto gil" API which we'd have to explore:

  • Overheads of the TLS access. As you say, these might be possible to optimise away in many cases. Nevertheless, if it means that PyO3 is fundamentally much slower than C code I think it again hinders transition of the Python ecosystem to Rust. It really depends on how much this overhead actually is.
  • Error cases when the GIL is not locked. Could potentially automatically lock to avoid needing to return an error or panic. I think we'd have to learn what felt best.

Nevertheless I'm certainly open to understanding how it works out in practice. It would probably be worth at least understanding how it compares to our proposed API in #3382 before we push that migration on our users.

@adamreichold
Copy link
Member

Could potentially automatically lock to avoid needing to return an error or panic. I think we'd have to learn what felt best.

I see a danger of deadlocks here which the GIL is already prone to too even without hidden attempts to acquire it.

@dgrunwald
Copy link
Contributor

For historical reference, here are my original thoughts on this issue (from 2015!): dgrunwald/rust-cpython#15 (comment)
Back then I knowingly accepted this unsoundness because it was merely theoretical (scoped-tls didn't exist yet) and I didn't see any other options with acceptable ergonomics. A lot has happened since 2015 (PyO3 made some major design changes after forking from rust-cpython), but I think some points in there (e.g. why non-copyable Python<'_> is problematic and could still be unsound wrt. scoped-tls) might still be applicable.

@davidhewitt
Copy link
Member

Thanks @dgrunwald - it's really interesting reading through that old discussion. It's interesting to see you made some very astute observations there such as the HKT TLS, and even your analysis on API choices feel very familiar to me!

Interestingly latest PyO3 now implements the design which you named "PyPtr", and just like you I am not super pleased about the lifetime ergonomics but it was also a necessary step to overcome peformance and safety costs of the "nicer" abstraction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants