-
Notifications
You must be signed in to change notification settings - Fork 117
Unbounded callback lifetimes cause rlua to be unsound #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I thought 0f5a9a3 would just barely fix it, but it doesn't :( I forgot about I have tried for a few hours to come up with any other way to express callbacks that allows me to use the correct internal callback type and I can't come up with one. Either I'm defeated by lack of ATCs or not being able to do blanket trait impls on another trait. I'll keep trying, because this is now, afaict, an unfixable soundness issue (without drastic API changes and macros). Edit: even if such a function didn't exist in the stdlib, it would be pretty ridiculous to say that it's unsound to make a |
Okay, I've attacked this problem from a whole bunch of angles, time to brainstorm. The first answer I came up with was to simply store a special I could just simply require additional macro calls to create rust callback functions. This solution really sucks usability wise, but I know it works. I could require that callbacks take I could remove the 'lua lifetime from I could actually find some way of expressing the trait bounds that works without lying, and without the use of macros. I can't come up with any way to do it after a few hours of looking at it (and this isn't even the first time I've looked), but that doesn't mean it doesn't exist. I can't come up with any other solutions, but I'm convinced that I need to solve this pretty soon. There are other difficulties that I haven't mentioned also, like with the I think I'm leaning toward removing the lifetime from |
Arnavion on IRC helped me think through the HRTB problem, and they got really far, but it might just be impossible currently: It looks really promising, but uncomment line 69 if you want to see fireworks. |
Possibly a sucky solution, but one I offer since I've gotten down similar rabbit holes like you have in regards to wlroots-rs (though haven't programmed in Rust in a while, so feel free to disregard) and it can be really draining to struggle to understand just what exactly will solve the problem (if anything): Just make it If you eventually find a solution, great then replace it with that. If you can't then at least it's documented how to use the function and if there's a bug you know you should look for mis-uses of that first and then look for bugs in the rest of your unsafe code. It's super sucky, but after dealing with so many headaches with wlroots-rs I'm feeling much more willing to punt some of the trickier problems on to the user of the library especially pre-1.0. That isn't acceptable for a 1.0 library, but this isn't a 1.0 library. I think this is something Rust libraries need to encourage more, especially in the APIs that sacrifice ergonomics for safety (*cough* wlroots-rs *cough*) having an escape hatch that is clearly marked |
Nah, the situation isn't THAT bad yet, like I said I could offer a completely safe interface by just having functions take MultiValue and do the arg / return conversion manually with Lua is very very unsafe and hard to deal with in C, the whole reason (for me) that this library is useful is because it's safe, I would rather give a slightly unergonomic API than an unsafe one. Unsafety is about as unergonomic as it gets, really. After sleeping on this problem, I don't think it's actually going to be a big deal to solve. Edit: After thinking about it more, I don't necessarily disagree with you in principle, I just think that in this specific case, an API that just used callbacks with MultiValue is not that unergonomic, so reducing a bit of annoying boilerplate in that case is not worth unsafety. Your experiences are probably much different, you probably mean "unergonomic" as in: totally restricts large amounts of what you might want to actually do. In a case like that, I would probably agree with you, and also agree that it's not a great place to be but it would be fine for pre-1.0. Luckily the worst case scenario here is not THAT bad. |
Okay, so removing the lifetime from However, not all the examples compile, and I'm not sure if I can fix it! 🙃 It's kind of terrible in other ways, too. For example, there's no good safe way of converting MultiValue other than "adopting" all of the values inside it, which will be horribly slow (unless some real optimizer magic happens). That's probably fixable, but the lifetimes in callbacks are a still bit of a mess. This is getting annoying! Time to bring out the big guns. I have another solution to this problem that I've had in my back pocket for a while, but I've been avoiding it until I tested it in luster. If it works, it will fix this problem and make a few things which are currently only logical errors into type errors, but it's a big change. The idea is that you have pretty much the API that exists now, but most of the API is wrapped in a It's probably not that useful to look at luster to see this API in action because not enough of the interpreter really exists yet, so here is a rough sketch of what it looks like. I need a better names than |
Right now my current plan is something like this. Basically everywhere there is currently The logic behind splitting things into Context and Scope is 1) that Scope requires multiple lifetimes and it makes a noisy, difficult default, and 2) as soon as you enter a into a static callback you have no sensible scope, so there would have to be some default 'static scope of some kind and thus nothing general could rely on it anyway. It might be sensible to have just scoped methods receive the same scope as their first parameter, rather than a plain context, if that's possible. I know this is a lot to follow and I keep updating it with more craziness. I don't know who all is that interested in all the messy API design here, but talking it through in this issue helps me think about it so I'm going to keep doing it. |
I'm interested, at least so far as it affects Way Cooler 😊. Already I do weird stuff with the Lua scope, so these changes will probably affect me. These changes sound for the better though, and similar to how I dealt with unsafe in wlroots-rs (using handles with scoped callbacks instead of the object directly, forcing the return types to fit in that scope if they are non copy). Got a branch where you're working on this? |
I'm glad you're interested! I'll happily keep you up to date then if only so I have somebody checking my thought process. I'll push what I'm working on to a branch later today, I think it might actually require building the API from scratch in parallel because there are a lot of parts and they all require changing at once, so it's taken me a while to get started. |
Wow. I just read that issue and the letters are dancing all around because of all the technicalities and I somewhat understand what you wanna do. I see that you want to make all Lua registration/execution right in a scope (like if it was the only method), but assuming it gets all removed when the scope ends, would that prevent removed unnecessary code and explanation, please see the edit history If I understood your solution to the callbacks properly, I think this would solve the problem for complex code bases relying on a long living Lua context to be available when needed (including a script for a game object). If I didn't then I'll discover it when a beta solution may be compiled! I'm still impressed by how important these little details matter to you! |
It would prevent a Context being stored in a type that was passed into Lua, or moved outside of the outer call to Maybe it'll be easier to understand when I push to the The whole thing with channels there would not be necessary and wouldn't give you any extra power, because you wouldn't be able to pass 'lua types through the channel anyway.
I mean a soundness hole is a soundness hole, imo, so it's not really little. |
As a side effect, would fix #97, which is the actual driving motivation behind this change. Adds compiletest_rs tests for context lifetime invariance, and all the tests and compiletest_rs tests pass, but the documentation and doctests are *all* screwed up. There are some outstanding API design questions, but the strategy SEEMS to actually work.
So there's now the contextify branch with all of my proposed changes. The changes seem basically to work, there honestly weren't too many surprises. I believe the only things you can't do now that you could do before are:
The first point there honestly should affect 0% of use cases, you should just NOT be doing that at all and instead use The only real thing that I'm even marginally concerned with is the second point about mixing references across a The existing The logic behind that is that In the version of the API currently in This is probably not too restrictive since you can usually move whatever lua stuff you need to into a scope call and then the data that the scoped types reference outside of the scope call. Since scoped types can't reference anything that contains a context lifetime anyway, this should always be possible to do, however it's still slightly annoying. As an alternative to the current API, I could make the following changes:
This would basically end up with something similar to how the existing Right now I'm kind of leaning toward making that change since I also think that the |
Another important point. Since actually writing the change, it became clear to me that this changeset definitely doesn't fix the fact that callback lifetimes are a lie, it just makes it so that it's sound (I hope). The generative invariant 'lua does simplify some of the logic behind why things are safe, but it would still be genuinely better if the inner callback type was: type Callback<'a> = Box<for<'cb> Fn(&'cb Lua, MultiValue<'cb>) -> Result<MultiValue<'cb>> + Send + 'a> Luckily, this is actually a completely independent axis, so we could still make that change no problem, either by using macros or waiting for ATCs. I think this change would have been positive regardless of the soundness hole, it also just so happens to make it so that we don't require macros or ATCs for soundness. Anyway, right now what I'm looking for is feedback about the API, do you think the change I proposed above re: Scope and Context is a good idea? Do you think this branch is workable at all? Do you really hate the context callbacks? |
So I went ahead and made those |
I haven't had a chance to look at the changes in detail yet, but it looks like you need to have a scope to create a Lua function now to get around the safety issue? Currently Way Cooler relies on function creation to build it's object capabilities, would I now need a context to do that? If so I will probably just never drop it since it should always be valid (just like I currently do with Lua in a lazy static). |
So you don’t need to keep the context alive in order to keep values alive, all you need to do is call context to access Lua. At the end of the Lua::context call, all the Lua state is as you left it, Lua::context is just a callback in order to convince rust to create a unique lifetime. You can call Lua::context as much as you want, you just can’t mix reference types across the separate calls, or let reference types escape the calls (but you can do whatever you want with Or it’s possible I’m not following your question? |
Ah ok no I think you answered it. I think I'll know more once I can apply the changes to my code. I think I'm a bit behind a few releases anyways, so I'll probably need to fix some other things. I can probably figure it out on my own but I might ping you if I have more questions. |
I'll also make a PR so we can talk about it outside of this issue. Thank you for trying to apply this branch to way-cooler, having somebody test this on large existing uses of rlua is super helpful. If you have any questions or complaints, even if it's just minor paper cuts, let me know. |
Great, this answers my worries! I thought the functions wouldn't outlast the scope they were made in. Later this weekend, I will make a kinda real-world scenario application with the new branch so I can give you feedback on how it felt using it (from a compiler's appreciation of lifetimes and a convenience perspective, not performance). Source code will be available online if you want to give it a look. By the way, semi-non related, would you put the new |
As a side effect, would fix #97, which is the actual driving motivation behind this change. Adds compiletest_rs tests for context lifetime invariance, and all the tests and compiletest_rs tests pass, but the documentation and doctests are *all* screwed up. There are some outstanding API design questions, but the strategy SEEMS to actually work.
I'm pretty sure it's not the right place to ask (I don't want to make a separate issue) but I am not sure how to deal with lifetimes on functions within a context. I don't think it's related with In this source file, I'm trying to get a reference to a struct from inside a context function created in a method of that same struct. However, the Are you supposed to give Thanks! |
This is not really the right place to discuss this. The You might have better luck with |
Interesting, I'd give it a try later. Thanks for the hint. That being said you're free to remove the comments to clear up the issue. However, I don't see this as a poorly documented PR. It is pretty easy to use and understand how you're supposed to use it without looking at some doc. I'm just not used to the way the crate works, even on the For now everything works as expected, I'll notify you on the PR if I believe something is missing/feels odd. |
Uh oh!
There was an error while loading. Please reload this page.
I'm going to write a very long explanation of this bug, because the root issue here is something that has come up again and again, and I need to document exactly what's going on.
tl;dr I might have an easy fix, but the internal types for callbacks inside rlua are problematic and they probably should change before 1.0. (Edit: I don't have an easy fix D:)
So, the internal type of callbacks in rlua is (basically) this:
There are a few different ways to create rust callbacks in rlua, but all of the APIs are basically variations on
Lua::create_function
:These functions take an F and produce the internal
Callback
type, which is then later used to create a Lua callback withlua_pushcclosure
, but that part isn't terribly important. What is important here is the'cb
lifetime: all of the APIs that look like this have the'cb
lifetime as an unbounded user chosen lifetime, and this is deeply, deeply wrong. This has always been wrong, and this has been wrong since the very first versions of this crate.Here's what the type of the internal callback should be:
which is to say that the internal callback should be a function that given any
'lua
lifetime, it can take aLua
ref and arguments with the same'lua
lifetime and produce results with that same'lua
lifetime. This is the actual, logical requirement that we want to express on the callback, and though we can make this type just fine, problems start when we want to make our callback API. Let's see how we might change ourcreate_function
type to allow for the correct callback type:Except, this will never work, because we have three separate
for<'cb>
HRTBs and there is no way to tell the rust compiler that we need to universally quantify all three trait bounds over a single lifetime.What's especially frustrating is that it's actually totally possible to write the code that produces the correct callback type, but it's not currently possible to put that code into a function and name its type signature; observe.
You can see me struggling with this problem in a very old reddit post here. To proceed with the API and get around this otherwise very tricky or impossible to solve limitation, I picked an API that conveniently "lied" about the callback lifetimes, and I believed this was safe because all callbacks at the time were
'static
anyway. Well, honestly I had a pretty poor understanding of the borrow checker at the time and I was very unsure of everything, but later on I became slightly more confident that my lie based interface was safe due to the'static
callback requirement. I now know this is wrong, and should have realized this sooner.I lied a bit above, the real callback type inside rlua is actually not what I said above, there is actually a second lifetime parameter to make the callback itself non-'static for use in
Lua::scope
, so the real callback type is actually this:This is to facilitate the
Scope
callbacks having a non-'static lifetime, which if you understand how the argument lifetimes are lying there, you know this is obviously a problem. At the time I initially made theScope
interface I didn't really fully understand the implications of the misleading callback types that I was using. I later found out via #82 that the initial interface that I had written was of course unsound, and it was only after fixing that issue that I more fully understood the problem.From a certain perspective, the misleading callback types are the root cause of #82, because the callback type requires this user chosen, unbounded
'cb
lifetime and that makes writing a sound interface extremely difficult and easy to get wrong. I believe the solution for #82 is correct, but it is extremely delicate and intricate and confusing, and this is the LAST property that you want when writing rust code withunsafe
.So the reason that I'm writing about this problem now is not actually due to #82. Instead it is because I had a minor panic attack today over the following thought: "what if you convinced the borrow checker to pick
'static
as the'cb
lifetime in rlua callbacks?".Well, it turns out you can totally do this, and it's obvious that you can do this and I should have realized this sooner. Observe this monstrosity:
So the proper fix is honestly the same as the proper fix for #82, which is to fix the callback types, and thus also fix the callback API. However, without ATCs, it appears that the only way that I can do so is either macros (to write the voldemort function who's type I can't name), or possibly some other trait trick I'm not aware of, or possibly redesign
ToLua
/FromLua
to not require ATCs, I'm not sure.In the meantime, there's a hopefully simple fix for this issue, but I need to test it out first, and I'll be back in a moment.
The text was updated successfully, but these errors were encountered: