-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sqlite bind review #7
Sqlite bind review #7
Conversation
Ten0
commented
Nov 22, 2021
- Query is boxed before collecting binds, ensuring any reference built in binds collection stays valid
- We always rely on the destructor of BoundStatement to re-bind Null, which guarantees it also gets re-bound in case of panic
- Remove ManuallyDrop as the contained values actually always need to be dropped when going out of scope.
This commit removes a old hack from our bind serialization layer that requires multiple value copies for the sqlite backend going first to a binary format and later on deserialize the data again to call the corresponding bind function. After this commit it is possible to just store the corresponding values and call the corresponding bind functions directly onto those values. This removes a therotical overhead and should improve the performance on bind heavy workloads (like inserts). To do this we basically need to teach the compiler about a invariant that is already true: Bind values need to live at least as long as the corresponding query. To describe that in the type system we need to introduce a few lifetime bounds for `QueryFragment::walk_ast` and `ToSql::to_sql`. In exchange we can remove the write bound from `ToSql::to_sql`. In my opion that does not make the it harder to implement the corresponding traits as users don't need to interact with those lifetimes at all and they describe something that is already true for almost all cases. Additionally now that rust-analyzer can generate function bodies for trait implementations on the fly users don't need to type those bounds, as they are generated with the body stub. Most of the changes are releated to just introduce the lifetime bounds in all nessesary impls and fix minor incompatibilities there.
This improves the sqlite bind collector to skip even more data copies, by setting the sqlite bind mode to `STATIC`. This prevents that sqlite creates a internal copy of the bind data, which means we must ensure that the data live at least as long as the buffer is bound to the statement. This requires some unsafe hackery to ensure that all data remain avaible. We ensure this by binding all information to `StatementUse` itself, which lives as long as the user can access the executed query. If `StatementUse` is dropped we cleanup the corresponding buffers.
This commit improves the unsafe code usage in the implementation of StatementUse by introducing the correct lifetimes to bind the StatementUse instance to the query itself. This prevents dropping the query (and any bound value) before dropping the StatementUse containing the corresponding binds.
This commit renames some lifetimes related to the iterator setup to use more concrete names than `'a` or `'b`. Additionally the `elided_lifetimes_in_paths` lint is enabled to prevent hidden lifetimes in types. This makes it much easier to reason about unsafe code
This applies the suggested made by Yandros in the rust users forum here: https://users.rust-lang.org/t/code-review-for-unsafe-code-in-diesel/66798/7 Especially: * Sperates the bind wrapper from the name wrapper * Adds some more comments about what safety invariants some functions have * Use `ManuallyDrop` for shared buffers * Fix unbinding bind values if we return early from the `BoundStatement::bind` function, as it would occour if the `Statment::bind` function fails * Remove the `OnlyCell` usage for column names, as it is not required * Add an additional safeguard that clears cached column names after the first call to `.step()`
…iesel::debug_query`
… build with the same dependencies
- Query is boxed before collecting binds, ensuring any reference built in binds collection stays valid - We always rely on the destructor of BoundStatement to re-bind Null, which guarantees it also gets re-bound in case of panic
| SqliteBindValue::BigInt(_) | ||
| SqliteBindValue::Float(_) | ||
| SqliteBindValue::Double(_) | ||
| SqliteBindValue::Null => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I favor exhaustive writing in this scenario as this guarantees we haven't forgotten anything.
60cde71
to
3e9e97d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like most of the ideas implemented in this PR, unfortunately the semantics about unique ptrs (String/Vec/Box), shared buffers and ManuallyDrop
.
Additionally I'm a bit skeptical about the lifetime transmute there. I've opted to implement most of the other changes as part of diesel-rs@d29eac1
diesel/src/sqlite/connection/stmt.rs
Outdated
// we need to store any owned bind values speratly, as they are not | ||
// contained in the query itself. We use ManuallyDrop to | ||
// communicate that this is a shared buffer | ||
binds_to_free: ManuallyDrop<Vec<(i32, Option<SqliteBindValue<'static>>)>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand the current state of rusts unsafe code guide lines removing the ManuallyDrop
here may cause be undefined behaviour. The problem here is that internally this vector stores either a String
or a Vec<u8>
which both internally use a Unique
pointer. As the name suggests this pointer type assumes unique access to the allocated memory. As soon as we bind the buffer to the sqlite statement we violate this contract. According to this is one possibility to communicate to the compiler that it cannot assume this property. (The same argument applies to the query
field above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine as long as the corresponding pointer is never used by anything while the other references exist : the unique property is used to enable noalias llvm optimizations, which won't enter into play as long as the other reference is never deferenced.
There are similar issues describing this and what would be actually valid within Kimundi/owning-ref-rs#49.
It can be seen that the issue arises from the code accessing the same memory location through pointers that have noalias.
Note that in this PR, the box is never dereferenced after binds are collected and until they are dropped - which is the same that would happen when writing normal code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this in mind when writing it but you're right, it's not so simple as my complete lack of commenting on this topic led to believe 😁
It's definitely something I should have commented on as it's definitely super not obvious 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand the owning-ref issue and the linked unsafe code guideline issue the topic is:
a) Not decided yet
b) Box<T>
contains a Unique<T>
pointer which gets translated as noalias
so yes technically as long as we don't access the pointer till we resolved the aliasing we should be fine. But that seems a property that is really hard to verify. Especially around drop. For example the compiler already notices that this field is "unused" so it may want to decide to just not store the corresponding struct field till BoundStatement
is used. ManuallyDrop
resolves the Drop
issue, as we then can exactly tell the compiler to perform the drop at a specific location. It does not really resolve any other potential use through the Unique
ptr there. It seems like the only way that is currently accepted as definitively not causing UB is to use Box::into_raw
here to move out the pointer from the inner Unique<T>
type into a normal pointer. This could be stored till the any aliasing is over (so till the Drop
impl) and then reconstruct the Box via Box::from_raw
. (Same reasoning for String
/Vec
as they are internally similar, but there we need to store the length as well).
Especially see that linked playground (notice the difference between release and debug build 😱 )
Note: This discussion likely applies to the mysql bind implementation as well. 😞
(I should note that the more I read about aliasing, the more I come horrified how to ever get that right in the context of non-trivial unsafe code like that one 😱 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(playground is link broken)
the more I read about aliasing, the more I come horrified how to ever get that right in the context of non-trivial unsafe code like that one 😱
Yeah I didn't realize how hard unsafe
actually was until I had a look at the open issues in owning-ref either 😅
that seems a property that is really hard to verify. Especially around drop. For example the compiler already notices that this field is "unused" so it may want to decide to just not store the corresponding struct field till
BoundStatement
is used.
AFAIK Drop
's behavior is always specified - rust wouldn't call drop
on the BoundStatement
without having properly set all its fields (nor would it ever do that with any other function, regardless of how it optimizes its stack and registries before it has to make that call).
I'm not sure I understand what scenario you have in mind here.
Considering we don't need to use the query and only keep it here to prevent it from being destructed, I believe it's easier to guarantee that a field is never used than it is to guarantee that we don't forget to drop stuff on error/ok/panic. (e.g. in the previous implementation, if anything happens to panic within for (bind_idx, (bind, tpe)) in (1..).zip(binds.into_iter().zip(metadata))
, query
and binds_to_free
are memory-leaked)
(Then again I should definitely have added comments specifying that this field should never be used because of noalias requirements.)
To the extreme, if we want to compile-enforce that the field is never used, we can
mod never_use {
pub struct NeverUse<T> {
_private: T,
}
impl<T> NeverUse<T> {
fn new(inner: T) -> Self {
Self { _private: inner }
}
}
}
pub use never_use::NeverUse;
I'd feel much less confident guaranteeing correctness of rust code that manages its allocations itself with raw pointers manipulations than guaranteeing that the Box is never dereferenced until dropped, and that it is never dropped before the relevant binds are re-bound, using RAII.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a) Not decided yet
Not sure I understand what you mean. It looks like regardless of whether they decide that this code should be UB or not (seems reasonable to me that it is btw), as long as we don't use the "unique" at the same time as a pointer derived from it, we're fine?
Especially considering we're not mutating these values, either side: https://doc.rust-lang.org/nomicon/aliasing.html:
we don't actually care if aliasing occurs if there aren't any actual writes to memory happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd feel much less confident guaranteeing correctness of rust code that manages its allocations itself with raw pointers manipulations than guaranteeing that the Box is never dereferenced until dropped, and that it is never dropped before the relevant binds are re-bound, using RAII.
I feel exactly the other way around. Handling memory allocations is something I'm quite confident about, while I fear that the compiler may decide that this are not used and therefore removes something. Maybe I'm just trying to be too perfect here...
Otherwise I've played a bit with an modified implementation that stores raw pointers. I've got a working version that:
- Uses drop as single "cleanup" point
- Only contains a minimal trivial amount of unsafe ( no transmute, no pointer case, just
Box::from_raw
andNonNull::new_unchecked
)
I personally find that version quite readable and surprisingly simple compared to any other version yet. I will add some more comments tomorrow and push that as new commit so we can discuss that version.
Generally I would prefer to error on the "safe" side here, so if something is not that obvious from the unsafe code guidelines better use something simpler...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As promised I pushed 28409cc which uses raw pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(The PR is updated as well)
let bind_collector = unsafe { | ||
std::mem::transmute::<SqliteBindCollector<'_>, SqliteBindCollector<'static>>( | ||
bind_collector, | ||
) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is technically safe in the scope of the current function I really try to avoid transmute
as much as possible, even if it makes the code a bit harder to write. My issue with transmute
is that any code using transmute
may be valid now, but it's incredible hard to spot if that remains true after a refactoring, especially for persons who've not written the original code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made sure this single line enforces that we're only transmuting the lifetime, and we're calling a function that makes use of these references as static references right after anyway, which may become just as incorrect.
I think it's reasonable to assume that changes to this file are hard enough to make that they would be made by people who have read the whole file and would be thoroughly reviewed.
That is just to put the query in Self asap, so it always gets dropped after the binds are erased, whatever happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, doing that here is fine. I consider this to be a minor issue for now, the main point is the discussion above about aliasing.
Sorry for the late answers - I somehow missed the notifications 🙁 |
@Ten0 No worries. I left a comment about the aliasing thing above. After reading the corresponding unsafe code guideline discussions I believe we should just store raw pointers there to be sure. That's not really nice, but at least that's the variant everyone agrees about being correct. |
f89e9a5
to
baaafd1
Compare
36660de
to
47eac47
Compare