Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose sqlite's sqlite3_create_collation() #2495

Closed
wants to merge 7 commits into from

Conversation

z33ky
Copy link
Contributor

@z33ky z33ky commented Sep 4, 2020

This PR adds diesel::sqlite::SqliteConnection::register_collation(), which wraps sqlite3_create_collation_v2() (ref https://sqlite.org/c3ref/create_collation.html).
The registered collations can be used via the COLLATE-constraint on table columns, as is demonstrated in the tests.

let f = match user_ptr.as_mut() {
Some(f) => f,
None => {
//FIXME
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure what to do here if sqlite passes us null pointers. Can it just be assumed that this will not happen (i.e. unwrap() here)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me unreachable! in case of None could be appropriate.

unreachable!(
"We've written the aggregator above to that location, it must be there"
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't be, since "A panic! across an FFI boundary is undefined behavior." (https://doc.rust-lang.org/nomicon/ffi.html#ffi-and-panics).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://www.sqlite.org/c3ref/create_collation.html we can be sure that we get our pouter to the callback here. Following that reasoning I would use unreachable! or something similar + a comment explaining why this cannot happen to indicate that this is guaranteed by other things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'd either go to "full undefined behavior" territory and use std::hint::unreachable_unchecked(), or stay on the safe route and call std::process::abort() (with an eprintln!() in front).
IMO unreachable!() gives the impression that we can just panic and it's fine. Which it may eventually be (rust-lang/rust#74990), but at the moment it is not.

Copy link
Member

@weiznich weiznich Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using comment +eprintln! + abort() sounds like the better solution for me. Can you have a quick lock at the other callbacks for normal and aggregate functions and change that there as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do!

@z33ky
Copy link
Contributor Author

z33ky commented Sep 4, 2020

I should mention, my use-case is allowing for proper case-insensitive matching, since the NOCASE-collation available with sqlite only works with ASCII (ref §7 https://www.sqlite.org/datatype3.html).
Perhaps this use-case can be mentioned in the method-documentation?

@weiznich weiznich requested a review from a team September 5, 2020 02:22
Copy link
Member

@weiznich weiznich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I'm in favor of this change. I've left some comments at places where I think things can be improved. Additionally such a change requires an entry to our changelog.

diesel/src/sqlite/connection/functions.rs Outdated Show resolved Hide resolved
diesel/src/sqlite/connection/mod.rs Outdated Show resolved Hide resolved
diesel/src/sqlite/connection/mod.rs Outdated Show resolved Hide resolved
diesel/src/sqlite/connection/mod.rs Outdated Show resolved Hide resolved
diesel/src/sqlite/connection/mod.rs Outdated Show resolved Hide resolved
diesel/src/sqlite/connection/raw.rs Outdated Show resolved Hide resolved
let f = match user_ptr.as_mut() {
Some(f) => f,
None => {
//FIXME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://www.sqlite.org/c3ref/create_collation.html we can be sure that we get our pouter to the callback here. Following that reasoning I would use unreachable! or something similar + a comment explaining why this cannot happen to indicate that this is guaranteed by other things.

diesel/src/sqlite/connection/raw.rs Outdated Show resolved Hide resolved
@z33ky
Copy link
Contributor Author

z33ky commented Sep 8, 2020

sqlite3_create_collation() can also remove collation function by passing NULL as callback. Shall I add a SqliteConnection::unregister_collation() for this?

@weiznich
Copy link
Member

weiznich commented Sep 9, 2020

sqlite3_create_collation() can also remove collation function by passing NULL as callback. Shall I add a SqliteConnection::unregister_collation() for this?

The question is how often would such a function be used?

@z33ky
Copy link
Contributor Author

z33ky commented Sep 9, 2020

The question is how often would such a function be used?

It guess it would only be of potential use when dropping tables or some such. It certainly won't find use from me (at least at the moment).
Can I gather from the inquisition that I shell leave it absent then?

@z33ky
Copy link
Contributor Author

z33ky commented Sep 9, 2020

I hit a snag while making run_aggregator_step_function() unwind-safe: SqliteAggregateFunction::step() takes a &mut self. The general idea is that when step() panics, little to no care may be taken that self restores temporary invalid state (see https://doc.rust-lang.org/std/panic/trait.UnwindSafe.html#what-is-unwindsafe).
We can communicate that we expect the SqliteAggregateFunction to be UnwindSafe, but apparently &mut T is never considered UnwindSafe, even if T: UnwindSafe, (I'm guesing) since the object continues to exist after the panic.

We can wrap it in a AssertUnwindSafe, which means that we assure the type system about SqliteAggregateFunction's continue existence after panics on the user's behalf, or we can std::process::abort() as well if it panics.
We could add our own marker trait which the user has to implement. Implementing this trait would be a promise that it's fine to wrap in AssertUnwindSafe for the purpose of calling step().

@weiznich
Copy link
Member

I think AssertUnwindSafe + a comment mentioning that the implementation is responsible for cleaning up stuff after a panic should be fine there.

@z33ky
Copy link
Contributor Author

z33ky commented Sep 11, 2020

That should address all requested changes.
On top it also does some light refactoring of the aggregator function handling.
I don't know if reducing the scope of unsafe is desired since it adds a bit of "noise" to open a few more, but smaller, unsafe scopes than before. That's its own commit though and I can easily remove it if not desired.

Copy link
Member

@weiznich weiznich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good, modulo some small stylistic things. Thanks for working on this 👍

diesel/src/sqlite/connection/raw.rs Outdated Show resolved Hide resolved
diesel/src/sqlite/connection/raw.rs Outdated Show resolved Hide resolved
}
let args = unsafe { slice::from_raw_parts(value_ptr, num_args as _) };
let args = build_sql_function_args::<ArgsSqlType, Args>(args);
let mut aggregator = std::panic::AssertUnwindSafe(aggregator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some time to think a bit more about this, so here some reasoning why this actually should be OK and users do not need to care about this:

  • We catch the panic and get a Err
  • We return an error message to sqlite, so that sqlite will abort executing this aggregate, so the old instance will not be used anymore.
  • On the next call to this aggregate function we will create a new instance of A, so we cannot have compromised state here either.

(Maybe add that as comment here)

From that point of view: I'm not sure if we really need to have bounds on UnwindSafe on the corresponding types.

Copy link
Contributor Author

@z33ky z33ky Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't really follow how the aggregator is initialized.
Unwind-safetly is still a useful property though because finalize() will be called whether or not step() errored out.

I'm not certain if having the UnwindSafe-bound actually does much in any case, since this AssertUnwindSafe is where the unwind-safety rubber hits the road. Though without it we'd need another AssertUnwindSafe-wrapping in run_aggregator_final_function() if the bound does not exist. And that may be relevant Drop impementations, even if we don't use the aggregator any further.

diesel/src/sqlite/connection/raw.rs Show resolved Hide resolved
/// The `step()` method is called once for every record of the query.
///
/// This is called through a C FFI, as such panics do not propagate to the caller. Panics are
/// caught and cause a return with an error value. The implementation must still ensure that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained above I think we do not need to have that actually here, because I see no way how the state could be reused here.

diesel/src/sqlite/connection/raw.rs Outdated Show resolved Hide resolved
Panics across FFI boundaries cause undefined behavior in current Rust.
The aggregator functions are callbacks invoked from C (libsqlite).
To safe-guard against panics, std::panic::catch_unwind() is used. On
panic the functions now return with an error result indicating the
unexpected panic occurred.

std::panic::catch_unwind() requires types to implement
std::panic::UnwindSafe, a marker trait indicating that care must be
taken since panics introduce control-flow that is not very visible.
Refer to https://doc.rust-lang.org/std/panic/trait.UnwindSafe.html for a
more detailed explanation.
For SqliteAggregatorFunction::step() we must use
std::panic::AssertUnwindSafe, since &mut references are never considered
UnwindSafe, and the requirement to ensure unwind-safety is documented on
the method.
Of note is that in safe Rust, even if the method is not unwind-safe
the language still guarantees memory-safety. The marker trait is mainly
to prevent logic bugs.
A: SqliteAggregateFunction<Args, Output = Ret>
+ 'static
+ Send
+ std::panic::UnwindSafe
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to both RefUnwindSafe and UnwindSafe, since ::step() passes by reference and ::terminate() by value. As stated before, I'm not sure if the RefUnwindSafe-bound for ::step() provides any practical safety since we just use AssertUnwindSafe instead to pass the mut& across and RefUnwindSafe is only for non-mut references.

This encompasses the pattern of eprintln!() + std::process::abort() and
adds a request to open an issue if the message is observed.
This is essentially the same treatment that custom aggregate functions
got in ee2f792.
@z33ky
Copy link
Contributor Author

z33ky commented Sep 14, 2020

The failed CI job looks to me like a transient failure; it seems MySQL didn't start up correctly there.

@@ -204,7 +251,8 @@ extern "C" fn run_custom_function<F>(
) where
F: FnMut(&RawConnection, &[*mut ffi::sqlite3_value]) -> QueryResult<SerializedValue>
+ Send
+ 'static,
+ 'static
+ std::panic::RefUnwindSafe,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a FnMut and not a Fn I'm would guess that RefUnwindSafe is not the right trait here. We do potentially mutate F here.

}
Err(_) => {
let msg = format!("{} panicked", std::any::type_name::<F>());
unsafe { context_error_str(ctx, &msg) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why we need a nested unsafe block here?

context_error_str(ctx, NULL_AG_CTX_ERR)
}

unsafe fn context_error_str(ctx: *mut ffi::sqlite3_context, error: &str) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should cite https://sqlite.org/c3ref/result_blob.html here, stating that sqlite will copy our error message, so it's fine to free it afterwards. (Just to make that clear for any future person looking at this code.)
Also we maybe can take a NonNull<ffi::sqlite3_context> here to make this function not unsafe to use anymore?

A: SqliteAggregateFunction<Args, Output = Ret> + 'static + Send,
Args: FromSqlRow<ArgsSqlType, Sqlite>,
A: SqliteAggregateFunction<Args, Output = Ret> + 'static + Send + std::panic::RefUnwindSafe,
Args: FromSqlRow<ArgsSqlType, Sqlite> + std::panic::UnwindSafe,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the Args need to be UnwindSafe. They get reconstructed everytime the aggregator is called, so a potential panic cannot corrupt anything there.

A: SqliteAggregateFunction<Args, Output = Ret> + 'static + Send,
Args: FromSqlRow<ArgsSqlType, Sqlite>,
A: SqliteAggregateFunction<Args, Output = Ret> + 'static + Send + std::panic::UnwindSafe,
Args: FromSqlRow<ArgsSqlType, Sqlite> + std::panic::UnwindSafe,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not even construct the arguments here, so UnwindSafe is not required here.

F: FnMut(&RawConnection, &[*mut ffi::sqlite3_value]) -> QueryResult<SerializedValue>
+ Send
+ 'static,
F: Fn(&str, &str) -> std::cmp::Ordering + Send + 'static,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be restricted to RefUnwindSafe as we cannot mutate the closure.

)
};

// It doesn't matter if f is UnwindSafe, since we abort on panic.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think similar to the sql function handler we should just return an error here, instead of abort the program.

@weiznich
Copy link
Member

This looks almost good, noted some minor improvements otherwise I'm fine with this now.

@weiznich
Copy link
Member

weiznich commented Nov 2, 2020

@TaKO8Ki This is another candidate for rebasing + fixing the remaining comments if your are interested

@TaKO8Ki
Copy link
Member

TaKO8Ki commented Nov 2, 2020

@weiznich
I'm going to tackle this!

@weiznich
Copy link
Member

Likely relevant: rusqlite/rusqlite#839

@weiznich
Copy link
Member

Closed by #2564

@weiznich weiznich closed this Apr 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants