-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use OnceLock to store TokioRuntime #895
Use OnceLock to store TokioRuntime #895
Conversation
6fa05ba
to
ff1e758
Compare
Because the change was performance related, we should probably benchmark the change. I did a quick & dirty one (making no effort to isolate resources / stop other processes on my laptop), but the results tell me I should run a cleaner one tomorrow before merging. Using
|
From the benchmarking above it appears to be a mostly negative impact. Am I reading that right? |
Yes. My ~dirty benchmark had a pretty consistently negative performance impact. Hence, the need for further investigation. That result surprises me, though. |
18c6098
to
030ec20
Compare
Alright, second benchmark running after clean boot look like this has basically no impact. I wouldn't mind if @andygrove or anyone else has a nice benchmarking server setup to run their own, but I'm more comfortable with this now. Reminder:
|
efd2046
to
37d3356
Compare
37d3356
to
85f3ca3
Compare
I also included a reference comment in case future users experience problems with using datafusion-python behind a forking app server l ike `gunicorn`.
Ran it one more time with the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @Michael-J-Ward !
// If we run into that problem, in the future we can look to `delta-rs` | ||
// which adds a check in that disallows calls from a forked process | ||
// https://github.com/delta-io/delta-rs/blob/87010461cfe01563d91a4b9cd6fa468e2ad5f283/python/src/utils.rs#L10-L31 | ||
static RUNTIME: OnceLock<TokioRuntime> = OnceLock::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I'm understanding it correctly (please correct me if I'm not making sense).
So now it use OnceLock<Runtime>
than OnceLock<Arc<Runtime>>
to get rid of Python heap thoroughly. And it will not be thread-safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend looking at the PR diff to see the real change.
There is an intermediate commit that went from OnceLock<Arc<Runtime>>
to OnceLock<Runtime>
, but that's because the Arc
was superfluous.
OnceLock
is already Send + Sync
and explicitly thread-safe.
So, if you discover that it is not actually thread-safe, please share!
Which issue does this PR close?
Closes #737.
Rationale for this change
To avoid repeatedly recreating the tokio runtime, #341 began storing the tokio runtime on the python heap at
datafusion.runtime
.This created the requirement that the
datafusion
python package must be available to any user of the rust cratedatafusion-python
.However, some users (#737) want to leverage the rust-crate
datafusion-python
to create their own custom python package without depending on or using the pythondatafusion
package.What changes are included in this PR?
This PR uses
std::sync::OnceLock
instead of the python heap to ensure that only one tokio runtime gets created once.This both removes the implicit dependency and avoids
rust --> python --> rust
roundtrip every time the runtime is used.Are there any user-facing changes?
datafusion.runtime
has been removed, but that was already unusable for users of the python package.