feat: use pyo3-asyncio to get a fresh tokio runtime #2
base: main
Expand on description and add how we tested it. We can take this info from our internal PR that implemented this feature.
bd00c59
to
92a3cac
Compare
python/src/lib.rs
Outdated
@@ -4,6 +4,8 @@ mod error;
mod filesystem;
mod schema;
mod utils;
extern crate pyo3;
Where is this crate used?
we can probably remove this.
Removed
Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request? Also how might this relate to github/delta-io/delta-rs/1315?
92a3cac
to
6254fd7
Compare
@@ -840,22 +846,22 @@ impl RawDeltaTable {
    pub fn get_py_storage_backend(&self) -> PyResult<filesystem::DeltaFileSystemHandler> {
        Ok(filesystem::DeltaFileSystemHandler {
            inner: self._table.object_store(),
            rt: Arc::new(rt()?),
            rt: Arc::new(rt_pyo3()?),
DeltaFileSystemHandler (https://github.com/delta-io/delta-rs/blob/main/python/src/filesystem.rs#L58) uses the runtime in utils.rs (https://github.com/delta-io/delta-rs/blob/main/python/src/utils.rs#L10-L13), which is not the pyo3-asyncio one. So I kept the original rt but renamed it to rt_pyo3 to avoid errors.
Is there a better way to handle this?
Interesting, so utils.rs and lib.rs both have a runtime that they create? I don't actually know what is the right approach here without more knowledge of runtimes and tokio. Should we get help or learn ourselves?
https://sourcecode.vectra.io/projects/DP/repos/delta-rs/pull-requests/26/overview I think this provides the answer. This change seems worthy of pushing upstream, and would require a separate PR.
That answers the question of why we're changing lib.rs. However, my question is specifically about our having two runtime functions: should we also be using pyo3-asyncio's tokio-runtime in utils.rs?
Let's get this out and have the community decide. Frankly @tsh56 and I have higher-priority issues to resolve.
I think it's related to delta-io#1315.
Yeah, and in particular it felt like we implemented an alternative to delta-io#1361.
Do we know why some tests are failing?
556c778
to
46d4f7b
Compare
4ec7485
to
f71515c
Compare
f71515c
to
ccfccc7
Compare
Description
This PR greatly reduces the number of network connections and DNS requests made by the delta-rs library when it is used through the Python bindings. The approach is to use pyo3-asyncio's tokio-runtime feature as the source of the Runtime. This yields the same runtime across function calls, which preserves connections in the connection pool. The previous code created a new runtime for every Python function call, and each new runtime established new socket connections and issued new DNS requests.
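The difference between the two patterns can be illustrated with a minimal, standard-library-only sketch. It is not the code in this PR and does not use pyo3-asyncio or Tokio: FakeRuntime, rt_per_call, rt_shared and runtimes_created are hypothetical names, and FakeRuntime merely stands in for a runtime together with the connection pool tied to it. The only point is to contrast "build one per call" with "build once, reuse on every call".

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many stand-in runtimes have been constructed so far.
static RUNTIMES_CREATED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a runtime and the connection pool tied to it.
/// (Assumption for illustration only; the real type would be a Tokio runtime.)
#[derive(Debug)]
pub struct FakeRuntime {
    pub id: usize,
}

impl FakeRuntime {
    fn new() -> Self {
        // Each construction models the cost of new sockets and DNS lookups.
        let id = RUNTIMES_CREATED.fetch_add(1, Ordering::SeqCst);
        FakeRuntime { id }
    }
}

/// Old pattern: every call builds a brand-new runtime.
pub fn rt_per_call() -> FakeRuntime {
    FakeRuntime::new()
}

/// New pattern: the runtime is built lazily on first use and then shared.
pub fn rt_shared() -> &'static FakeRuntime {
    static SHARED: OnceLock<FakeRuntime> = OnceLock::new();
    SHARED.get_or_init(FakeRuntime::new)
}

/// Number of stand-in runtimes constructed so far.
pub fn runtimes_created() -> usize {
    RUNTIMES_CREATED.load(Ordering::SeqCst)
}
```

In this sketch, repeated calls to rt_per_call each return a value with a different id, while repeated calls to rt_shared always return a reference to the same instance, which is the property that lets pooled connections survive across calls.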
Related Issue(s)
Partly delta-io#1315
Testing:
Ran a script that called hundreds of Delta operations while watching tcpdump. Only one DNS request was observed.
Documentation