Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use pyo3-asyncio to get a fresh tokio runtime #2

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

PengLiVectra
Copy link
Collaborator

@PengLiVectra PengLiVectra commented Nov 8, 2023

Description

This PR greatly reduces network connections and dns request volume by the delta-rs library when using Python bindings. The approach here is to utilize pyo3-asyncio's tokio-runtime feature as the source of the Runtime. This yields the same runtime across function calls which preserves connections in the connection pool. The previous code created a new runtime per python function call, which established all new socket connections and issued new DNS requests.

Related Issue(s)

Partly delta-io#1315

Testing:

Ran a script that called hundreds of delta operations, and watched tcpdump. Only saw one dns request.

Documentation

@ginevragaudioso
Copy link

Expand on description and add how we tested it. We can take this info from our internal PR that implemented this feature.

@@ -4,6 +4,8 @@ mod error;
mod filesystem;
mod schema;
mod utils;
extern crate pyo3;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this crate used?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably remove this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@syedashrafulla
Copy link

Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request?

Also how might this relate to github/delta-io/delta-rs/1315?

@@ -840,22 +846,22 @@ impl RawDeltaTable {
pub fn get_py_storage_backend(&self) -> PyResult<filesystem::DeltaFileSystemHandler> {
Ok(filesystem::DeltaFileSystemHandler {
inner: self._table.object_store(),
rt: Arc::new(rt()?),
rt: Arc::new(rt_pyo3()?),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeltaFileSystemHandler (https://github.com/delta-io/delta-rs/blob/main/python/src/filesystem.rs#L58) uses the runtime in utils.rs (https://github.com/delta-io/delta-rs/blob/main/python/src/utils.rs#L10-L13), which is not pyo3-asyncio. So I keep the original rt but change name to rt_pyo3 to avoid errors.
Is there a better way to handle this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so utils.rs and lib.rs both have a runtime that they create? I don't actually know what is the right approach here without more knowledge of runtimes and tokio. Should we get help or learn ourselves?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://sourcecode.vectra.io/projects/DP/repos/delta-rs/pull-requests/26/overview I think this provides the answer. This change seems worthy of pushing upstream, and would require a separate PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That answers the question of why we're changing lib.rs. However, my question is related specifically to our having two runtime functions: should we also be using py03-asyncio's tokio-runtime in utils.rs?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this out and have the community decide. Frankly @tsh56 and I have higher-priority issues to resolve.

@PengLiVectra
Copy link
Collaborator Author

Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request?

Also how might this relate to github/delta-io/delta-rs/1315?

I think it's related to delta-io#1315.

@dsandesari
Copy link
Collaborator

Should we have a GitHub issue in github/delta-io/delta-rs/issues for this feature request?
Also how might this relate to github/delta-io/delta-rs/1315?

I think it's related to delta-io#1315.

yeah and especially, felt like we did implement alternative to delta-io#1361

@ginevragaudioso
Copy link

do we know why some tests are failing?

@dsandesari
Copy link
Collaborator

do we know why some tests are failing?
due to linting issues, probably can be fixed by syncing with delta-io/delta-rs or use suggestions based on build failure report here: https://github.com/vectra-ai-research/delta-rs/actions/runs/6926886646/job/18839764113?pr=2

@PengLiVectra PengLiVectra force-pushed the use-pyo3-asynocio branch 2 times, most recently from 556c778 to 46d4f7b Compare November 28, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants