-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: wrap future streams also in io_runtime #3162
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
@Tom-Newton can you give this a spin? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3162 +/- ##
==========================================
- Coverage 72.17% 71.98% -0.20%
==========================================
Files 138 138
Lines 45282 45412 +130
Branches 45282 45412 +130
==========================================
+ Hits 32681 32688 +7
- Misses 10533 10656 +123
Partials 2068 2068 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Ion Koutsouris <15728914+ion-elgreco@users.noreply.github.com>
36fccee
to
01da20b
Compare
I plan to give this a look later today |
I gave it a try, but I can definitely still reproduce |
@Tom-Newton what is your timeout set to? |
|
@Tom-Newton can you set it to a very large interval, 30 minutes or something? |
I gave it a try with 30 minutes. The success rate is not noticeably better but we do get a different error message.
Potentially this is useful information, but I have not yet had a chance to investigate further. It sounds like in this case Azure is returning an error response, which would likely be informative. |
@Tom-Newton this is during reading I presume? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to hold off on this change being merged for the time being. I need to understand the consequences of the changes to the core storage module here.
The 🍝 we have floating around to accommodate asynchronous behaviors with Python is creeping further into core here, and that gives me a lot of concern so I need to spend a fair bit more time evaluating this change
@@ -124,7 +124,7 @@ Let’s write 100 hours worth of data to the Delta table. | |||
|
|||
=== "Rust" | |||
```rust | |||
let mut table = DeltaOps::try_from_uri("observation_data") | |||
let mut table = DeltaOps::try_from_uri("observation_data", IORuntime::default()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this and the other documentation change are supposed to be wrapped in a Some
Yeah I'm fine with that, the discussion where this started from is also still ongoing: apache/datafusion#14286 (comment) so I don't think there is a full consensus on how to solve this topic yet, right @alamb? |
Yes I agree there is not yet consensus on the topic. We also don't fully understand the consequences / implications of doing these cross runtime shenanigans -- as we do that with other projects hopefully we'll come up with a good pattern that we can then share here as well |
@alamb pointed out that the get stream would be not executed on the IO runtime unless it was wrapped. So thanks to his example PR I've added this now for the other streams :) Thanks @alamb!
Also addressed some other missing areas where I should have enabled the IO runtime:
I've also made it possible to pass an optional runtime in DeltaOps::try_from_uri, since this was missing but it seems a common used entry point to open a delta table