Change back to non async interface for try_into_logical_plan #3955
It's caused by #3907. I think there was some earlier discussion about changing the async interface to non-async. I'm wondering why it was changed back to async in #3907. Hi @alamb, @andygrove and @avantgardnerio, what do you think of this?
8d48adf to 7fed871
@yahoNanJing I am not familiar with the discussion (or don't recall)... do you have a link to the PR or GitHub issue?
Because creating TableProviders may have to be an async operation for ones like Deltalake that need to go load schema from the network. I looked into the alternative: having two methods on
Unfortunately, it did not look trivial to serialize all the state that a
Thanks @avantgardnerio for the explanation. For network or other I/O usage, we can follow the object store interface implementation, which uses spawn blocking.
This is the PR in question. This is the deserialization line of code that calls the async method. If you know another way to do this, I would be open to specific recommendations. I don't understand how spawn blocking would help, but I am open to it. The only other option I was able to think of is to serialize the whole state of the
Yeah, the @yahoNanJing would an alternative approach be to create a PR in Ballista to update the code in question to be async? @avantgardnerio perhaps @yahoNanJing is suggesting something like https://docs.rs/tokio/1.21.2/tokio/runtime/struct.Runtime.html#method.block_on which blocks a thread and waits for an async method to resolve.
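For readers unfamiliar with the pattern, here is a minimal sketch of what "block a thread and wait for an async method to resolve" looks like. It is not the DataFusion or Ballista code: to stay dependency-free it uses a small std-only `block_on` as a stand-in for `tokio::runtime::Runtime::block_on`, and `load_schema_async` / `load_schema` are hypothetical names invented for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the parked caller thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Std-only stand-in (an assumption, not tokio's implementation) for
/// `Runtime::block_on`: polls the future on the current thread and parks
/// the thread whenever the future is pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // The loop re-polls after every wake-up, so spurious unparks are harmless.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async loader, standing in for a provider that would
/// fetch its schema over the network.
async fn load_schema_async(table: &str) -> Result<String, String> {
    if table.is_empty() {
        return Err("empty table name".to_string());
    }
    Ok(format!("schema-of-{table}"))
}

/// Synchronous facade: callers see a plain function, and the async work
/// is driven to completion on the calling thread.
fn load_schema(table: &str) -> Result<String, String> {
    block_on(load_schema_async(table))
}
```

The trade-off is that the calling thread is occupied until the future completes, so with a real runtime this kind of wrapper should not be called from inside an async task.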
FWIW, I have already done this, I just haven't filed a PR yet. I can do that easily if we decide to stick with async.
@yahoNanJing I can't tell from this PR or #3954 why you are concerned about this change. Specifically does Can you please let us know what you would prefer
Note that @andygrove filed #3957 recently to help with this sort of discussion.
Hi @avantgardnerio and @alamb, I think it's a similar case for the
Currently the async change causes no blocking issue for Ballista, but I'm not sure whether it will cause issues in the future. From my point of view, the interface for creating the logical plan should generally be a serialization API. For edge cases, we can use a workaround such as https://docs.rs/tokio/1.21.2/tokio/runtime/struct.Runtime.html#method.block_on provides. Maybe we can involve more people in the API design, like @tustvold, @andygrove, etc.
I have the unfortunate tendency to agree with @yahoNanJing here... see fun examples like the log4j CVE and Java's famous URL.equals() as examples of abuse of network access in parsing logic. That's why I originally spent so much time working on the separate It was very convenient to switch this to async, but from a principled perspective I think it's the wrong move. I can work with @houqp and the folks over at Since that PR is already working though, it would be nice to have some time to verify the assumption that the issue is resolvable before merging this PR. Thoughts @andygrove and @alamb?
@@ -86,5 +86,5 @@ pub trait TableProvider: Sync + Send {
#[async_trait]
Can probably remove #[async_trait] as well
I'm not intimately familiar with the details here, but when given a choice between async or not, I almost always favour the latter 😅. Certainly the idea of deserialization logic making network calls is a little bit surprising. If it is possible to achieve the end result without making this async, that seems like a better path to me.
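One way to picture "deserialization without network calls" is to keep decoding a pure byte-parsing step and resolve table references against providers that were registered beforehand. The sketch below is purely illustrative: `TableRef`, `Provider`, `decode_table_ref`, and `resolve` are hypothetical names, and this is not claimed to be what DataFusion, Ballista, or #3978 actually does.

```rust
use std::collections::HashMap;

/// Hypothetical serialized form: only the table name travels over the wire.
#[derive(Debug, PartialEq)]
struct TableRef {
    name: String,
}

/// Hypothetical provider state, assumed to be loaded ahead of time
/// (possibly through async network I/O) before any plan is decoded.
#[derive(Debug, Clone, PartialEq)]
struct Provider {
    schema: Vec<String>,
}

/// Pure, synchronous decoding: parses bytes and performs no I/O.
fn decode_table_ref(bytes: &[u8]) -> Result<TableRef, String> {
    let name = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
    if name.is_empty() {
        return Err("empty table name".to_string());
    }
    Ok(TableRef { name: name.to_string() })
}

/// Synchronous resolution: an in-memory lookup against providers that
/// were registered before decoding started.
fn resolve(table: &TableRef, registry: &HashMap<String, Provider>) -> Result<Provider, String> {
    registry
        .get(&table.name)
        .cloned()
        .ok_or_else(|| format!("unknown table: {}", table.name))
}
```

Under this assumption any slow or fallible loading happens in an explicit registration step, so the decode path itself stays synchronous and cannot reach the network.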
@yahoNanJing and everyone, I'd like to close this PR in favor of #3978 which will do the same thing, but not break deltalake. As long as we get it merged soon (before 14?) I think no harm was done, right?
So basically making an API
Thanks @avantgardnerio. I agree to close this PR if #3978 works for both you and non-async 😄
#3978 was merged, so closing this one.
Which issue does this PR close?
Closes #3954.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?