Provide different read interfaces for reader #1047
Comments
What do you think? cc @liurenjie1024 @Xuanwo @Fokko @sdd
Hi, I believe that's related to #1036
Seems like a reasonable idea to me. If my 5 open PRs for delete file read support get reviewed and merged, then implementing what you need would be pretty trivial on top of them :-)
Thanks @ZENOTME for raising this. I think what's missing is a … This reader needs to convert files (Parquet, ORC, Avro) into Arrow record batches, handling things like missing columns and type promotion, which are caused by schema evolution. With this API, it would be easy to implement the …
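For concreteness, here is a minimal Rust sketch of the kind of file-level reader this comment describes. All names (`FileReader`, `FileRecordBatchStream`, the `read` signature) are hypothetical and not part of the existing iceberg-rust API; the sketch only illustrates the shape: one content file in, a stream of Arrow record batches out, with schema-evolution concerns handled inside the reader.

```rust
// Hedged sketch only: none of these names exist in iceberg-rust today.
use arrow_array::RecordBatch;
use futures::stream::BoxStream;
use iceberg::spec::{DataFile, SchemaRef};
use iceberg::Result;

/// Batches read from a single data or delete file.
pub type FileRecordBatchStream = BoxStream<'static, Result<RecordBatch>>;

/// Hypothetical file-level reader: converts one Parquet/ORC/Avro file
/// into Arrow record batches projected onto the requested Iceberg
/// schema, filling missing columns and promoting types where the
/// schema has evolved since the file was written.
#[async_trait::async_trait]
pub trait FileReader: Send + Sync {
    async fn read(
        &self,
        file: &DataFile,
        projected_schema: SchemaRef,
    ) -> Result<FileRecordBatchStream>;
}
```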
In this design, does …
And for delete files (positional deletes, equality deletes), do we need to handle things like missing columns and type promotion? 🤔 It seems that for positional deletes and equality deletes without values, we can't fill in a value if it is missing. So here we may need the …
Is your feature request related to a problem or challenge?
For now, our Arrow reader accepts a FileScanTask and returns a RecordBatchStream to the user. After #630, the reader can also process delete files and merge them with the data file, which makes it ready to use out of the box. However, some compute engines want to process delete files themselves so that they can reuse their existing join executors and spill data to their own storage. This requires reading the delete files directly rather than having them processed internally.
Based on this, I suggest providing different read interfaces to satisfy these different requirements:
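To make the split concrete, here is a hedged sketch of what two such read interfaces could look like. The type and method names (`Reader`, `read_merged`, `read_raw`) and the stream alias are hypothetical, not the existing `ArrowReader` API; the sketch only illustrates the two requirements above: a batteries-included read that applies deletes internally, and a raw read that hands data and delete file batches to the engine separately.

```rust
// Hedged sketch: hypothetical method names, shown only to illustrate
// the two proposed read paths.
use arrow_array::RecordBatch;
use futures::stream::BoxStream;
use iceberg::scan::FileScanTask;
use iceberg::Result;

pub type FileRecordBatchStream = BoxStream<'static, Result<RecordBatch>>;

pub struct Reader { /* file IO, projection config, ... */ }

impl Reader {
    /// Out-of-the-box path: read the data file referenced by the task
    /// and apply its positional/equality deletes internally, yielding
    /// only live rows (the behavior available after #630).
    pub async fn read_merged(&self, _task: &FileScanTask) -> Result<FileRecordBatchStream> {
        todo!("merge data and delete files internally")
    }

    /// Engine-driven path: return the data file batches untouched plus
    /// one raw stream per associated delete file, so a compute engine
    /// can apply the deletes with its own join operators and
    /// spill-capable storage.
    pub async fn read_raw(
        &self,
        _task: &FileScanTask,
    ) -> Result<(FileRecordBatchStream, Vec<FileRecordBatchStream>)> {
        todo!("expose data and delete file streams separately")
    }
}
```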
Describe the solution you'd like
No response
Willingness to contribute