Replace custom ParquetSstReader with DataFusion's ParquetExec #291
Comments
This issue seems related to #256, where we want to implement the filter logic with DataFusion's expr. If we replace |
@ygf11 Yes, those issues are related, but they work at different levels. In the current implementation,
|
Thanks, I think I almost get it. If I have made a mistake, please point it out. First, we read data from two places, and the predicate is applied before the data is passed to MergeIterator:
Furthermore, after #256 and this task, we can remove |
There are some mistakes here:
This is true, but one thing worth mentioning is that the data read back is not guaranteed to satisfy the pushed-down predicates (the filter is actually implemented using the min-max index at the row-group level, so it can only skip whole row groups).
So the further filter is needed not only for the memtables but also for the data from SST files. In conclusion, both #256 and #291 are necessary. |
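The point about row-group-level pruning can be illustrated with a small standalone sketch. Everything in it (`RowGroup`, `RangePredicate`, `scan_with_pushdown`, `exact_filter`) is a hypothetical name invented for illustration, not CeresDB or DataFusion API; it only assumes a single integer column and a simple range predicate.

```rust
// Hypothetical types for illustration only; not CeresDB's or DataFusion's API.

/// A row group carrying a min-max index over a single i64 column.
struct RowGroup {
    min: i64,
    max: i64,
    rows: Vec<i64>,
}

impl RowGroup {
    fn new(rows: Vec<i64>) -> Self {
        let min = *rows.iter().min().expect("row group must not be empty");
        let max = *rows.iter().max().expect("row group must not be empty");
        RowGroup { min, max, rows }
    }
}

/// Predicate of the form `lower <= value <= upper`.
#[derive(Clone, Copy)]
struct RangePredicate {
    lower: i64,
    upper: i64,
}

impl RangePredicate {
    /// Row-group-level check: could this group contain any matching row?
    fn may_match(&self, rg: &RowGroup) -> bool {
        rg.max >= self.lower && rg.min <= self.upper
    }

    /// Exact row-level check.
    fn matches(&self, value: i64) -> bool {
        value >= self.lower && value <= self.upper
    }
}

/// Pushed-down predicate: skips whole row groups, but returns every row
/// of each surviving group, including rows that do not match.
fn scan_with_pushdown(groups: &[RowGroup], pred: RangePredicate) -> Vec<i64> {
    groups
        .iter()
        .filter(|rg| pred.may_match(rg))
        .flat_map(|rg| rg.rows.iter().copied())
        .collect()
}

/// Further filter applied to the scanned rows to drop non-matching ones.
fn exact_filter(rows: Vec<i64>, pred: RangePredicate) -> Vec<i64> {
    rows.into_iter().filter(|v| pred.matches(*v)).collect()
}

fn main() {
    let groups = vec![
        RowGroup::new(vec![1, 2, 3]),
        RowGroup::new(vec![4, 9, 15]),
        RowGroup::new(vec![20, 25]),
    ];
    let pred = RangePredicate { lower: 8, upper: 12 };

    let scanned = scan_with_pushdown(&groups, pred);
    println!("after row-group pruning: {:?}", scanned);

    let filtered = exact_filter(scanned, pred);
    println!("after exact filter: {:?}", filtered);
}
```

In this sketch the first and third row groups are skipped, but the middle group survives because its min-max range overlaps the predicate, so rows outside the range still come back from the scan and only the second pass removes them.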
Thanks for figuring it out. It is clear now :D. |
Already fixed in main. |
Describe This Problem
In the current implementation, ParquetSstReader uses CacheableSerializedFileReader to read Parquet files, and CacheableSerializedFileReader is modeled after https://github.com/apache/arrow-rs/blob/5.2.0/parquet/src/file/serialized_reader.rs, which contains many low-level details about how to parse Parquet. There are already some issues with it, such as "When upgrade to parquet 23, some usages are deprecated" (#271).
Proposal
By leveraging DataFusion's ParquetExec to implement this, we can remove all those low-level details. ParquetExec is also already feature-rich, for example:
Additional Context
IOx already does this.