Closed
Description
This is my plan for this week for reviews, etc. I am putting it here to make it visible and to keep myself organized.
- DataFusion: review spark functions: feat: Add `datafusion-spark` crate #15168 from @shehabgamin
- DataFusion: review partition statistics PR from @xudong963; there is a newer PR about the statistics API: Feat: introduce `ExecutionPlan::partition_statistics` API #15852
- DataFusion / statistics: Map file-level column statistics to the table-level #15865 from @xudong963
- DataFusion Bug: [DISCUSSION] Sorts being removed from subqueries #15886
- arrow: file ticket about boolean based row selection: [Parquet] Add BooleanArray based row selection arrow-rs#6624 -- and discord discussion: https://discord.com/channels/885562378132000778/1363995762182193373/1366410521066078349
- arrow filter pushdown: review existing PRs and file organizational tickets
- arrow filter pushdown: find benchmark discrepancy: arrow_reader_row_filter benchmark doesn't capture page cache improvements arrow-rs#7460
- sqlparser -- prepare release: Release sqlparser-rs version `0.56.0` around 2024-04-20 datafusion-sqlparser-rs#1756
- DataFusion: Spark: merge feat: Add `datafusion-spark` crate #15168 and file follow-on organizational epic
- DataFusion: review filter pushdown APIs: refactor filter pushdown apis #15801
- DataFusion Dynamic Filter pushdown: Implement Parquet filter pushdown via new filter pushdown APIs #15769
- Arrow Variant: Apply feedback to Add example binary variant data and regeneration scripts parquet-testing#76
- Arrow Variant: Review Creation API: Add API for Creating Variant Values arrow-rs#7452 from @PinkCrow007
- DataFusion: aggregate performance PR from @Rachelint: Intermediate result blocked approach to aggregation memory management #15591
- Arrow filter pushdown: bitmap / range: Poc for adaptive parquet predicate pushdown(bitmap/range) with page cache(3 data pages) arrow-rs#7454
- object_store: fix/merge thread pool PR: feat: Add `SpawnService` and `SpawnedReqwestConnector` for running requests on a different runtime arrow-rs-object-store#332
- Arrow Variant: Create `parquet-variant` crate skeleton PR and basic reader API
- Arrow Variant: Expose tape decoder: Add custom decoder in arrow-json arrow-rs#7442
- DataFusion perf script from @logan-keede: Shell script to collect benchmarks for multiple versions #15144
- DataFusion perf script draft: feat(benchmark): collect benchmarks for last 5 versions in line protocol format #15846
- DataFusion: Min/Max for lists / nested types: feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max #15857
- DataFusion PR about pruning ordering: pipe column orderings into pruning predicate creation #15821
Nice to have (really would be great to have someone help review):
- DataFusion: Aggregate UDFs in FFI: feat: Add Aggregate UDF to FFI crate #14775
- Arrow: Avro cleanup: Avro codec enhancements arrow-rs#6965
- Arrow: Avro Utf8View: Support Utf8View for Avro arrow-rs#7434
Activity
alamb commented on May 5, 2025
Will continue tracking next week: