19848: Support reverse page ordering in sort pushdown phase 1#191
martin-augment wants to merge 6 commits into main
- Add reverse_pages flag to ParquetSource and ParquetOpener
- Wire reverse_pages through try_reverse_output() alongside reverse_row_groups
- Extend try_reverse_output() to set both flags when optimizing descending sorts
- Add comprehensive test coverage for reverse_pages functionality
- Align with existing reverse_row_groups infrastructure pattern

Phase 1 establishes the foundation for page-level reverse ordering. Actual page-level reversal implementation will be completed in Phase 2 when arrow-rs provides page-level API support.

Tests:
- All 27 reverse-related tests pass
- Added 4 new tests for reverse_pages flag
- Verified backward compatibility

Closes apache#19486
This attribute suppresses the dead code warning for Phase 1 implementation. The method will be used in Phase 2 when actual page reversal is implemented.
All test snapshots now reflect that reverse_pages flag is set alongside reverse_row_groups when optimizing descending sorts.
All EXPLAIN plan outputs now reflect that reverse_pages flag is set alongside reverse_row_groups when optimizing descending sorts.

Updated test files:
- sort_pushdown.slt (18 instances)
- topk.slt (1 instance)
- create_external_table.slt (1 instance)
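The flag wiring described in the commits above can be sketched roughly as follows. This is a minimal stand-in, not DataFusion's actual code: `SketchSource` and `describe()` are hypothetical names introduced for illustration, and the rendered string is only an assumption about how the flags might appear in plan output. The builder method names (`with_reverse_row_groups`, `with_reverse_pages`, `try_reverse_output`) follow the ones mentioned in this PR, but their real signatures may differ.

```rust
/// Hypothetical stand-in for a Parquet source; NOT DataFusion's real `ParquetSource`.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct SketchSource {
    reverse_row_groups: bool,
    reverse_pages: bool,
}

impl SketchSource {
    /// Builder-style setter, mirroring the existing row-group flag.
    pub fn with_reverse_row_groups(mut self, reverse: bool) -> Self {
        self.reverse_row_groups = reverse;
        self
    }

    /// Builder-style setter for the new page-level flag.
    pub fn with_reverse_pages(mut self, reverse: bool) -> Self {
        self.reverse_pages = reverse;
        self
    }

    /// Returns a copy with both flags enabled, as the PR describes for
    /// descending sorts. The original value is left untouched.
    pub fn try_reverse_output(&self) -> Self {
        self.clone()
            .with_reverse_row_groups(true)
            .with_reverse_pages(true)
    }

    /// Assumed rendering of the flags; the real EXPLAIN format may differ.
    pub fn describe(&self) -> String {
        let mut parts = Vec::new();
        if self.reverse_row_groups {
            parts.push("reverse_row_groups=true");
        }
        if self.reverse_pages {
            parts.push("reverse_pages=true");
        }
        parts.join(", ")
    }
}
```

Because `try_reverse_output` works on a clone, the source it is called on keeps its original flags, which is the same shape as the `self.clone().with_…` chain quoted in the review comment further down.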
Walkthrough: The changes introduce a new
Code Review: Support reverse page ordering in sort pushdown phase 1

Summary

This PR adds infrastructure for reverse page ordering in Parquet sort pushdown optimization (Phase 1). The implementation adds a

✅ Strengths

1. Well-structured Phase 1 approach

The PR clearly separates infrastructure setup from actual implementation, which is a sensible engineering approach. The code includes clear documentation noting that actual page-level reversal will be implemented when arrow-rs provides the necessary API support.

2. Comprehensive test coverage

3. Consistent with existing patterns

The implementation follows the same pattern as

4. Good documentation

Comments clearly explain the phase 1 nature and future plans:

    /// Note: This is infrastructure for phase 1; actual page reversal may be implemented
    /// in future phases depending on arrow-rs capabilities.

🔍 Issues & Concerns

1. Potential logic issue: Setting flags independently
🤖 Augment PR Summary

Summary: Extends sort pushdown (phase 1) for Parquet scans by introducing a new

Changes:

Technical Notes: This PR appears to be “infrastructure only” for page-level reversal; query plans now advertise the flag even though actual page reversal can be implemented in a later phase (e.g., depending on arrow-rs capabilities).
    let new_source = self
        .clone()
        .with_reverse_row_groups(true)
        .with_reverse_pages(true);
reverse_pages is now enabled here and surfaced in EXPLAIN, but it doesn’t appear to be consumed in ParquetOpener yet (only reverse_row_groups affects the prepared plan), so it currently looks like a plan-only flag. Consider clarifying in the docs/comments that this doesn’t change scan order yet (also applies to other locations in the PR).
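To make the concern concrete, here is a small sketch of what a "plan-only" flag looks like, assuming the opener only acts on `reverse_row_groups`. The function `row_group_read_order` is a hypothetical helper written for this illustration; it is not part of `ParquetOpener` and does not reflect how DataFusion actually prepares its access plan.

```rust
/// Hypothetical helper: returns the order in which row groups would be read.
/// This only models the behaviour described in the review comment; it is not
/// DataFusion code.
pub fn row_group_read_order(
    num_row_groups: usize,
    reverse_row_groups: bool,
    _reverse_pages: bool,
) -> Vec<usize> {
    let mut order: Vec<usize> = (0..num_row_groups).collect();
    if reverse_row_groups {
        // Row-group reversal is the only flag that changes the read order here.
        order.reverse();
    }
    // `_reverse_pages` is deliberately ignored: in this sketch, as in the
    // Phase 1 behaviour described above, it has no effect on scan order.
    order
}
```

Under this assumption, toggling the page flag leaves the result unchanged, so a reader who sees the flag in EXPLAIN output could expect a different scan order that never materializes. A doc comment stating this explicitly would avoid that.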
19848: To review by AI