[feature](reader) Optimize Complex Type Column Reading with Column Pruning #59249

kaka11chen · 2025-12-22T07:10:36Z

What problem does this PR solve?

Problem Summary:

Release note

Cherry-pick #57204 #58719

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2025-12-22T07:10:43Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

kaka11chen · 2025-12-22T08:00:28Z

run buildall

hello-stephen · 2025-12-22T08:29:50Z

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	82.14% (1573/1915)
Line Coverage	67.16% (28070/41794)
Region Coverage	67.68% (13806/20399)
Branch Coverage	58.13% (7363/12666)

hello-stephen · 2025-12-22T09:05:28Z

FE UT Coverage Report

Increment line coverage 61.68% (850/1378) 🎉
Increment coverage report
Complete coverage report

kaka11chen · 2025-12-22T09:55:41Z

run buildall

kaka11chen · 2025-12-22T10:08:37Z

run buildall

doris-robot · 2025-12-22T10:31:32Z

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	82.14% (1573/1915)
Line Coverage	67.16% (28067/41794)
Region Coverage	67.68% (13807/20399)
Branch Coverage	58.14% (7364/12666)

kaka11chen · 2025-12-22T14:46:34Z

run buildall

hello-stephen · 2025-12-22T15:18:26Z

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	82.14% (1573/1915)
Line Coverage	67.17% (28071/41794)
Region Coverage	67.72% (13815/20399)
Branch Coverage	58.15% (7365/12666)

hello-stephen · 2025-12-22T16:03:27Z

FE UT Coverage Report

Increment line coverage 61.68% (850/1378) 🎉
Increment coverage report
Complete coverage report

kaka11chen · 2025-12-22T17:10:31Z

run buildall

doris-robot · 2025-12-22T17:43:56Z

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	82.14% (1573/1915)
Line Coverage	67.16% (28070/41794)
Region Coverage	67.66% (13802/20399)
Branch Coverage	58.09% (7358/12666)

…uning (apache#57204)

Problem Summary: Optimize Complex Type Column Reading with Column Pruning

This PR implements column pruning for complex types (Struct, Array, Map) to improve read performance. Previously, Doris read an entire complex-type field before processing it, which was simple to implement but inefficient when only specific sub-columns were needed.

**Key changes:**

- **FE (Frontend)**: Added column access path calculation and type pruning
  - Collects and analyzes access paths for complex-type fields
  - Performs type pruning based on those access paths
  - Implements projection pushdown for complex types
- **BE (Backend)**: Added selective column reading
  - Uses the `columnAccessPath` array from FE to identify the required sub-columns
  - Implements selective reading to skip unnecessary sub-columns

**Performance Improvement**: When a struct contains hundreds or thousands of sub-columns but the query accesses only a few of them, this optimization can significantly reduce I/O and improve query performance. For example, given a column `s STRUCT<a: INT, b: INT>`, when only `s.a` is referenced, reading `s.b` can be skipped entirely.

**Technical Benefits**: Reduces unnecessary data scanning and decoding overhead for complex types, in line with Doris's ongoing performance optimization goals.

- **Lazy Materialization for Complex Type Sub-columns**: Defer materialization of unused sub-columns
- **Predicate Pushdown for Complex Type Sub-columns**: Push predicates down to the storage layer for better filtering
- **Parquet RL/DL Optimization**: Read only repetition levels and definition levels, without data, in appropriate scenarios
- **Array Size Optimization**: Read only offsets and null values for `array_size()` operations
- **Null Check Optimization**: Read only offsets and null values for non-null checks (e.g. `IS NOT NULL`)

Co-authored-by: 924060929 <lanhuajian@selectdb.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
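The FE-side type pruning described in the commit message can be illustrated with a small model. This is a minimal Python sketch, not Doris's actual implementation: the `Struct`/`Array`/`Map` classes, `prune_type`, and the `"*"` path component for stepping into array elements and map values are all hypothetical. Given the access paths collected for one column, it keeps only the struct fields that some path reaches, recursing into nested types; an empty path means the whole sub-tree is needed.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical, simplified type model; these are not Doris's FE classes.
@dataclass(frozen=True)
class Struct:
    fields: Tuple[Tuple[str, "DType"], ...]  # ordered (name, type) pairs


@dataclass(frozen=True)
class Array:
    element: "DType"


@dataclass(frozen=True)
class Map:
    key: "DType"
    value: "DType"


# A plain string such as "INT" stands for a scalar type.
DType = Union[str, Struct, Array, Map]

# Hypothetical path component meaning "array element" or "map value".
ELEMENT = "*"


def prune_type(t: DType, paths: List[Tuple[str, ...]]) -> DType:
    """Narrow t to the sub-columns reachable by paths (each relative to t).

    No paths, or any empty path, means t is needed in full: nothing is pruned.
    """
    if not paths or any(len(p) == 0 for p in paths):
        return t
    if isinstance(t, Struct):
        names = {name for name, _ in t.fields}
        unknown = {p[0] for p in paths} - names
        if unknown:
            raise ValueError(f"unknown struct fields: {sorted(unknown)}")
        kept = []
        for name, field_type in t.fields:  # preserve declaration order
            sub_paths = [p[1:] for p in paths if p[0] == name]
            if sub_paths:
                kept.append((name, prune_type(field_type, sub_paths)))
        return Struct(tuple(kept))
    if isinstance(t, (Array, Map)):
        if any(p[0] != ELEMENT for p in paths):
            raise ValueError(f"expected '{ELEMENT}' to step into {type(t).__name__}")
        tails = [p[1:] for p in paths]
        if isinstance(t, Array):
            return Array(prune_type(t.element, tails))
        # Assumption: map keys are kept whole; only the value type is pruned.
        return Map(t.key, prune_type(t.value, tails))
    raise ValueError(f"cannot descend into scalar type {t!r}")


if __name__ == "__main__":
    s = Struct((("a", "INT"), ("b", "INT")))
    # Only s.a is referenced, so s.b is pruned from the type to read.
    print(prune_type(s, [("a",)]))  # Struct(fields=(('a', 'INT'),))
```

Keeping map keys whole is a simplifying choice: a lookup such as `m['k'].x` needs the keys even when only part of the value is projected.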
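The "Array Size Optimization" and "Null Check Optimization" items rely on the fact that, in an offsets-based array layout, row lengths and nullness can be computed without decoding any element data. A minimal Python sketch under that assumption, where each row stores the exclusive end offset of its elements in a flattened child column plus a separate null map; `array_sizes` and `is_not_null` are hypothetical names, not Doris APIs.

```python
from typing import List, Optional


def array_sizes(end_offsets: List[int], null_map: List[bool]) -> List[Optional[int]]:
    """Per-row array_size() computed from offsets and the null map only.

    end_offsets[i] is the exclusive end position of row i's elements in the
    flattened child column; the element data itself is never touched.
    None models SQL NULL for a NULL array (assumed behaviour).
    """
    if len(end_offsets) != len(null_map):
        raise ValueError("need exactly one offset and one null flag per row")
    sizes: List[Optional[int]] = []
    prev_end = 0
    for end, is_null in zip(end_offsets, null_map):
        sizes.append(None if is_null else end - prev_end)
        prev_end = end  # advance even for null rows
    return sizes


def is_not_null(null_map: List[bool]) -> List[bool]:
    """A non-null check needs only the null map, not offsets or elements."""
    return [not flag for flag in null_map]
```

For example, offsets `[2, 2, 5, 5]` with null flags `[False, False, False, True]` describe rows of length 2, 0 and 3 followed by a NULL row.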

kaka11chen · 2025-12-23T03:58:51Z

run buildall

doris-robot · 2025-12-23T04:31:48Z

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	82.14% (1573/1915)
Line Coverage	67.19% (28080/41794)
Region Coverage	67.75% (13821/20399)
Branch Coverage	58.17% (7368/12666)

doris-robot · 2025-12-23T05:14:20Z

BE UT Coverage Report

Increment line coverage 64.37% (1104/1715) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.34% (18605/34877)
Line Coverage	39.12% (172357/440602)
Region Coverage	33.76% (133050/394105)
Branch Coverage	34.78% (57595/165579)

kaka11chen requested a review from yiguolei as a code owner December 22, 2025 07:10

kaka11chen force-pushed the cherry-pick-57204_4.0 branch from ec16ea5 to 0acc3aa December 22, 2025 07:58

kaka11chen marked this pull request as draft December 22, 2025 08:00

kaka11chen force-pushed the cherry-pick-57204_4.0 branch from 0acc3aa to 364e09f December 22, 2025 09:55

kaka11chen force-pushed the cherry-pick-57204_4.0 branch from 364e09f to b6db778 December 22, 2025 10:08

kaka11chen force-pushed the cherry-pick-57204_4.0 branch from b6db778 to b8cb3da December 22, 2025 14:40

kaka11chen force-pushed the cherry-pick-57204_4.0 branch from b8cb3da to e270345 December 22, 2025 14:46

kaka11chen force-pushed the cherry-pick-57204_4.0 branch from e270345 to bd4d163 December 22, 2025 17:10

kaka11chen force-pushed the cherry-pick-57204_4.0 branch from bd4d163 to f4e84e7 December 23, 2025 03:54

kaka11chen closed this Dec 24, 2025
