Cherry-picked from #59586

…s are missing after schema evolution (#59586)

### What problem does this PR solve?
- Related PR: #57204

**Problem Summary:**
When querying struct fields in Iceberg tables after schema evolution, if
all queried struct fields are missing from the old Parquet files, the
query fails with the error:
```
File column name 'removed' not found in struct children
```

**Root Cause:**
When all queried struct sub-fields are missing in the old Parquet file
(e.g., newly added fields after schema evolution), the code needs to
find a reference column from the file schema to get repetition level
(RL) and definition level (DL) information. However, if the reference
column (e.g., `removed`) was dropped from the table schema, calling
`root_node->get_children_node_by_file_column_name()` fails because
the column no longer exists in `root_node`.
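
The failing lookup can be illustrated with a minimal sketch. `ToyStructNode`, `child_by_file_column_name`, `make_evolved_struct_node`, and `lookup_error` are hypothetical names for a simplified model and are not the Doris classes; the sketch only assumes that the struct node keeps a strict map from file column names to children and raises an error on a miss.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for a table-schema struct node.
// It only knows the file columns that still exist in the table schema.
struct ToyStructNode {
    std::map<std::string, std::shared_ptr<ToyStructNode>> children;

    // Strict lookup: throws when the file column has no entry in this node.
    std::shared_ptr<ToyStructNode> child_by_file_column_name(const std::string& name) const {
        assert(!name.empty());
        auto it = children.find(name);
        if (it == children.end()) {
            throw std::out_of_range("File column name '" + name +
                                    "' not found in struct children");
        }
        return it->second;
    }
};

// Models a_struct after schema evolution: `removed` was dropped, so it has no entry.
ToyStructNode make_evolved_struct_node() {
    ToyStructNode node;
    node.children["rename"] = std::make_shared<ToyStructNode>();
    node.children["keep"] = std::make_shared<ToyStructNode>();
    return node;
}

// Returns the error text produced when looking up `name`, or "" on success.
std::string lookup_error(const ToyStructNode& node, const std::string& name) {
    try {
        node.child_by_file_column_name(name);
        return "";
    } catch (const std::out_of_range& e) {
        return e.what();
    }
}
```

In this model, `lookup_error(make_evolved_struct_node(), "removed")` yields the same message as the reported error, while a lookup of `keep` succeeds.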

**Scenario:**
1. Create table with struct containing: `removed`, `rename`, `keep`,
`drop_and_add`
2. Insert data (creates Parquet file with these fields)
3. Perform schema evolution: DROP `a_struct.removed`, DROP then ADD
`a_struct.drop_and_add` (gets new field ID), ADD `a_struct.added`
4. Query `struct_element(a_struct, 'drop_and_add')` or
`struct_element(a_struct, 'added')` on the old file
5. The query fails because:
   - All queried fields (`drop_and_add`, `added`) are missing in the old file
   - The code tries to use `removed` as the reference column (it exists in the file but was dropped from the table schema)
   - Accessing `removed` via `root_node` fails because it doesn't exist in the table schema
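
The scenario can be modeled as a small sketch that matches fields by ID. Everything here is an assumption made for illustration: the `Field` type, the helper names, the concrete field IDs, and the policy of taking the first sub-field stored in the file as the reference column. It is not the reader's actual selection logic.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One struct sub-field: its name plus an Iceberg-style field ID.
// The IDs used below are made up for illustration.
struct Field {
    std::string name;
    int id;
};

// True when some field in `fields` carries the given ID.
bool has_id(const std::vector<Field>& fields, int id) {
    return std::any_of(fields.begin(), fields.end(),
                       [id](const Field& f) { return f.id == id; });
}

// True when none of the queried table fields can be matched to the file by ID.
bool all_queried_missing(const std::vector<Field>& file_fields,
                         const std::vector<Field>& table_fields,
                         const std::vector<std::string>& queried) {
    for (const std::string& name : queried) {
        for (const Field& f : table_fields) {
            if (f.name == name && has_id(file_fields, f.id)) {
                return false;
            }
        }
    }
    return true;
}

// Assumed policy: borrow RL/DL from the first sub-field stored in the file.
const Field& pick_reference_column(const std::vector<Field>& file_fields) {
    assert(!file_fields.empty());
    return file_fields.front();
}

// Sub-fields of a_struct as written to the old Parquet file (steps 1 and 2).
std::vector<Field> old_file_fields() {
    return {{"removed", 1}, {"rename", 2}, {"keep", 3}, {"drop_and_add", 4}};
}

// Sub-fields of a_struct in the table schema after step 3.
// `drop_and_add` was re-added, so it carries a new ID.
std::vector<Field> evolved_table_fields() {
    return {{"rename", 2}, {"keep", 3}, {"drop_and_add", 5}, {"added", 6}};
}
```

With these inputs, querying `drop_and_add` and `added` leaves nothing to read from the old file, the chosen reference column is `removed`, and `has_id(evolved_table_fields(), 1)` is false, which is the combination that triggered the failure.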

### Solution:
Use `TableSchemaChangeHelper::ConstNode::get_instance()` instead of
looking up from `root_node` for the reference column. Since the
reference column is only used to get RL/DL information (not for schema
mapping), using `ConstNode` is safe and avoids the issue where the
reference column doesn't exist in `root_node`.
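
The idea behind the fix can be sketched as follows. This is a simplified model and not the Doris implementation: `ToyNode`, `ToyStructNode`, `ToyConstNode`, and the two `reference_node_*` helpers are hypothetical, and the sketch assumes that a constant node acts as an identity mapping that accepts any child name.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical base class for schema-mapping nodes.
class ToyNode {
public:
    virtual ~ToyNode() = default;
    virtual std::shared_ptr<ToyNode> child_by_file_column_name(
            const std::string& name) const = 0;
};

// Strict node built from the table schema: unknown file columns are an error.
class ToyStructNode final : public ToyNode {
public:
    std::map<std::string, std::shared_ptr<ToyNode>> children;

    std::shared_ptr<ToyNode> child_by_file_column_name(
            const std::string& name) const override {
        auto it = children.find(name);
        if (it == children.end()) {
            throw std::out_of_range("File column name '" + name +
                                    "' not found in struct children");
        }
        return it->second;
    }
};

// Assumed behavior of a constant node: a shared singleton that never rejects a lookup.
class ToyConstNode final : public ToyNode {
public:
    static std::shared_ptr<ToyNode> get_instance() {
        static const std::shared_ptr<ToyNode> instance = std::make_shared<ToyConstNode>();
        return instance;
    }

    std::shared_ptr<ToyNode> child_by_file_column_name(const std::string&) const override {
        return get_instance();
    }
};

// Before the fix: the reference column is resolved through the table-schema node.
std::shared_ptr<ToyNode> reference_node_before_fix(const ToyStructNode& root,
                                                   const std::string& reference_column) {
    return root.child_by_file_column_name(reference_column);
}

// After the fix: the reference column only supplies RL/DL, so no lookup is needed.
std::shared_ptr<ToyNode> reference_node_after_fix() {
    std::shared_ptr<ToyNode> node = ToyConstNode::get_instance();
    assert(node != nullptr);
    return node;
}
```

In this model, `reference_node_before_fix(root, "removed")` throws when `root` has no `removed` child, whereas `reference_node_after_fix()` always returns a usable node regardless of what the table schema contains.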

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
@github-actions github-actions bot requested a review from yiguolei as a code owner January 13, 2026 11:34

Thearas commented Jan 13, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Jan 13, 2026

Thearas commented Jan 13, 2026

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.46% (18875/35304) |
| Line Coverage | 39.28% (175089/445757) |
| Region Coverage | 33.95% (135336/398613) |
| Branch Coverage | 34.90% (58513/167658) |

@yiguolei

skip buildall

@yiguolei yiguolei merged commit d53be9f into branch-4.0 Jan 14, 2026
28 of 30 checks passed
@github-actions github-actions bot deleted the auto-pick-59586-branch-4.0 branch January 14, 2026 01:58