@liaoxin01
Contributor

Copilot AI review requested due to automatic review settings January 8, 2026 13:45
@liaoxin01 liaoxin01 requested a review from yiguolei as a code owner January 8, 2026 13:45
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01 liaoxin01 marked this pull request as draft January 8, 2026 13:45
@liaoxin01
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR cherry-picks packed file-related pull requests to branch-4.0, introducing a comprehensive packed file management system for cloud mode in Apache Doris. The changes add support for merging small files into larger "packed files" to optimize storage efficiency in cloud environments.

Key changes:

  • New packed file metadata structures and storage in protobuf definitions
  • Recycler logic to manage packed file lifecycle (correction, deletion, reference counting)
  • Checker logic to validate packed file integrity
  • Meta service RPC handler for updating packed file information
  • Extensive test coverage for packed file operations
  • Configuration parameters for tuning packed file behavior
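
As a rough illustration of the idea summarized above, the following minimal C++ sketch packs several small files into one buffer, records where each slice landed, and treats the number of live slices as the packed file's reference count. `SliceLocation` and `PackedFileSketch` are hypothetical names invented for this example; they are not the PR's `PackedSliceLocationPB` or recycler code, and the real field layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical slice record: where one small file lives inside a packed file.
// Field names are illustrative, not taken from the Doris protobuf definitions.
struct SliceLocation {
    std::string packed_file_path;
    uint64_t offset = 0;
    uint64_t size = 0;
};

// In-memory stand-in for a packed file plus its slice index.
class PackedFileSketch {
public:
    explicit PackedFileSketch(std::string path) : path_(std::move(path)) {}

    // Appends one small file to the packed buffer and records its location.
    SliceLocation append(const std::string& small_file, const std::string& data) {
        SliceLocation loc;
        loc.packed_file_path = path_;
        loc.offset = static_cast<uint64_t>(buffer_.size());
        loc.size = static_cast<uint64_t>(data.size());
        buffer_ += data;
        slices_[small_file] = loc;
        return loc;
    }

    // Reads a slice back out of the packed buffer.
    std::string read(const SliceLocation& loc) const {
        assert(loc.offset + loc.size <= buffer_.size());
        return buffer_.substr(loc.offset, loc.size);
    }

    // Drops the reference held by one small file. Returns true only when this
    // call removed the last reference, i.e. the packed file could be deleted.
    bool release(const std::string& small_file) {
        if (slices_.erase(small_file) == 0) return false;
        return slices_.empty();
    }

    // Number of small files still referencing this packed file.
    std::size_t ref_count() const { return slices_.size(); }

private:
    std::string path_;
    std::string buffer_;
    std::map<std::string, SliceLocation> slices_;
};
```

In this sketch a caller would `append` each small segment file, keep the returned `SliceLocation` in its own metadata, and delete the packed object only after `release` returns true.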

Reviewed changes

Copilot reviewed 93 out of 93 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `gensrc/proto/olap_file.proto` | Added PackedSliceLocationPB message and packed_slice_locations map to rowset metadata |
| `gensrc/proto/cloud.proto` | Added PackedSlicePB, PackedFileInfoPB, PackedFileFooterPB messages and update_packed_file_info RPC |
| `cloud/src/recycler/recycler.{h,cpp}` | Core packed file recycling logic including ref count management and deletion |
| `cloud/src/recycler/checker.{h,cpp}` | Packed file integrity checker to detect leaks and inconsistencies |
| `cloud/src/meta-service/meta_service.{h,cpp}` | RPC handler for updating packed file metadata |
| `cloud/src/meta-store/keys.{h,cpp}` | Key encoding/decoding for packed file metadata storage |
| `cloud/src/common/{config.h,bvars.{h,cpp}}` | Configuration parameters and metrics for packed files |
| `cloud/test/recycler_test.cpp` | Comprehensive unit tests for packed file recycling |
| `cloud/test/meta_service_test.cpp` | Unit tests for update_packed_file_info RPC |
| `regression-test/suites/load_p0/stream_load/test_packed_file_stream_load_case*.groovy` | Integration tests for packed file with stream load |
| `regression-test/suites/cloud_p0/packed_file/*.groovy` | Integration tests for various packed file scenarios |
| `be/test/*` | Updated test files to use new index file writer API (begin_close/finish_close) |

liaoxin01 and others added 10 commits January 9, 2026 12:41
…erhead (apache#57770)

Co-authored-by: Luwei <luwei@selectdb.com>
Co-authored-by: Luwei <814383175@qq.com>
…files (apache#58839)

Related PR: apache#57770

Problem Summary:
The main change is to offload the entire delete bitmap calculation
process—including file closing, rowset building, segment loading, and
bitmap calculation—to a background thread pool. This prevents blocking
the memtable flush thread, enhancing performance and concurrency.
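
The offloading pattern described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `std::async` stands in for the background thread pool (it is not a pool, and the PR's actual executor is not shown here), and `calc_delete_bitmap_pipeline`, `submit_delete_bitmap_task`, and `DeleteBitmapResult` are hypothetical names that merely record the four stages named in the summary.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: just records which stages ran, in order.
struct DeleteBitmapResult {
    std::vector<std::string> steps;
};

// Stand-in for the whole pipeline that used to run on the flush thread.
inline DeleteBitmapResult calc_delete_bitmap_pipeline(const std::string& segment) {
    assert(!segment.empty());
    DeleteBitmapResult result;
    result.steps.push_back("close_file:" + segment);
    result.steps.push_back("build_rowset:" + segment);
    result.steps.push_back("load_segment:" + segment);
    result.steps.push_back("calc_bitmap:" + segment);
    return result;
}

// The flush thread calls this and returns immediately; the work runs on
// another thread. std::async is used here in place of a real thread pool.
inline std::future<DeleteBitmapResult> submit_delete_bitmap_task(std::string segment) {
    return std::async(std::launch::async, [seg = std::move(segment)] {
        return calc_delete_bitmap_pipeline(seg);
    });
}
```

The caller keeps the returned future and only calls `get()` at the point where the bitmap is actually needed, so the flush thread itself never blocks on the pipeline.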
…obustness (apache#58883)

  • Debug & tooling: write/parse versioned PackedFileDebugInfoPB trailer,
    add packed_file_tool and unit tests for easier debugging
  • Recycler retries: make packed file ref-count updates
    retryable/configurable to handle TXN_CONFLICTs
  • Prevent leak on recycle timing: ensure rowsets with packed slices are
    processed even when the tablet is already marked recycled
  • Trim logs: downgrade noisy state transition logs to VLOG_DEBUG while
    keeping upload completion details
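
The retry behavior mentioned for ref-count updates might look roughly like the C++ sketch below. It assumes a transactional update that can report a conflict; `TxnStatus`, `retry_on_conflict`, and the `max_retries` parameter are hypothetical names for this example and do not reflect the recycler's real interface or configuration keys.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of one transactional update attempt.
enum class TxnStatus { OK, CONFLICT, ERROR };

// Runs `op` once, then re-runs it while it reports a conflict, up to
// `max_retries` additional attempts. Other errors are returned immediately.
inline TxnStatus retry_on_conflict(const std::function<TxnStatus()>& op, int max_retries) {
    assert(max_retries >= 0);
    TxnStatus status = op();
    for (int attempt = 0; status == TxnStatus::CONFLICT && attempt < max_retries; ++attempt) {
        status = op();
    }
    return status;
}
```

A production version would likely add backoff between attempts; that is omitted here to keep the control flow visible.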
…mall files (apache#59011)

Related PR: apache#57770 

Problem Summary:

When merging small files with inverted indexes, the segment close
operation was synchronously waiting for inverted index files to be
uploaded to S3. This blocking behavior significantly impacted the
memtable flush thread performance, causing bottlenecks in the data
loading pipeline.

Solution:

The solution introduces a two-phase close mechanism for inverted index
file writers:

1. **Asynchronous Close Phase**: During segment close, inverted index
files are closed asynchronously and the S3 upload task is submitted
immediately without waiting for completion.

2. **Wait Phase**: When the load channel closes, the system waits for
all pending S3 upload tasks to complete, ensuring data consistency.
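
The two phases above can be sketched in C++ as follows. The method names `begin_close` and `finish_close` echo the index file writer API mentioned in the file summary, but everything else is an assumption made for illustration: `IndexFileWriterSketch`, `fake_upload`, and `close_load_channel` are hypothetical, and `std::async` stands in for the real S3 upload task submission.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical writer showing the two-phase close shape only.
class IndexFileWriterSketch {
public:
    explicit IndexFileWriterSketch(std::string name) : name_(std::move(name)) {}

    // Phase 1: submit the upload and return without waiting for it.
    void begin_close() {
        assert(!pending_.valid());  // begin_close must be called at most once
        pending_ = std::async(std::launch::async, [n = name_] { return fake_upload(n); });
    }

    // Phase 2: block until the upload submitted in phase 1 has finished.
    // Returns false if begin_close was never called or the upload failed.
    bool finish_close() {
        if (!pending_.valid()) return false;
        return pending_.get();
    }

private:
    // Stand-in for an S3 upload; succeeds for any non-empty file name.
    static bool fake_upload(const std::string& name) { return !name.empty(); }

    std::string name_;
    std::future<bool> pending_;
};

// Called when the load channel closes: waits for every pending upload.
inline bool close_load_channel(std::vector<IndexFileWriterSketch>& writers) {
    bool ok = true;
    for (auto& writer : writers) {
        ok = writer.finish_close() && ok;
    }
    return ok;
}
```

Segment close would call `begin_close` on each writer, and the load channel would later call `close_load_channel`, so upload latency is paid once at the end rather than on every segment.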
@liaoxin01 liaoxin01 force-pushed the cherry-pick-packed-file-prs branch from ebeb626 to 4e33c27 Compare January 9, 2026 13:41
@liaoxin01
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 40.99% (544/1327) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 79.57% (1776/2232) |
| Line Coverage | 64.89% (31461/48482) |
| Region Coverage | 65.42% (15651/23923) |
| Branch Coverage | 56.01% (8313/14842) |

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 74.58% (1121/1503) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.41% (18783/35165) |
| Line Coverage | 39.19% (174067/444185) |
| Region Coverage | 33.92% (134852/397584) |
| Branch Coverage | 34.85% (58234/167095) |

@liaoxin01 liaoxin01 marked this pull request as ready for review January 10, 2026 02:05
@yiguolei yiguolei merged commit 5ca01b6 into apache:branch-4.0 Jan 12, 2026
23 of 26 checks passed
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jan 12, 2026
@github-actions
Contributor

PR approved by anyone and no changes requested.
