branch-4.0: [feat](cloud) Cherry pick packed file prs #59693
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
Pull request overview
This PR cherry-picks packed file-related pull requests to branch-4.0, introducing a comprehensive packed file management system for cloud mode in Apache Doris. The changes add support for merging small files into larger "packed files" to optimize storage efficiency in cloud environments.
Key changes:
- New packed file metadata structures and storage in protobuf definitions
- Recycler logic to manage packed file lifecycle (correction, deletion, reference counting)
- Checker logic to validate packed file integrity
- Meta service RPC handler for updating packed file information
- Extensive test coverage for packed file operations
- Configuration parameters for tuning packed file behavior
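As a rough illustration of the packing idea in the overview above, the sketch below appends small files into one buffer and records each file's offset and size so it can be read back later. `SliceLocation` and `PackedFileBuilder` are hypothetical names for this example; they do not mirror the fields of `PackedSliceLocationPB` or any class in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a slice location. Field names are illustrative
// and are not taken from PackedSliceLocationPB.
struct SliceLocation {
    std::string packed_file;  // name of the packed file holding the bytes
    std::size_t offset;       // byte offset of the slice inside the packed file
    std::size_t size;         // length of the slice in bytes
};

// Appends small files into one in-memory buffer and remembers where each landed.
class PackedFileBuilder {
public:
    explicit PackedFileBuilder(std::string name) : name_(std::move(name)) {}

    // Append one small file and return its location inside the packed file.
    SliceLocation append(const std::string& small_file_path, const std::string& bytes) {
        SliceLocation loc{name_, data_.size(), bytes.size()};
        data_ += bytes;
        locations_[small_file_path] = loc;
        return loc;
    }

    // Read a small file back using only its recorded location.
    std::string read(const SliceLocation& loc) const {
        assert(loc.offset <= data_.size() && loc.size <= data_.size() - loc.offset);
        return data_.substr(loc.offset, loc.size);
    }

    // Map from original small-file path to its slice location.
    const std::map<std::string, SliceLocation>& locations() const { return locations_; }

private:
    std::string name_;
    std::string data_;
    std::map<std::string, SliceLocation> locations_;
};
```

A real implementation would write to object storage rather than a `std::string`; the buffer here only keeps the example self-contained.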
Reviewed changes
Copilot reviewed 93 out of 93 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| gensrc/proto/olap_file.proto | Added PackedSliceLocationPB message and packed_slice_locations map to rowset metadata |
| gensrc/proto/cloud.proto | Added PackedSlicePB, PackedFileInfoPB, PackedFileFooterPB messages and update_packed_file_info RPC |
| cloud/src/recycler/recycler.{h,cpp} | Core packed file recycling logic including ref count management and deletion |
| cloud/src/recycler/checker.{h,cpp} | Packed file integrity checker to detect leaks and inconsistencies |
| cloud/src/meta-service/meta_service.{h,cpp} | RPC handler for updating packed file metadata |
| cloud/src/meta-store/keys.{h,cpp} | Key encoding/decoding for packed file metadata storage |
| cloud/src/common/{config.h,bvars.{h,cpp}} | Configuration parameters and metrics for packed files |
| cloud/test/recycler_test.cpp | Comprehensive unit tests for packed file recycling |
| cloud/test/meta_service_test.cpp | Unit tests for update_packed_file_info RPC |
| regression-test/suites/load_p0/stream_load/test_packed_file_stream_load_case*.groovy | Integration tests for packed file with stream load |
| regression-test/suites/cloud_p0/packed_file/*.groovy | Integration tests for various packed file scenarios |
| be/test/* | Updated test files to use new index file writer API (begin_close/finish_close) |
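The recycler row in the table mentions ref count management and deletion. A minimal sketch of that idea, assuming one reference per slice that still points into a packed file, could look like the following. `PackedFileRefTable` is a hypothetical in-memory class; the PR itself stores this state in the meta service, and its actual logic is not shown on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical in-memory reference table: one count per packed file.
class PackedFileRefTable {
public:
    // Record that one more slice points into `packed_file`.
    void add_ref(const std::string& packed_file) { ++refs_[packed_file]; }

    // Drop one reference. Returns true when the count reaches zero, meaning
    // no slice uses the packed file any more and it may be deleted.
    bool release(const std::string& packed_file) {
        auto it = refs_.find(packed_file);
        assert(it != refs_.end() && it->second > 0);
        if (--it->second == 0) {
            refs_.erase(it);
            return true;
        }
        return false;
    }

    // Current reference count; 0 for unknown or fully released files.
    std::int64_t ref_count(const std::string& packed_file) const {
        auto it = refs_.find(packed_file);
        return it == refs_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<std::string, std::int64_t> refs_;
};
```

A checker in this model would flag a leak when a packed file still exists in storage but has no entry in the table.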
…erhead (apache#57770) Co-authored-by: Luwei <luwei@selectdb.com> Co-authored-by: Luwei <814383175@qq.com>
…files (apache#58839)
Related PR: apache#57770
Problem Summary: The main change offloads the entire delete bitmap calculation process (file closing, rowset building, segment loading, and bitmap calculation) to a background thread pool. This keeps the memtable flush thread from blocking, which improves performance and concurrency.
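The offloading pattern described in this commit can be sketched with a small worker pool: the caller submits the work and gets a future back immediately instead of blocking. `BackgroundPool` is a hypothetical class written for this example; it is not the thread pool Doris uses, and the task signature is simplified to `int()`.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal worker pool for illustration only.
class BackgroundPool {
public:
    explicit BackgroundPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Signals shutdown, lets workers drain the queue, then joins them.
    ~BackgroundPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Queue a task and return at once; the future delivers the result later.
    std::future<int> submit(std::function<int()> task) {
        auto packaged = std::make_shared<std::packaged_task<int()>>(std::move(task));
        std::future<int> result = packaged->get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push([packaged] { (*packaged)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    // Worker loop: take jobs until the pool is stopping and the queue is empty.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```

In this model the flush thread would call `submit` with the bitmap calculation and move on, while a later stage calls `get()` on the returned future.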
…obustness (apache#58883)
- Debug & tooling: write/parse a versioned PackedFileDebugInfoPB trailer; add packed_file_tool and unit tests for easier debugging.
- Recycler retries: make packed file ref-count updates retryable and configurable to handle TXN_CONFLICTs.
- Prevent leak on recycle timing: ensure rowsets with packed slices are processed even when the tablet is already marked recycled.
- Trim logs: downgrade noisy state-transition logs to VLOG_DEBUG while keeping upload completion details.
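The retryable ref-count update mentioned in the list above can be sketched as a bounded retry loop that repeats only on a conflict. The `TxnStatus` enum and `update_with_retry` function are hypothetical stand-ins; the real status codes, config names, and any backoff policy in the PR are not shown on this page.

```cpp
#include <functional>

// Hypothetical status codes; kConflict stands in for a TXN_CONFLICT result.
enum class TxnStatus { kOk, kConflict, kError };

// Calls `attempt` once, then retries up to `max_retries` more times while it
// keeps returning kConflict. Any other status ends the loop immediately.
// `attempts_made`, if non-null, receives the number of calls performed.
TxnStatus update_with_retry(const std::function<TxnStatus()>& attempt,
                            int max_retries,
                            int* attempts_made = nullptr) {
    TxnStatus status = TxnStatus::kConflict;
    int attempts = 0;
    do {
        ++attempts;
        status = attempt();
    } while (status == TxnStatus::kConflict && attempts <= max_retries);
    if (attempts_made != nullptr) *attempts_made = attempts;
    return status;
}
```

This sketch retries immediately; a production version would likely sleep or back off between attempts, which is omitted here to keep the example short.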
…mall files (apache#59011)
Related PR: apache#57770
Problem Summary: When merging small files with inverted indexes, the segment close operation was synchronously waiting for inverted index files to be uploaded to S3. This blocking behavior significantly impacted the memtable flush thread performance, causing bottlenecks in the data loading pipeline.
Solution: The solution introduces a two-phase close mechanism for inverted index file writers:
1. **Asynchronous Close Phase**: During segment close, inverted index files are closed asynchronously and the S3 upload task is submitted immediately without waiting for completion.
2. **Wait Phase**: When the load channel closes, the system waits for all pending S3 upload tasks to complete, ensuring data consistency.
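The two phases above can be sketched with `std::async` and futures. `TwoPhaseCloser` is a hypothetical class, and the upload callback is an assumed stand-in for the S3 upload. The method names echo the `begin_close`/`finish_close` pair mentioned in the file table, but the signatures here are invented for illustration and are not the Doris API.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical two-phase closer: phase 1 starts uploads, phase 2 waits for them.
class TwoPhaseCloser {
public:
    // `upload` is an assumed callback that returns true on a successful upload.
    explicit TwoPhaseCloser(std::function<bool(const std::string&)> upload)
            : upload_(std::move(upload)) {}

    // Phase 1: launch the upload on another thread and return without waiting.
    void begin_close(std::string file_name) {
        pending_.push_back(
                std::async(std::launch::async, upload_, std::move(file_name)));
    }

    // Phase 2: wait for every pending upload. Returns true only if all succeeded.
    bool finish_close() {
        bool ok = true;
        for (auto& f : pending_) {
            ok = f.get() && ok;  // always call get() so every upload is awaited
        }
        pending_.clear();
        return ok;
    }

    // Number of uploads started but not yet awaited.
    std::size_t pending() const { return pending_.size(); }

private:
    std::function<bool(const std::string&)> upload_;
    std::vector<std::future<bool>> pending_;
};
```

In this model, segment close would call `begin_close` per index file, and the load channel close would call `finish_close` once.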
ebeb626 to 4e33c27
run buildall
Cloud UT Coverage Report: Increment line coverage, Increment coverage report
BE UT Coverage Report: Increment line coverage, Increment coverage report
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
pick: https://github.com/apache/doris/issues?q=state%3Aclosed%20label%3Apacked-file