[opt](cloud) optimize load performance for inverted index when pack small files #59011
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
Pull request overview
This PR optimizes the load performance for inverted indexes when packing small files by introducing non-blocking file writer close operations and adding explicit file writer cleanup logic.
Key Changes
- Changed the file writer close operation in FSIndexOutputV2 to non-blocking mode (close(true)) to improve performance
- Added an explicit cleanup loop in SegmentFlusher::close() to ensure that the underlying file writers of all index file writers are properly closed
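To illustrate what a non-blocking close can look like, here is a minimal C++ sketch. It assumes a writer with three states (opened, async-closing, closed) in which close(true) only starts the upload and returns, while a later close(false) waits for that upload to finish. SketchFileWriter, SketchStatus and the simulated upload are hypothetical stand-ins, not the Doris file writer API.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Minimal status stand-in (hypothetical; not Doris's Status class).
struct SketchStatus {
    std::string error;  // empty means success
    bool ok() const { return error.empty(); }
};

// Hypothetical writer: close(true) starts the upload and returns at once;
// close(false) waits for the upload that is already in flight.
class SketchFileWriter {
public:
    enum class State { OPENED, ASYNC_CLOSING, CLOSED };

    explicit SketchFileWriter(std::string path) : path_(std::move(path)) {}

    SketchStatus close(bool non_block = false) {
        if (state_ == State::CLOSED) {
            return SketchStatus{};  // idempotent: nothing left to do
        }
        if (state_ == State::OPENED) {
            // Start the (simulated) upload in the background.
            pending_ = std::async(std::launch::async, [p = path_] { return upload(p); });
            state_ = State::ASYNC_CLOSING;
        }
        if (non_block) {
            return SketchStatus{};  // caller continues without waiting
        }
        // Blocking close: wait for the background upload to finish.
        SketchStatus st = pending_.get();
        state_ = State::CLOSED;
        return st;
    }

    State state() const { return state_; }

private:
    // Simulated remote upload; an empty path stands in for a failed request.
    static SketchStatus upload(const std::string& path) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        return path.empty() ? SketchStatus{"empty path"} : SketchStatus{};
    }

    std::string path_;
    State state_ = State::OPENED;
    std::future<SketchStatus> pending_;
};
```

Under this sketch, the flush path would call close(true) and move on while the upload is still in flight, and some later step would call close(false) to collect the result.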
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| be/src/olap/rowset/segment_v2/inverted_index_fs_directory.cpp | Modified FSIndexOutputV2::close() to use non-blocking close (close(true)) for the underlying file writer to improve performance |
| be/src/olap/rowset/segment_creator.cpp | Added explicit loop to close underlying file writers after closing the index file collection, ensuring proper resource cleanup even if errors occur during the close chain |
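The cleanup loop described for segment_creator.cpp can be sketched as closing every writer regardless of earlier failures, then reporting the first error. This is an illustrative sketch with hypothetical types and names (SketchWriter, SketchStatus, close_all), not the actual SegmentFlusher::close() code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal status stand-in (hypothetical; not Doris's Status class).
struct SketchStatus {
    std::string error;  // empty means success
    bool ok() const { return error.empty(); }
};

// Hypothetical writer: close() may fail, but it must be attempted on every writer.
struct SketchWriter {
    std::string name;
    bool fail_on_close = false;
    bool closed = false;

    SketchStatus close() {
        closed = true;  // resources are released even when an error is reported
        if (fail_on_close) {
            return SketchStatus{"failed to close " + name};
        }
        return SketchStatus{};
    }
};

// Close every writer, even after an earlier failure, and return the first error,
// so no writer is left open just because another one in the chain failed.
SketchStatus close_all(std::vector<std::unique_ptr<SketchWriter>>& writers) {
    SketchStatus first_error;
    for (auto& w : writers) {
        if (w == nullptr || w->closed) {
            continue;  // already closed earlier in the chain
        }
        SketchStatus st = w->close();
        if (!st.ok() && first_error.ok()) {
            first_error = std::move(st);
        }
    }
    return first_error;
}
```

Skipping writers that are already closed keeps the loop safe to run after a close chain that partially succeeded.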
TPC-H: Total hot run time: 36035 ms
TPC-DS: Total hot run time: 178491 ms
ClickBench: Total hot run time: 27.31 s
Force-pushed from 1edde83 to 2e5391a.
run buildall
TPC-H: Total hot run time: 36456 ms
TPC-DS: Total hot run time: 178672 ms
ClickBench: Total hot run time: 27.3 s
run buildall
TPC-H: Total hot run time: 34973 ms
TPC-DS: Total hot run time: 177439 ms
ClickBench: Total hot run time: 27.13 s
run buildall
TPC-H: Total hot run time: 35195 ms
TPC-DS: Total hot run time: 178442 ms
ClickBench: Total hot run time: 27.95 s
Force-pushed from 2e5391a to 05424bc.
run buildall
TPC-H: Total hot run time: 36514 ms
TPC-DS: Total hot run time: 178401 ms
ClickBench: Total hot run time: 27.32 s
run buildall
Force-pushed from fa17d90 to 2023b08.
run buildall
TPC-H: Total hot run time: 36545 ms
TPC-DS: Total hot run time: 178672 ms
ClickBench: Total hot run time: 27.11 s
run buildall
airborne12 left a comment:
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 35195 ms
TPC-DS: Total hot run time: 179329 ms
ClickBench: Total hot run time: 27.45 s
…mall files (apache#59011)
Related PR: apache#57770
Problem Summary: When merging small files with inverted indexes, the segment close operation was synchronously waiting for inverted index files to be uploaded to S3. This blocking behavior significantly impacted the memtable flush thread performance, causing bottlenecks in the data loading pipeline.
Solution: The solution introduces a two-phase close mechanism for inverted index file writers:
1. **Asynchronous Close Phase**: During segment close, inverted index files are closed asynchronously and the S3 upload task is submitted immediately without waiting for completion.
2. **Wait Phase**: When the load channel closes, the system waits for all pending S3 upload tasks to complete, ensuring data consistency.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #57770
Problem Summary:
When small files with inverted indexes were merged, segment close waited synchronously for the inverted index files to be uploaded to S3. This blocking stalled the memtable flush threads and created a bottleneck in the data loading pipeline.
Solution:
The solution introduces a two-phase close mechanism for inverted index file writers:
1. Asynchronous Close Phase: During segment close, inverted index files are closed asynchronously, and the S3 upload task is submitted immediately without waiting for it to complete.
2. Wait Phase: When the load channel closes, the system waits for all pending S3 upload tasks to complete, ensuring data consistency.
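The two phases can be sketched with std::async: segment close submits each upload and returns immediately, and load channel close waits on every pending future before reporting success. PendingUploads, submit_async and wait_all are hypothetical names and the upload is simulated; how the actual change schedules S3 uploads is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Minimal status stand-in (hypothetical; not Doris's Status class).
struct SketchStatus {
    std::string error;  // empty means success
    bool ok() const { return error.empty(); }
};

// Hypothetical registry of in-flight index-file uploads for one load channel.
class PendingUploads {
public:
    // Phase 1 (segment close): submit the upload and return immediately, so the
    // memtable flush thread does not block on remote storage.
    void submit_async(std::string file) {
        auto fut = std::async(std::launch::async, [f = std::move(file)] {
            // Simulated upload; an empty name stands in for a failed request.
            return f.empty() ? SketchStatus{"upload failed"} : SketchStatus{};
        });
        std::lock_guard<std::mutex> lock(mu_);
        pending_.push_back(std::move(fut));
    }

    // Phase 2 (load channel close): wait for every pending upload and return the
    // first error, so the load reports success only after all uploads finished.
    SketchStatus wait_all() {
        std::vector<std::future<SketchStatus>> to_wait;
        {
            std::lock_guard<std::mutex> lock(mu_);
            to_wait.swap(pending_);
        }
        SketchStatus first_error;
        for (auto& fut : to_wait) {
            SketchStatus st = fut.get();  // blocks until this upload completes
            if (!st.ok() && first_error.ok()) {
                first_error = std::move(st);
            }
        }
        return first_error;
    }

    std::size_t pending_count() {
        std::lock_guard<std::mutex> lock(mu_);
        return pending_.size();
    }

private:
    std::mutex mu_;
    std::vector<std::future<SketchStatus>> pending_;
};
```

Swapping the vector out under the lock in wait_all() means the mutex is not held while waiting, so other threads can keep submitting uploads during phase 2.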
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)