[fix](index compaction)Skip writing terms with a doc frequency of 0 #43113
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
clang-tidy made some suggestions
be/test/olap/rowset/segment_v2/inverted_index/compaction/index_compaction_with_deleted_term.cpp
    std::string _curreent_dir;
};

TEST_F(IndexCompactionDeleteTest, delete_index_test) {
warning: function 'TEST_F' exceeds recommended size/complexity thresholds [readability-function-size]
TEST_F(IndexCompactionDeleteTest, delete_index_test) {
^
Additional context
be/test/olap/rowset/segment_v2/inverted_index/compaction/index_compaction_with_deleted_term.cpp:560: 107 lines including whitespace and comments (threshold 80)
TEST_F(IndexCompactionDeleteTest, delete_index_test) {
^
8715517 to 9456193
|
run buildall |
clang-tidy made some suggestions
be/test/olap/rowset/segment_v2/inverted_index/compaction/index_compaction_with_deleted_term.cpp
    std::string _curreent_dir;
};

TEST_F(IndexCompactionDeleteTest, delete_index_test) {
warning: function 'TEST_F' exceeds recommended size/complexity thresholds [readability-function-size]
TEST_F(IndexCompactionDeleteTest, delete_index_test) {
^
Additional context
be/test/olap/rowset/segment_v2/inverted_index/compaction/index_compaction_with_deleted_term.cpp:561: 107 lines including whitespace and comments (threshold 80)
TEST_F(IndexCompactionDeleteTest, delete_index_test) {
^
TPC-H: Total hot run time: 41525 ms |
TPC-DS: Total hot run time: 197939 ms |
ClickBench: Total hot run time: 32.73 s |
|
TeamCity be ut coverage result: |
airborne12
left a comment
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
eldenmoon
left a comment
LGTM
What problem does this PR solve?
When a term was deleted in its entirety, index compaction still recorded information about the term, with a doc frequency of zero. This left redundant information in the index file. If many terms were deleted, the index file would be much larger than normal.
In this PR, information for terms with a doc frequency of 0 is no longer written.
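To illustrate the idea behind the fix, here is a minimal sketch, not the actual Doris or CLucene code. It assumes a simplified in-memory posting map and a doc id remapping table; `Postings`, `TermEntry`, `kDeleted`, and `compact_terms` are all hypothetical names introduced only for this example. During compaction each term's postings are remapped, and a term whose documents were all deleted ends up with a doc frequency of 0 and is skipped instead of being written.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; these are not Doris or CLucene types.
using DocId = std::uint32_t;
using Postings = std::map<std::string, std::vector<DocId>>; // term -> old doc ids

// Marker in the remapping table for a document that was deleted.
constexpr DocId kDeleted = std::numeric_limits<DocId>::max();

struct TermEntry {
    std::string term;
    std::uint32_t doc_freq;   // number of live documents containing the term
    std::vector<DocId> docs;  // remapped doc ids
};

// Remaps every term's postings through `new_doc_id` (old id -> new id, or
// kDeleted) and returns only the terms that still occur in a live document.
std::vector<TermEntry> compact_terms(const Postings& postings,
                                     const std::vector<DocId>& new_doc_id) {
    std::vector<TermEntry> out;
    for (const auto& [term, docs] : postings) {
        std::vector<DocId> live;
        for (DocId d : docs) {
            assert(d < new_doc_id.size());
            if (new_doc_id[d] != kDeleted) {
                live.push_back(new_doc_id[d]);
            }
        }
        // Doc frequency is 0: the term was deleted entirely, so skip it
        // rather than writing an empty entry to the output.
        if (live.empty()) {
            continue;
        }
        const auto doc_freq = static_cast<std::uint32_t>(live.size());
        out.push_back(TermEntry{term, doc_freq, std::move(live)});
    }
    return out;
}
```

In this sketch, a term that appears only in deleted documents produces no output entry at all, which is the behavior the release note describes.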
Problem Summary:
Check List (For Committer)
Test
Behavior changed:
Does this need documentation?
Release note
bugfix: Skip writing terms with a doc frequency of 0
Check List (For Reviewer who merge this PR)