
refactor(storage): improve inverted index match phrase query #16547

Merged: 4 commits merged on Sep 30, 2024


@b41sh (Member) commented Sep 29, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • Reimplement the inverted index phrase query matching algorithm, fixing cases where some rows failed to match and cases that caused a panic.
  • Use RoaringTreemap instead of HashSet to speed up bitwise AND and OR (intersection and union) of doc_id sets.
  • Use ok_or_else instead of ok_or, so the error value is constructed only when the Option is None rather than on every call.
  • Remove unused inverted index code.
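The ok_or_else change above is about evaluation order: `ok_or` receives an error value that has already been built, while `ok_or_else` receives a closure that runs only on `None`. A small std-only illustration follows; `IndexError` and `errors_built` are hypothetical names used for this sketch, not Databend code.

```rust
use std::cell::Cell;

/// Hypothetical error type, used only for this illustration (not Databend's).
#[derive(Debug, PartialEq)]
pub struct IndexError(pub String);

/// Converts `value` into a Result with `ok_or_else` (lazy) or `ok_or` (eager)
/// and reports how many error values were constructed along the way.
pub fn errors_built(value: Option<u32>, lazy: bool) -> (Result<u32, IndexError>, u32) {
    let built = Cell::new(0u32);
    let make_error = || {
        built.set(built.get() + 1);
        // to_string() allocates a new String every time this runs.
        IndexError("term not found".to_string())
    };
    let result = if lazy {
        // The closure is called only when `value` is None.
        value.ok_or_else(make_error)
    } else {
        // The argument is evaluated before the call, even when `value` is Some.
        value.ok_or(make_error())
    };
    (result, built.get())
}

fn main() {
    let (eager, eager_built) = errors_built(Some(7), false);
    let (lazy, lazy_built) = errors_built(Some(7), true);
    println!("ok_or:      {eager:?}, errors built = {eager_built}");
    println!("ok_or_else: {lazy:?}, errors built = {lazy_built}");
}
```

With `Some(7)` both calls return `Ok(7)`, but the `ok_or` path still builds one error (and allocates its String) while the `ok_or_else` path builds none. The saving matters most in hot loops where the success path dominates.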

The phrase query matches doc_ids as follows:

  1. Collect the position of each term within the query phrase.
  2. Collect the doc_ids that contain each term and intersect them to get the candidate doc_ids.
  3. For each candidate doc_id, check whether the positions of the terms in the document match their relative positions in the query.
  4. Each position of the first term is a possible beginning of the phrase. A beginning is valid only if every other term has a position at the corresponding offset from it; any beginning that fails this check for some term is removed from the first term's positions. After all terms have been checked, the doc_id matches if any positions of the first term remain.
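The four steps above can be sketched with the standard library alone. This is not Databend's implementation: the postings layout (`Postings`), the helpers `build_postings` and `match_phrase`, and the whitespace tokenizer are hypothetical, and `BTreeSet<u64>` stands in for `RoaringTreemap` so the example needs no external crates.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical in-memory postings: term -> (doc_id -> positions of the term in that doc).
pub type Postings = HashMap<String, HashMap<u64, Vec<u32>>>;

/// Builds postings from whitespace-tokenized docs (a toy tokenizer, for illustration only).
pub fn build_postings(docs: &[(u64, &str)]) -> Postings {
    let mut postings = Postings::new();
    for (doc_id, text) in docs {
        for (pos, term) in text.split_whitespace().enumerate() {
            postings
                .entry(term.to_string())
                .or_default()
                .entry(*doc_id)
                .or_default()
                .push(pos as u32);
        }
    }
    postings
}

/// Returns the doc_ids in which the query terms appear at consecutive positions.
pub fn match_phrase(postings: &Postings, query: &[&str]) -> BTreeSet<u64> {
    // Step 1: position of each term within the query phrase.
    let terms: Vec<(&str, u32)> = query.iter().enumerate().map(|(i, t)| (*t, i as u32)).collect();
    let Some(&(first, _)) = terms.first() else {
        return BTreeSet::new();
    };

    // Step 2: intersect the doc_id sets of all terms to get the candidates.
    let mut candidates: Option<BTreeSet<u64>> = None;
    for (term, _) in &terms {
        let docs: BTreeSet<u64> = match postings.get(*term) {
            Some(per_doc) => per_doc.keys().copied().collect(),
            None => return BTreeSet::new(), // a term missing from the index can never match
        };
        candidates = Some(match candidates {
            None => docs,
            Some(acc) => &acc & &docs,
        });
    }
    let candidates = candidates.unwrap_or_default();

    // Steps 3-4: verify term positions inside each candidate doc.
    candidates
        .into_iter()
        .filter(|doc_id| {
            // Every position of the first term is a possible phrase beginning.
            let mut starts: HashSet<u32> = postings[first][doc_id].iter().copied().collect();
            for (term, offset) in &terms[1..] {
                let positions: HashSet<u32> = postings[*term][doc_id].iter().copied().collect();
                // Drop beginnings whose expected position for this term is absent.
                starts.retain(|s| positions.contains(&(s + offset)));
                if starts.is_empty() {
                    return false;
                }
            }
            !starts.is_empty()
        })
        .collect()
}

fn main() {
    let postings = build_postings(&[
        (1, "the quick brown fox"),
        (2, "brown quick fox"),
        (3, "quick brown quick brown fox"),
    ]);
    println!("{:?}", match_phrase(&postings, &["quick", "brown", "fox"]));
}
```

Here `main` prints `{1, 3}`: doc 2 contains all three terms, so it survives the intersection in step 2, but its only beginning (position 1) is discarded because "brown" is not at position 2. In doc 3 the beginning at position 0 is dropped when "fox" is checked, and the one at position 2 survives.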

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Code refactoring, using previous test case to check

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@b41sh b41sh requested a review from sundy-li September 29, 2024 16:38
@github-actions github-actions bot added the pr-refactor label ("this PR changes the code base without new features or bugfix") Sep 29, 2024
@b41sh b41sh added this pull request to the merge queue Sep 30, 2024
Merged via the queue into databendlabs:main with commit abd8266 Sep 30, 2024
71 checks passed
@b41sh b41sh deleted the refactor-inverted-index-2 branch September 30, 2024 04:40