@zhiqiang-hhhh zhiqiang-hhhh commented Aug 4, 2025

What problem does this PR solve?

Introduces an ANN index to Doris.

This pull request introduces foundational support for ANN (Approximate Nearest Neighbor) vector index functionality in the storage engine, including new runtime structures, configuration options, and initial integration with the build system. The changes lay the groundwork for ANN-based search and statistics collection, and begin integrating ANN index support into various storage and query execution paths.

The ANN index implementation is based on faiss.
Because faiss can return distances directly, this PR uses a virtual slot ref to return the result from the index.
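
To make the idea concrete, here is a minimal Python sketch. It is not Doris or faiss code: `ToyIndex` is a hypothetical brute-force stand-in for a faiss index. The only point is that the search result already carries the distance, so the query layer can project it (the role the virtual slot ref plays in this PR) instead of recomputing it per row.

```python
import heapq
import math


class ToyIndex:
    """Hypothetical brute-force stand-in for a faiss index (not the real API)."""

    def __init__(self, vectors):
        self.vectors = vectors  # row id is the position in this list

    def search(self, query, k):
        # Return the k nearest rows as (row_id, l2_distance) pairs, nearest first.
        scored = ((math.dist(vec, query), row_id)
                  for row_id, vec in enumerate(self.vectors))
        return [(row_id, dist) for dist, row_id in heapq.nsmallest(k, scored)]


index = ToyIndex([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
hits = index.search([0.0, 0.0], k=2)
# The distance column is filled straight from the index result;
# no second pass over the embedding column is needed.
rows = [{"id": row_id, "distance": dist} for row_id, dist in hits]
```

In this toy setup the output rows reuse the distances computed during the search, which is the saving the virtual slot ref is meant to capture.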

If a user creates a table with an ANN index, each Doris data segment will have its own faiss index, and new segments generated by compaction will get a faiss index automatically.
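
The PR text does not describe how results from different segments are combined. One natural reading, sketched below in Python under that assumption, is that each segment's index produces its own local top-k and the candidates are then merged into a global top-k. The function name and tuple layout are hypothetical.

```python
import heapq


def merge_segment_topk(per_segment_hits, k):
    """Merge per-segment (distance, segment_id, row_id) hits into a global top-k."""
    all_hits = (hit for hits in per_segment_hits for hit in hits)
    return heapq.nsmallest(k, all_hits)


# Hypothetical results from two segments, each already limited to its local top-2.
segment_0 = [(0.10, 0, 7), (0.40, 0, 2)]
segment_1 = [(0.05, 1, 3), (0.90, 1, 1)]
merged = merge_segment_topk([segment_0, segment_1], k=2)
```

Because every segment returns at least its own k best candidates, the merged list cannot miss a row that belongs in the global top-k, assuming each per-segment result is itself exact.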

Currently, CREATE INDEX and BUILD INDEX are not supported; to use an ANN index, add the index definition to the table's DDL.

ANN Index Feature Integration:

  • Added new runtime structures and parameters for ANN index operations, including AnnIndexStats, AnnIndexParam, RangeSearchParams, RangeSearchResult, and others in ann_search_params.h, as well as RangeSearchRuntimeInfo for managing ANN range search context. [1] [2] [3]
  • Extended StorageReadOptions and RowsetReaderContext to include ann_topn_runtime for passing ANN runtime information through the storage read path. [1] [2] [3]
  • Added new ANN-related statistics fields (timing and row counts) to OlapReaderStatistics for monitoring ANN index operations.
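
As a rough illustration of how range search parameters, results, and statistics might fit together, here is a small Python sketch. The class and field names (`SketchRangeParams`, `SketchRangeResult`, `SketchStats`) are hypothetical and are not the fields of the real structs in ann_search_params.h; the search itself is brute force rather than an ANN index.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class SketchRangeParams:      # hypothetical; not the fields of RangeSearchParams
    query: list
    radius: float


@dataclass
class SketchStats:            # hypothetical; loosely mirrors "timing and row counts"
    search_ns: int = 0
    rows_matched: int = 0


@dataclass
class SketchRangeResult:      # hypothetical; not the fields of RangeSearchResult
    row_ids: list = field(default_factory=list)
    distances: list = field(default_factory=list)


def range_search(vectors, params, stats):
    """Brute-force range search: keep rows whose L2 distance is within the radius."""
    start = time.perf_counter_ns()
    result = SketchRangeResult()
    for row_id, vec in enumerate(vectors):
        dist = math.dist(vec, params.query)
        if dist <= params.radius:
            result.row_ids.append(row_id)
            result.distances.append(dist)
    stats.search_ns += time.perf_counter_ns() - start
    stats.rows_matched += len(result.row_ids)
    return result


stats = SketchStats()
result = range_search([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]],
                      SketchRangeParams(query=[0.0, 0.0], radius=1.5), stats)
```

Passing the statistics object in by reference lets several searches accumulate into one set of counters, which is one plausible reason to keep statistics separate from the result.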

Build System and Dependency Updates:

  • Added doris-faiss and doris-openblas as submodules for ANN/vector index support, and integrated the new Vector library into the build process and as a dependency for relevant targets. [1] [2] [3] [4]

Index Handling and Schema Integration:

  • Updated index file writer accessors and naming from "inverted_index" to more generic "index" to accommodate ANN and other index types. [1] [2]
  • Changed index creation logic in SegmentFlusher to use has_extra_index() (supporting both inverted and ANN indexes) instead of has_inverted_index(). [1] [2] [3] [4]
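
The effect of switching the check can be shown with a tiny Python sketch. `SketchSchema` and its flags are hypothetical simplifications; only the method names `has_inverted_index()` and `has_extra_index()` come from the description above.

```python
from dataclasses import dataclass


@dataclass
class SketchSchema:
    # Hypothetical schema flags; the real tablet schema is more involved.
    has_inverted: bool = False
    has_ann: bool = False

    def has_inverted_index(self):
        return self.has_inverted

    def has_extra_index(self):
        # True when any index type that needs an index file writer is present.
        return self.has_inverted or self.has_ann


ann_only = SketchSchema(has_ann=True)
```

For a table that has only an ANN index, the old check returns false and no index file writer would be created, while the broader check returns true.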

Configuration:

  • Introduced a new configuration option omp_threads_limit to control the maximum number of OpenMP threads used per Doris thread, which is relevant for vectorized/ANN computation. [1] [2]

These changes set up the infrastructure required for future development of ANN vector index features, including search, filtering, and statistics collection.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.60% (1401/1717)
Line Coverage 66.01% (23867/36158)
Region Coverage 67.22% (11840/17615)
Branch Coverage 56.85% (6198/10902)

@zhiqiang-hhhh zhiqiang-hhhh force-pushed the vec-rebase-vslot-ref branch 3 times, most recently from 11601d9 to 0f3a54c Compare August 7, 2025 07:23
@zhiqiang-hhhh
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.79% (1406/1719)
Line Coverage 66.15% (24011/36297)
Region Coverage 67.31% (11910/17693)
Branch Coverage 56.94% (6236/10952)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 21.95% (90/410) 🎉
Increment coverage report
Complete coverage report

@zhiqiang-hhhh zhiqiang-hhhh force-pushed the vec-rebase-vslot-ref branch 2 times, most recently from 46d76be to 346b9e5 Compare August 7, 2025 13:02
@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 81.92% (1409/1720)
Line Coverage 66.05% (24066/36434)
Region Coverage 67.22% (11968/17805)
Branch Coverage 56.87% (6259/11006)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 19.29% (49/254) 🎉
Increment coverage report
Complete coverage report

@zhiqiang-hhhh
Contributor Author

run buildall

@zhiqiang-hhhh zhiqiang-hhhh changed the title DRAFT TEST [feat] Ann Index Aug 8, 2025
@zhiqiang-hhhh zhiqiang-hhhh marked this pull request as ready for review August 8, 2025 06:53
@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 18.77% (49/261) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 68.02% (985/1448) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.71% (17110/33088)
Line Coverage 37.18% (155811/419098)
Region Coverage 31.88% (118833/372752)
Branch Coverage 33.19% (52243/157399)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.93% (1237/1423) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.69% (22971/32496)
Line Coverage 56.97% (238650/418893)
Region Coverage 52.49% (198500/378187)
Branch Coverage 54.14% (85747/158370)

@zhiqiang-hhhh
Contributor Author

run p0

@zhiqiang-hhhh
Contributor Author

run feut

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 18.77% (49/261) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.93% (1237/1423) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.69% (22970/32496)
Line Coverage 56.97% (238655/418893)
Region Coverage 52.49% (198514/378187)
Branch Coverage 54.15% (85751/158370)

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 21, 2025
@morrySnow morrySnow changed the title [feat] Ann Index [feat](index) Ann Index Aug 21, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

Member

@airborne12 airborne12 left a comment

LGTM

Contributor

@dataroaring dataroaring left a comment

LGTM

@airborne12 airborne12 merged commit 4bc6b14 into apache:master Aug 22, 2025
30 of 33 checks passed
@zhiqiang-hhhh zhiqiang-hhhh deleted the vec-rebase-vslot-ref branch August 22, 2025 06:55
HappenLee pushed a commit that referenced this pull request Aug 29, 2025
…55184)

This pull request standardizes the return type of all vector distance
functions to float across the codebase, ensuring consistency and
improving performance for vector similarity search operations.

Related PR: #54276
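
For intuition about what a 32-bit float return type means for distance values, here is a small Python sketch. It is an illustration only: Python floats are 64-bit, so `to_float32` (a hypothetical helper) emulates a 32-bit result by round-tripping through `struct`. The function names below are chosen for the example and are not taken from the commit.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def l2_distance(a, b):
    # Euclidean distance, narrowed to 32-bit precision.
    return to_float32(math.dist(a, b))


def inner_product(a, b):
    # Dot product, narrowed to 32-bit precision.
    return to_float32(sum(x * y for x, y in zip(a, b)))


d = l2_distance([0.1, 0.2], [0.4, 0.6])
```

Narrowing every distance function to the same type means results from different functions, or from the index and from direct computation, can be compared without implicit conversions.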
airborne12 pushed a commit that referenced this pull request Sep 16, 2025
…55458)

### What problem does this PR solve?

Support pushing down expressions like the following to the ANN index:

```sql
select id from ann_cast_rhs_ip order by inner_product_approximate(embedding, cast('[0.1,0.2,0.3,0.4]' as array<float>)) desc limit 3;
```

Related PR: #54276
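
What the query above asks for can be sketched in plain Python. This is a brute-force illustration of the semantics, not of the pushdown itself: the string literal is cast to a float array once, then rows are ranked by inner product in descending order and the top 3 ids are returned. The helper names and sample rows are hypothetical.

```python
import heapq
import json


def cast_to_float_array(literal):
    """Parse a literal such as '[0.1,0.2]' into a list of floats."""
    return [float(x) for x in json.loads(literal)]


def top_k_inner_product(rows, query, k):
    """Return ids of the k rows with the largest inner product, largest first."""
    def score(row):
        return sum(x * y for x, y in zip(row["embedding"], query))
    return [row["id"] for row in heapq.nlargest(k, rows, key=score)]


rows = [
    {"id": 1, "embedding": [1.0, 0.0, 0.0, 0.0]},
    {"id": 2, "embedding": [0.0, 0.0, 0.0, 1.0]},
    {"id": 3, "embedding": [0.0, 1.0, 1.0, 0.0]},
    {"id": 4, "embedding": [1.0, 1.0, 1.0, 1.0]},
]
query = cast_to_float_array("[0.1,0.2,0.3,0.4]")
ids = top_k_inner_product(rows, query, k=3)
```

Since the cast operand is a constant, it only needs to be evaluated once per query, which is presumably what makes the whole expression eligible for pushdown.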
airborne12 pushed a commit that referenced this pull request Nov 20, 2025
### What problem does this PR solve?

This PR adds `CREATE INDEX` and `BUILD INDEX` SQL syntax for ANN indexes, i.e.:

```sql
CREATE INDEX [IF NOT EXISTS] <ann_index_name> 
             ON <table_name> (<column_name>)
             USING ANN
             [PROPERTIES ("<key>" = "<value>"[ , ...])]
             [COMMENT '<index_comment>']

BUILD INDEX <ann_index_name> ON <table_name> [partition_list]
```


Related PR: #54276
airborne12 pushed a commit to airborne12/apache-doris that referenced this pull request Dec 3, 2025
…#55586)

Related PR: apache#54276
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request Dec 14, 2025
…#55586)

Related PR: apache#54276
Labels

approved Indicates a PR has been approved by one committer. reviewed

9 participants