👑 Global vector index #8967

Open
12 of 17 tasks
MBkkt opened this issue Sep 9, 2024 · 0 comments
Assignees
Labels
area/datashard Issues related to datashard tablets (relational table partitions) epic

MBkkt commented Sep 9, 2024

Description

Implement a global vector index for DataShards to speed up queries like the following (and similar ones) on row-oriented tables:

SELECT * FROM t ORDER BY SomeDistance(t.embedding, target)
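For context, without an index a query of this shape has to compute the distance for every row and then sort. A minimal sketch of that baseline follows. It is an illustration only: `SomeDistance` is a placeholder in the query, so cosine distance is assumed here, and the `k` limit and all function names are hypothetical rather than part of the issue or of any YDB API.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_top_k(rows, target, k, distance=cosine_distance):
    """Full scan: score every (primary_key, embedding) row, keep the k closest.

    The cost grows linearly with the table size, which is what an index
    is meant to avoid.
    """
    return heapq.nsmallest(k, rows, key=lambda row: distance(row[1], target))


# Hypothetical toy table: (primary key, embedding).
rows = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [1.0, 1.0]),
]
nearest = brute_force_top_k(rows, target=[1.0, 0.1], k=2)
nearest_keys = [pk for pk, _ in nearest]
```

Every row is touched on every query, so latency scales with the number of rows in `t`.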

Steps

1. Design vector index

Overall internal design doc

2. Prototype vector index in C++ SDK

3. Implement build (construction) of vector index

Internal design doc

SchemeShard

DataShard scans

SchemeShard requests coordination

Leftovers

4. Implement search over the vector index

5. Implement DML for the vector index

TODO kqp/datashard

Some plans

  • Partition the final posting table before filling it

Possible improvements

  • Compact postings
  • Use filter for primary keys and covered columns
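
The design docs linked in the steps are internal and not reproduced here, so the actual index layout is not known from this issue. The references to a "posting table" and "postings" above suggest a cluster-based structure, where each row is assigned to a centroid and a search scans only the rows of the closest centroids. The sketch below illustrates that general idea under that assumption. The centroids are taken as given (for example from k-means), squared L2 distance is assumed, and every name is hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_postings(rows, centroids):
    """Assign each (primary_key, embedding) row to its nearest centroid.

    Returns {centroid_id: [(primary_key, embedding), ...]}, a stand-in
    for a posting table keyed by centroid id.
    """
    postings = {cid: [] for cid in range(len(centroids))}
    for pk, emb in rows:
        cid = min(range(len(centroids)),
                  key=lambda c: squared_l2(emb, centroids[c]))
        postings[cid].append((pk, emb))
    return postings


def search(postings, centroids, target, k, n_probe=1):
    """Approximate top-k: scan only the n_probe clusters closest to target."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: squared_l2(target, centroids[c]))[:n_probe]
    candidates = [row for cid in probe for row in postings[cid]]
    candidates.sort(key=lambda row: squared_l2(row[1], target))
    return [pk for pk, _ in candidates[:k]]


# Hypothetical toy data with two precomputed centroids.
centroids = [[0.0, 0.0], [10.0, 10.0]]
rows = [
    ("a", [0.5, 0.2]),
    ("b", [9.0, 9.5]),
    ("c", [1.0, 1.5]),
    ("d", [10.5, 9.0]),
]
postings = build_postings(rows, centroids)
result = search(postings, centroids, target=[0.0, 1.0], k=2)
```

In a layout like this, results are approximate: a true nearest neighbour that sits in a cluster that was not probed is missed, and raising `n_probe` trades speed for recall.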

MBkkt added the area/datashard label Sep 9, 2024
MBkkt added this to 👑 Epics Sep 9, 2024
MBkkt moved this to In Progress in 👑 Epics Sep 9, 2024
Projects
Status: In Progress