Example in-memory Vector index using the existing index APIs. #2661

nicktobey · 2024-09-16T21:09:08Z

This expands the index interfaces to make it possible to have vector indexes, and demonstrates it with a proof-of-concept in-memory index. It's a rough implementation with some shortcomings: for instance, it doesn't currently handle tables that consist of multiple partitions.

However, this showcases how to use the GMS interfaces to add a vector index.
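For readers new to the idea, here is a minimal brute-force sketch of what an in-memory vector index does. It is not the PR's code. `vectorIndex`, `add`, `canSupportOrderBy`, and `orderedRowIDs` are hypothetical names, rows are identified by integer IDs, and distance is squared Euclidean (L2) distance. Like the PR's proof of concept, it keeps everything in memory and ignores partitioning.

```go
package main

import (
	"fmt"
	"sort"
)

// vectorIndex is a hypothetical in-memory index over one vector column.
// It answers "rows ordered by distance to q" by brute force: exact, but
// O(n log n) per lookup.
type vectorIndex struct {
	column  string
	vectors map[int][]float64 // row ID -> vector
}

func newVectorIndex(column string) *vectorIndex {
	return &vectorIndex{column: column, vectors: map[int][]float64{}}
}

func (idx *vectorIndex) add(rowID int, v []float64) {
	idx.vectors[rowID] = v
}

// canSupportOrderBy reports whether this index can satisfy
// ORDER BY VEC_DISTANCE_L2_SQUARED(<column>, <literal>). A real index
// would inspect expression trees; this sketch only compares names.
func (idx *vectorIndex) canSupportOrderBy(distanceFunc, column string) bool {
	return distanceFunc == "VEC_DISTANCE_L2_SQUARED" && column == idx.column
}

func l2Squared(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("dimension mismatch: %d vs %d", len(a), len(b))
	}
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum, nil
}

// orderedRowIDs returns up to limit row IDs, nearest to q first.
// A negative limit means "no limit".
func (idx *vectorIndex) orderedRowIDs(q []float64, limit int) ([]int, error) {
	type scored struct {
		id   int
		dist float64
	}
	all := make([]scored, 0, len(idx.vectors))
	for id, v := range idx.vectors {
		d, err := l2Squared(v, q)
		if err != nil {
			return nil, err
		}
		all = append(all, scored{id, d})
	}
	// Break distance ties by row ID so output is deterministic despite
	// Go's randomized map iteration order.
	sort.Slice(all, func(i, j int) bool {
		if all[i].dist != all[j].dist {
			return all[i].dist < all[j].dist
		}
		return all[i].id < all[j].id
	})
	if limit >= 0 && limit < len(all) {
		all = all[:limit]
	}
	ids := make([]int, len(all))
	for i, s := range all {
		ids[i] = s.id
	}
	return ids, nil
}

func main() {
	idx := newVectorIndex("embedding")
	idx.add(1, []float64{0, 0})
	idx.add(2, []float64{3, 4})
	idx.add(3, []float64{1, 1})
	fmt.Println(idx.canSupportOrderBy("VEC_DISTANCE_L2_SQUARED", "embedding")) // true
	ids, err := idx.orderedRowIDs([]float64{1, 0}, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids) // [1 3]
}
```

Rows 1 and 3 are both at squared distance 1 from the query, so the row-ID tie-break decides their order.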


max-hoffman

Seems good as a first pass. My only concern is that the analyzer replacement rule might be noticeably expensive.

max-hoffman · 2024-09-17T21:11:49Z

sql/vector_range_collection.go

+
+import "fmt"
+
+type OrderAndLimit struct {


File name and contents seem a bit at odds, and you'll get license-formatting complaints.

Moved this type to sql/core.go.
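Pairing an ordering with a limit matters because a vector index only needs to produce the k nearest rows, not sort the whole table. The sketch below illustrates that idea under stated assumptions: `orderAndLimit` and its fields are hypothetical stand-ins (the PR's actual `OrderAndLimit` fields are not shown in this thread), and `topK` uses a bounded max-heap from Go's standard `container/heap` package, costing O(n log k) rather than O(n log n) for a full sort. It requires Go 1.18+ for `any`.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// orderAndLimit is a hypothetical stand-in for "ORDER BY distance to
// Query LIMIT Limit"; the real type in sql/core.go may differ.
type orderAndLimit struct {
	Query []float64
	Limit int
}

type candidate struct {
	RowID int
	Dist  float64
}

// maxHeap keeps the best k candidates seen so far with the farthest on
// top, so that one can be evicted in O(log k) when a closer row appears.
type maxHeap []candidate

func (h maxHeap) Len() int           { return len(h) }
func (h maxHeap) Less(i, j int) bool { return h[i].Dist > h[j].Dist }
func (h maxHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *maxHeap) Push(x any)        { *h = append(*h, x.(candidate)) }
func (h *maxHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// l2Squared assumes a and b have the same length.
func l2Squared(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// topK returns the ol.Limit rows nearest to ol.Query, nearest first.
func topK(rows map[int][]float64, ol orderAndLimit) []candidate {
	h := &maxHeap{}
	for id, v := range rows {
		c := candidate{RowID: id, Dist: l2Squared(v, ol.Query)}
		switch {
		case h.Len() < ol.Limit:
			heap.Push(h, c)
		case h.Len() > 0 && c.Dist < (*h)[0].Dist:
			(*h)[0] = c // replace the current farthest candidate
			heap.Fix(h, 0)
		}
	}
	// The heap is only partially ordered; sort the k survivors for output.
	out := append([]candidate(nil), *h...)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Dist != out[j].Dist {
			return out[i].Dist < out[j].Dist
		}
		return out[i].RowID < out[j].RowID
	})
	return out
}

func main() {
	rows := map[int][]float64{1: {0, 0}, 2: {3, 4}, 3: {1, 1}, 4: {10, 10}}
	fmt.Println(topK(rows, orderAndLimit{Query: []float64{0, 0}, Limit: 2})) // [{1 0} {3 2}]
}
```

Because all four distances here are distinct, the result is the same regardless of map iteration order.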

max-hoffman · 2024-09-17T21:22:11Z

sql/analyzer/replace_order_by_distance.go

+)
+
+// replaceIdxSort applies an IndexAccess when there is an `OrderBy` over a prefix of any columns with Indexes
+func replaceIdxOrderByDistance(ctx *sql.Context, a *Analyzer, n sql.Node, scope *plan.Scope, sel RuleSelector, qFlags *sql.QueryFlags) (sql.Node, transform.TreeIdentity, error) {


Looks fine as a first pass, but I imagine this might impact perf: it makes a lot of expensive subcalls for queries with ORDER BY. My first thought would be to layer this into costedIndexScans, potentially record the LIMIT clause count in sql.QueryFlags, and consolidate Sort dropping with the other places we do that.

This pass was modeled after replaceIdxSort. If this is expensive for ORDER BY queries, then so is that one.

I like the idea of layering both of these into costedIndexScans, but that should probably be a separate PR.

I'll keep an eye on perf.
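For readers unfamiliar with this kind of analyzer pass, here is a toy sketch of the rewrite it performs. When a `Sort` on a distance expression sits directly on a table scan, and the table has a vector index on that column, the `Sort` is dropped and the scan becomes an ordered index scan. Every type here (`node`, `sortNode`, `limitNode`, `tableScan`, `indexScan`, and the `vectorIndexes` catalog) is hypothetical and far simpler than go-mysql-server's `sql.Node` and `transform` helpers; it only shows the shape of the match-and-replace.

```go
package main

import "fmt"

type node interface{ String() string }

// distanceExpr stands in for VEC_DISTANCE_L2_SQUARED(column, literal).
type distanceExpr struct {
	Column string
	Query  []float64
}

type tableScan struct{ Table string }

type sortNode struct {
	By    distanceExpr
	Child node
}

type limitNode struct {
	N     int
	Child node
}

// indexScan yields rows already ordered by distance, so no Sort is needed.
type indexScan struct {
	Table, Index string
	By           distanceExpr
}

func (t tableScan) String() string { return "Table(" + t.Table + ")" }
func (s sortNode) String() string  { return fmt.Sprintf("Sort(dist(%s), %s)", s.By.Column, s.Child) }
func (l limitNode) String() string { return fmt.Sprintf("Limit(%d, %s)", l.N, l.Child) }
func (i indexScan) String() string {
	return fmt.Sprintf("IndexScan(%s.%s, dist(%s))", i.Table, i.Index, i.By.Column)
}

// vectorIndexes maps "table.column" to an index name (hypothetical catalog).
type vectorIndexes map[string]string

// replaceOrderByDistance walks the plan and drops a Sort whose child is a
// plain table scan with a vector index on the sorted column.
func replaceOrderByDistance(n node, idx vectorIndexes) node {
	switch n := n.(type) {
	case limitNode:
		n.Child = replaceOrderByDistance(n.Child, idx)
		return n
	case sortNode:
		if scan, ok := n.Child.(tableScan); ok {
			if name, ok := idx[scan.Table+"."+n.By.Column]; ok {
				return indexScan{Table: scan.Table, Index: name, By: n.By}
			}
		}
		n.Child = replaceOrderByDistance(n.Child, idx)
		return n
	default:
		return n
	}
}

func main() {
	plan := limitNode{N: 5, Child: sortNode{
		By:    distanceExpr{Column: "embedding", Query: []float64{1, 0}},
		Child: tableScan{Table: "docs"},
	}}
	idx := vectorIndexes{"docs.embedding": "vidx"}
	fmt.Println(replaceOrderByDistance(plan, idx)) // Limit(5, IndexScan(docs.vidx, dist(embedding)))
}
```

Note that the `Limit` survives the rewrite: the index scan only guarantees ordering, and something still has to stop after N rows.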

jycor · 2024-09-18T18:18:30Z

sql/expression/distance.go

@@ -0,0 +1,110 @@
+package expression


Missing copyright header

jycor · 2024-09-18T18:19:05Z

sql/expression/distance.go

+	return "VEC_DISTANCE_L2_SQUARED"
+}
+
+func (d DistanceL2Squared) Eval(left []float64, right []float64) (float64, error) {


Maybe add some unit tests in distance_test.go as a sanity check.

Added unit test for sanity.

jycor · 2024-09-18T18:19:37Z

sql/analyzer/vector_index_test.go

@@ -0,0 +1,214 @@
+// Copyright 2023 Dolthub, Inc.


nicktobey force-pushed the nicktobey/vector branch from f157b0e to 5767ebf September 17, 2024 17:54

nicktobey added 3 commits September 17, 2024 13:39

Create sql/iters package for row iterators that don't depend on the row execution engine. This allows them to be used by `memory` for constructing row iterators for indexes.

91a03bd

Add CanSupportOrderBy to sql.Index interface, allowing indexes to signify when they can be used to optimize particular ORDER BY expressions.

fa7e988

Implement vector indexes for in-memory tables.

a7f2bc8

nicktobey force-pushed the nicktobey/vector branch from 63355b5 to 8d92ae5 September 17, 2024 20:51

nicktobey added 2 commits September 17, 2024 14:06

Add rule to replace ORDER BY VEC_DISTANCE with an indexed table access on a vector index.

53acbc3

Add tests for vector indexes.

5d017b2

nicktobey force-pushed the nicktobey/vector branch from a86d38b to 5d017b2 September 17, 2024 21:06

[ga-format-pr] Run ./format_repo.sh to fix formatting

3c20021

max-hoffman approved these changes Sep 17, 2024


jycor approved these changes Sep 18, 2024


nicktobey and others added 3 commits September 25, 2024 14:34

Respond to PR feedback.

60e7335

Merge remote-tracking branch 'origin/main' into nicktobey/vector

7b92be7

[ga-format-pr] Run ./format_repo.sh to fix formatting

af74afc

nicktobey merged commit 9555784 into main Sep 25, 2024
7 of 8 checks passed

nicktobey deleted the nicktobey/vector branch September 25, 2024 23:09

BrewTestBot mentioned this pull request Sep 27, 2024

dolt 1.43.1 Homebrew/homebrew-core#192054

Merged
