Example in-memory Vector index using the existing index APIs. #2661
… row execution engine. This allows them to be used by `memory` for constructing row iterators for indexes.
… signify when they can be used to optimize particular ORDER BY expressions.
Seems good as a first pass. My only concern is that the analyzer replacement rule might be noticeably expensive.
sql/vector_range_collection.go
Outdated
import "fmt"

type OrderAndLimit struct {
The file name and contents seem a bit at odds, and you'll get license formatting complaints.
Added this type to `sql/core.go`.
)

// replaceIdxSort applies an IndexAccess when there is an `OrderBy` over a prefix of any columns with Indexes
func replaceIdxOrderByDistance(ctx *sql.Context, a *Analyzer, n sql.Node, scope *plan.Scope, sel RuleSelector, qFlags *sql.QueryFlags) (sql.Node, transform.TreeIdentity, error) {
Looks fine as a first pass, but I imagine this might impact perf: a lot of expensive subcalls are made for queries with ORDER BY. My first thought would be to layer this into `costedIndexScans`, potentially link the LIMIT clause count in `sql.QueryFlags`, and somehow consolidate Sort dropping with the other places we do that.
This pass was modeled after `replaceIdxSort`. If this is expensive for ORDER BY queries, then so is that one.
I like the idea of layering both of these into `costedIndexScans`, but that should probably be a separate PR.
I'll keep an eye on perf.
@@ -0,0 +1,110 @@
package expression
Missing copyright header
	return "VEC_DISTANCE_L2_SQUARED"
}

func (d DistanceL2Squared) Eval(left []float64, right []float64) (float64, error) {
Maybe add some unit tests for sanity in `distance_test.go`.
Added unit test for sanity.
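For readers following along, here is a minimal sketch of what a squared L2 distance and a sanity check for it could look like. This is not the code from the PR: the body of `DistanceL2Squared.Eval` is not shown in this thread, so the free function `squaredL2`, the `ErrDimensionMismatch` error, and the choice to return an error on mismatched lengths are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDimensionMismatch is a hypothetical error; the PR may handle
// vectors of differing length in some other way.
var ErrDimensionMismatch = errors.New("vectors must have the same number of dimensions")

// squaredL2 returns the sum of squared component-wise differences,
// i.e. the Euclidean distance without the final square root.
func squaredL2(left, right []float64) (float64, error) {
	if len(left) != len(right) {
		return 0, ErrDimensionMismatch
	}
	var sum float64
	for i := range left {
		d := left[i] - right[i]
		sum += d * d
	}
	return sum, nil
}

func main() {
	// Table-driven sanity checks, in the style of a Go unit test.
	cases := []struct {
		left, right []float64
		want        float64
	}{
		{[]float64{0, 0}, []float64{3, 4}, 25},
		{[]float64{1, 2, 3}, []float64{1, 2, 3}, 0},
	}
	for _, c := range cases {
		got, err := squaredL2(c.left, c.right)
		if err != nil || got != c.want {
			panic(fmt.Sprintf("squaredL2(%v, %v) = %v, %v; want %v", c.left, c.right, got, err, c.want))
		}
	}
	if _, err := squaredL2([]float64{1}, []float64{1, 2}); err == nil {
		panic("expected an error for mismatched dimensions")
	}
	fmt.Println("all checks passed")
}
```

Skipping the square root preserves the ordering of distances, which is all an ORDER BY needs.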
sql/analyzer/vector_index_test.go
Outdated
@@ -0,0 +1,214 @@
// Copyright 2023 Dolthub, Inc.
2024
Fixed.
This expands the index interfaces to make it possible to have vector indexes, and demonstrates it with a proof-of-concept in-memory index. It's a rough implementation with some shortcomings: for instance, it doesn't currently handle tables that consist of multiple partitions.
However, this showcases how to use the GMS interfaces to add a vector index.
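To make the idea concrete, the following is a rough sketch of the kind of lookup such an index answers: the rows nearest to a query vector, as in an ORDER BY on a distance expression combined with a LIMIT. It is a brute-force scan over a single in-memory slice and does not use the GMS index interfaces; the `row` type and the `nearest` and `squaredL2` functions are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for a table row with a vector column.
type row struct {
	ID  int
	Vec []float64
}

// squaredL2 assumes both vectors have the same number of dimensions.
func squaredL2(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// nearest returns up to limit rows ordered by ascending squared L2
// distance to query. It sorts a copy, so the input slice is untouched.
func nearest(rows []row, query []float64, limit int) []row {
	if limit < 0 {
		limit = 0
	}
	sorted := make([]row, len(rows))
	copy(sorted, rows)
	sort.SliceStable(sorted, func(i, j int) bool {
		return squaredL2(sorted[i].Vec, query) < squaredL2(sorted[j].Vec, query)
	})
	if limit < len(sorted) {
		sorted = sorted[:limit]
	}
	return sorted
}

func main() {
	rows := []row{
		{ID: 1, Vec: []float64{0, 0}},
		{ID: 2, Vec: []float64{5, 5}},
		{ID: 3, Vec: []float64{1, 1}},
	}
	for _, r := range nearest(rows, []float64{1, 1}, 2) {
		fmt.Println(r.ID, r.Vec)
	}
}
```

A real vector index would avoid the full sort, typically by using an approximate nearest-neighbor structure, but the contract is the same: given a query vector and a limit, return rows in distance order.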