Example in-memory Vector index using the existing index APIs. #2661

Merged (9 commits) on Sep 25, 2024

Contributor

@nicktobey nicktobey commented Sep 16, 2024

This expands the index interfaces to make it possible to have vector indexes, and demonstrates it with a proof-of-concept in-memory index. It's a rough implementation with some shortcomings: for instance, it doesn't currently handle tables that consist of multiple partitions.

However, this showcases how to use the GMS interfaces to add a vector index.

… row execution engine. This allows them to be used by `memory` for constructing row iterators for indexes.
… signify when they can be used to optimize particular ORDER BY expressions.
Contributor

@max-hoffman max-hoffman left a comment

Seems good as a first pass. My only concern is that the analyzer replacement rule might be noticeably expensive.


import "fmt"

type OrderAndLimit struct {
Contributor

The file name and contents seem a bit at odds, and you'll get license formatting complaints.

Contributor Author

Added this class to sql/core.go

)

// replaceIdxSort applies an IndexAccess when there is an `OrderBy` over a prefix of any columns with Indexes
func replaceIdxOrderByDistance(ctx *sql.Context, a *Analyzer, n sql.Node, scope *plan.Scope, sel RuleSelector, qFlags *sql.QueryFlags) (sql.Node, transform.TreeIdentity, error) {
Contributor

Looks fine as a first pass, but I imagine this might impact perf: a lot of expensive subcalls are made for queries with ORDER BY. My first thought would be to layer this into costedIndexScans, potentially link the LIMIT clause count in sql.QueryFlags, and somehow consolidate Sort dropping with the other places we do that.

Contributor Author

This pass was modeled after replaceIdxSort. If this is expensive for ORDER BY queries, then so is that one.

I like the idea of layering both of these into costedIndexScans, but that should probably be a separate PR.

I'll keep an eye on perf.

@@ -0,0 +1,110 @@
package expression
Contributor

Missing copyright header

return "VEC_DISTANCE_L2_SQUARED"
}

func (d DistanceL2Squared) Eval(left []float64, right []float64) (float64, error) {
Contributor

Maybe add some unit tests for sanity, e.g. in distance_test.go.

Contributor Author

Added unit test for sanity.

@@ -0,0 +1,214 @@
// Copyright 2023 Dolthub, Inc.
Contributor

2024

Contributor Author

Fixed.

@nicktobey nicktobey merged commit 9555784 into main Sep 25, 2024
7 of 8 checks passed
@nicktobey nicktobey deleted the nicktobey/vector branch September 25, 2024 23:09