Description
M.E.VectorData currently has a rudimentary metadata filtering mechanism: the VectorSearchOptions passed to the vector search method can contain a VectorSearchFilter, which can contain a number of EqualTo or AnyTagEqualTo clauses in an AND relationship only. Vector database filtering syntax typically goes beyond this, both in logical operators (OR, NOT...) and in other operators (e.g. greater than, less than...).
Rather than continuing to develop our own expression tree and adding node types to address the richness of all vector databases, we could leverage the existing LINQ expression tree nodes in .NET. Aside from removing the problem of expression trees from the scope of MEVD, this would greatly improve the API usability, as users would be able to use C# to express their filter:
    // Current:
    var searchResult = await collection.VectorizedSearchAsync(
        searchVector,
        new()
        {
            Filter = new VectorSearchFilter().EqualTo(nameof(Glossary.Category), "AI")
        });

    // Proposed:
    var searchResult = await collection.VectorizedSearchAsync(
        searchVector,
        new()
        {
            Filter = g => g.Category == "AI"
        });

Notes:
- The main downside here from the user perspective is the limited actual filtering support in vector databases.
- The above proposal would allow expressing any C# code within the filter, but actually supported expressions will only be a small subset of all expressible things. Thus, beginners will likely try to write some complex condition, only to get a runtime exception saying that the filter isn't supported.
- In contrast, with a custom expression tree, we control which nodes are available, and the user simply cannot express anything beyond what we support. However, as we'd need to cover all vector databases, nodes needed for some databases wouldn't be supported by others, again leading to a runtime failure. So the general problem can't be avoided here.
- Overall, I don't believe the above will be a big problem - users will likely quickly get used to what's actually supported by their database (with proper documentation), and at that point this becomes a non-issue.
- Compiler-generated LINQ expression trees contain some kinks, and normalization can be beneficial (e.g. users can use either the equality operator or the .NET Equals method; the latter can be normalized to the former). We may want to have some support component in the abstraction to preprocess the expression tree before handing it off to the provider. This could be a bit problematic, as the abstraction currently consists of interfaces rather than base classes.
- Since the filter lambda needs to be generically typed based on the metadata record type of the collection, VectorSearchOptions would have to become generic over TRecord.
- Another advantage of using LINQ is that queries are expressed over the user's data model (e.g. POCOs) rather than over the storage model; this is how users interact with MEAI in all other APIs (e.g. when inserting, or when accessing metadata returned from search). But this also creates mapping difficulties (see next point).
- This needs to be kept in mind in relation to the layering of the ORM mapping feature (i.e. the ability to use arbitrary user POCOs) - we may end up in a place where it's not possible to pass the strongly-typed expression tree directly to the provider.
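
To make a few of the notes above concrete, here is a minimal sketch of how a provider might consume such a filter. It is not the MEVD design. `SketchSearchOptions<TRecord>`, `EqualsNormalizer`, `FilterTranslator`, the `Rank` property and the SQL-like output format are all hypothetical names invented for illustration; only `ExpressionVisitor` and the `System.Linq.Expressions` node types are real .NET APIs. The sketch shows options that are generic over the record type, a preprocessing step that normalizes `Equals` calls into `==`, and a translator that throws `NotSupportedException` for anything outside its small supported subset.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical record type standing in for the Glossary type used above.
public sealed class Glossary
{
    public string Category { get; set; } = "";
    public int Rank { get; set; } // assumed property, for the comparison example
}

// Hypothetical generic options type: the filter lambda is typed over TRecord.
public sealed class SketchSearchOptions<TRecord>
{
    public Expression<Func<TRecord, bool>>? Filter { get; set; }
}

// Rewrites instance calls such as g.Category.Equals("AI") into g.Category == "AI",
// so that a translator only has to handle one shape of equality.
public sealed class EqualsNormalizer : ExpressionVisitor
{
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.Name == nameof(object.Equals)
            && node.Object is not null
            && node.Arguments.Count == 1
            && node.Object.Type == node.Arguments[0].Type)
        {
            return Expression.Equal(Visit(node.Object)!, Visit(node.Arguments[0])!);
        }

        return base.VisitMethodCall(node);
    }
}

// Translates a small subset of expression nodes into an assumed SQL-like filter string.
public static class FilterTranslator
{
    public static string Translate<TRecord>(Expression<Func<TRecord, bool>> filter)
    {
        Expression normalized = new EqualsNormalizer().Visit(filter.Body)!;
        return Render(normalized);
    }

    private static string Render(Expression e) => e switch
    {
        BinaryExpression { NodeType: ExpressionType.AndAlso } b => $"({Render(b.Left)} AND {Render(b.Right)})",
        BinaryExpression { NodeType: ExpressionType.OrElse } b => $"({Render(b.Left)} OR {Render(b.Right)})",
        BinaryExpression { NodeType: ExpressionType.Equal } b => $"{Render(b.Left)} = {Render(b.Right)}",
        BinaryExpression { NodeType: ExpressionType.GreaterThan } b => $"{Render(b.Left)} > {Render(b.Right)}",
        UnaryExpression { NodeType: ExpressionType.Not } u => $"NOT {Render(u.Operand)}",
        // Only direct property access on the lambda parameter is handled (g.Category).
        MemberExpression { Expression: ParameterExpression } m => m.Member.Name,
        ConstantExpression { Value: string s } => $"'{s}'",
        ConstantExpression c => Convert.ToString(c.Value, CultureInfo.InvariantCulture) ?? "",
        // Everything else surfaces as the runtime failure discussed in the notes.
        _ => throw new NotSupportedException($"Filter expression '{e}' is not supported.")
    };
}
```

With these definitions, `FilterTranslator.Translate<Glossary>(g => g.Category.Equals("AI"))` and `FilterTranslator.Translate<Glossary>(g => g.Category == "AI")` take the same path through the translator, while a filter such as `g => g.Category.StartsWith("A")` compiles but throws at translation time. Captured local variables are deliberately not handled here; a real provider would need to evaluate them into constants first.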