Thanks for this library. I'm just playing around with it to see if it can fit in as a replacement for the myriad user-defined sql functions we're currently using to perform knn search on binary features and have a question regarding the use of binary hashes in place of floating-point features/embeddings.
As far as I can tell, FAISS supports IndexBinaryFlat via the factory string BFlat, along with various B-prefixed versions of the other index strings, but binary indexes have a completely separate base class (IndexBinary) and are built by a separate factory function (index_binary_factory) rather than the regular index factory. Indeed, trying to use the following:
CREATE VIRTUAL TABLE IF NOT EXISTS "vss_files" using vss0 (
embedding(144) factory="BFlat,IDMap2",
);
throws an exception:
Error building index factory for embedding: Error in std::unique_ptr<faiss::Index> faiss::{anonymous}::index_factory_sub(int, std::string, faiss::MetricType) at /home/runner/work/sqlite-vss/sqlite-vss/vendor/faiss/faiss/index_factory.cpp:877: could not parse index string BFlat
(IDMap2 is, as I understand it, implemented for IndexBinaryFlat since 2019.)
The only workaround I can think of is to treat the binary hash as a densely packed bit vector and expand it into a float embedding, inserting a 1.0 or 0.0 float for each bit (so an n-byte binary vector turns into an n*8*2-byte fp16 embedding). I could then either insert that directly, at a huge storage and compute premium, or compress its features (ProductQuantizer?) into a smaller embedding, which increases compute but reduces storage (and performance/accuracy).
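To make that workaround concrete, here is a minimal sketch in plain Python. The helper names (unpack_bits, hamming, squared_l2) are hypothetical, and it assumes the 144 in embedding(144) means 144 bits, i.e. an 18-byte hash. It also shows why the expansion preserves ranking: for vectors containing only 0.0 and 1.0, the squared L2 distance equals the Hamming distance of the original hashes.

```python
def unpack_bits(code: bytes) -> list[float]:
    """Expand a packed binary hash into one float (0.0 or 1.0) per bit, MSB first."""
    return [float((byte >> (7 - i)) & 1) for byte in code for i in range(8)]


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length hashes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def squared_l2(u: list[float], v: list[float]) -> float:
    """Squared Euclidean distance between two float vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


# Two example 18-byte (144-bit) hashes; the values are arbitrary.
hash_a = bytes(range(18))
hash_b = bytes(reversed(range(18)))

vec_a = unpack_bits(hash_a)  # 144 floats, ready to insert as a float embedding
vec_b = unpack_bits(hash_b)

# On 0/1 vectors each differing bit contributes exactly 1 to the squared distance.
print(len(vec_a), hamming(hash_a, hash_b), squared_l2(vec_a, vec_b))
```

So a flat L2 index over the expanded vectors should return the same neighbours as a Hamming search over the hashes; the cost is purely the storage and compute overhead described above.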
Ideally, we would be able to use bfactory= instead of factory= to create a binary index. Alternatively, could factory= inspect its payload for BFlat and create a binary index instead of a regular one?