Segfault when inserting embeddings into vss_table #18
Comments
Thanks for filing @MahmoudFawzyKhalil! I'm having trouble reproducing. Can you try running this Python script and see if you find the same issue?

```python
import sqlite3
import sqlite_vss
import numpy as np
db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vss.load(db)
print(db.execute("select vss_version()").fetchone()[0])
db.executescript("""
CREATE TABLE IF NOT EXISTS resources (
id INTEGER PRIMARY KEY,
url TEXT,
title TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
id INTEGER PRIMARY KEY,
chunk TEXT,
embedding BLOB,
resource_id INTEGER,
FOREIGN KEY (resource_id) REFERENCES resources (id)
);
CREATE VIRTUAL TABLE vss_chunks USING vss0(
chunk_embedding(768)
);
INSERT INTO resources (url, title)
VALUES ("foo", "bar");
""")
db.execute("""
INSERT INTO chunks (chunk, embedding, resource_id)
VALUES ("foo", ?1, 1);
""", [np.zeros((1, 768), dtype=np.float32)])
db.execute("""
INSERT INTO vss_chunks (rowid, chunk_embedding)
SELECT rowid, embedding
FROM chunks;
""")
db.commit()
results = db.execute("select rowid, vector_debug(chunk_embedding), * from vss_chunks").fetchall()
print(results)
db.close()
```

If that works, then in your original database, can you run the following and see what it returns?

```sql
SELECT DISTINCT length(embedding)
FROM chunks;
```

I have a feeling that some of the embedding lengths are not 3072 (768 * 4), which might be causing the segfault.
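For reference, the 3072 figure is just the embedding dimension times the size of a float32 (768 × 4 bytes). Below is a minimal sketch of the same length check from Python; it assumes the `chunks` table from the script above and a database file name (`bla.db`) taken from later in this thread, so adjust both to your setup.

```python
import sqlite3

EXPECTED_DIM = 768                 # vss_chunks was declared as chunk_embedding(768)
EXPECTED_BYTES = EXPECTED_DIM * 4  # float32 vectors use 4 bytes per dimension

# Database path is an assumption; point this at your own file.
db = sqlite3.connect("bla.db")

# Same query suggested above: list every distinct BLOB length in chunks.embedding.
lengths = db.execute("SELECT DISTINCT length(embedding) FROM chunks").fetchall()
print("distinct BLOB lengths:", lengths)

# Flag any embedding whose stored size doesn't match 768 float32 values.
bad = [n for (n,) in lengths if n != EXPECTED_BYTES]
if bad:
    print(f"unexpected lengths (expected {EXPECTED_BYTES} bytes):", bad)

db.close()
```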
Thank you for the quick support!
Also: in the example you sent me, if I set the vector length to something like 10 when creating the virtual table, it just truncates the rest of the vector; it does not segfault.
I managed to reproduce the issue in the script you sent:
Using just a single connection in my code solved the issue. Also, re-running the script on the same database is fine even though it opens a new connection, as long as it is the only one opened in that run.

```python
import sqlite3
from typing import List, Any
import sqlite_vss
import numpy as np
db = sqlite3.connect("bla.db")
db.enable_load_extension(True)
sqlite_vss.load(db)
print(db.execute("select vss_version()").fetchone()[0])
db.executescript("""
CREATE TABLE IF NOT EXISTS resources (
id INTEGER PRIMARY KEY,
url TEXT,
title TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
id INTEGER PRIMARY KEY,
chunk TEXT,
embedding BLOB,
resource_id INTEGER,
FOREIGN KEY (resource_id) REFERENCES resources (id)
);
CREATE VIRTUAL TABLE vss_chunks USING vss0(
chunk_embedding(10)
);
INSERT INTO resources (url, title)
VALUES ("foo", "bar");
""")
db.execute("""
INSERT INTO chunks (chunk, embedding, resource_id)
VALUES ("foo", ?, 1);
""", [np.zeros((1, 768), dtype=np.float32)])
db.commit()
print(db.execute("SELECT * FROM chunks").fetchall())
# Close connection and create a new one
db.close()
db = sqlite3.connect("bla.db")
db.enable_load_extension(True)
sqlite_vss.load(db)
db.execute("""
INSERT INTO vss_chunks (rowid, chunk_embedding)
SELECT rowid, embedding
FROM chunks;
""")
db.commit()
results = db.execute("select rowid, vector_debug(chunk_embedding), * from vss_chunks").fetchall()
print(results)
db.close()
```
Thank you @MahmoudFawzyKhalil! I can now reproduce; attempting a fix.
Smallest possible repro:

```
.open tmp.db
.load dist/debug/vector0
.load dist/debug/vss0
CREATE VIRTUAL TABLE vss_chunks USING vss0(
chunk_embedding(1)
);
.open tmp.db
.load dist/debug/vector0
.load dist/debug/vss0
INSERT INTO vss_chunks (rowid, chunk_embedding)
SELECT 2, json_array(1);
```

If you create a vss0 table, don't insert any data, close the connection, open a new connection, and then try to insert into the table, it segfaults. This is because we only serialize the Faiss index after write transactions are committed. But if you don't insert into the table when it's first created, there's no write transaction commit, so the index never gets written. Will add a new test case and a version bump shortly.
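Until a fixed version is available, here is a minimal sketch of the workaround described in this thread: make sure the connection that creates the vss0 table also performs and commits at least one insert into it, so the Faiss index is serialized before the connection is closed. It assumes the `resources`/`chunks` schema and `bla.db` file from the earlier scripts, and that `chunks` already contains at least one row.

```python
import sqlite3
import sqlite_vss

# Workaround sketch: create and seed the vss0 table on the SAME connection,
# then commit, so the committed write transaction serializes the Faiss index.
db = sqlite3.connect("bla.db")  # file name is an assumption
db.enable_load_extension(True)
sqlite_vss.load(db)

db.executescript("""
CREATE VIRTUAL TABLE IF NOT EXISTS vss_chunks USING vss0(
    chunk_embedding(768)
);
""")

# Seed the index from existing chunks; this assumes chunks is already populated,
# as in the setup described above (an empty seed would leave the bug in place).
db.execute("""
    INSERT INTO vss_chunks (rowid, chunk_embedding)
    SELECT rowid, embedding FROM chunks;
""")
db.commit()  # the commit is what triggers serialization of the index
db.close()
```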
Thank you @asg017
Following for the fix as well. I was meaning to submit a bug report this week about the same behavior, which I noticed when I included the table setup as part of migrations in an app. The workaround (and what I did originally, which masked the issue) was to perform the table setup after seeding the database. Looking forward to being able to remove the temporary workaround I had in place :)
This has now been fixed in [...]. Will close, but please file another issue if you find anything else! Thanks for the initial report.
Confirmed working for my use case. Thanks!
Summary:
When inserting data into the virtual table "vss_chunks" using SQLite, a segmentation fault occurs in the C++ code of the "vssIndexUpdate" function.
Steps to reproduce:
Logs from IntelliJ when attempting the same query using its SQL console:
Environment:
Operating system: Ubuntu 22.04.2 LTS
Python version: 3.10.6
SQLite version: 3.40.0
sqlite_vss version: 0.0.4 (installed with pip)
sentence_transformers: multi-qa-mpnet-base-cos-v1 model, which generates 768-dimensional embeddings (see the sketch after this list)
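For context on how the stored BLOBs end up as 3072-byte float32 vectors, here is a minimal sketch of generating and inserting one embedding with the model named above. The exact code used in this report is not shown in the thread, so the table and column names, the database path, and the float32 byte conversion are assumptions based on the schema in the comments.

```python
import sqlite3
import numpy as np
from sentence_transformers import SentenceTransformer

import sqlite_vss

# Model name comes from the environment info above; everything else is assumed.
model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")

db = sqlite3.connect("bla.db")  # database path is an assumption
db.enable_load_extension(True)
sqlite_vss.load(db)

chunk_text = "example chunk of text"
embedding = model.encode(chunk_text)           # numpy array of shape (768,)
blob = embedding.astype(np.float32).tobytes()  # 768 * 4 = 3072 bytes

# Assumes the resources row with id 1 already exists, as in the scripts above.
db.execute(
    "INSERT INTO chunks (chunk, embedding, resource_id) VALUES (?, ?, 1)",
    (chunk_text, blob),
)
db.commit()
db.close()
```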
Schema: