-
Notifications
You must be signed in to change notification settings - Fork 82
Closed
Description
I have a table "messages" (model shown below) with a BIT(64)
column in PostgreSQL, and I'm trying to insert or update data into that column using SQLAlchemy (via pgvector-python
). However, I'm getting an error about the bit string length not matching BIT(64)
. I've tried all of the following formats for the data:
Bit(embedding)
(expects bytes-like object)"010101"
(expects bytes-like object)b"010101"
(bit string length 528 does not match type bit(64))b'B"010101"'
(bit string length 536 does not match type bit(64))
But no matter which approach I take, I see an error similar to:
ERROR:__main__:Error processing batch: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string l
ength 536 does not match type bit(64)
[SQL: UPDATE messages SET message_hash=$1, updated_at=now() WHERE messages.id = $2::BIGINT]
[parameters: [(b"B'0000100101000100000101001100001100011011010111000111101111111110'", 23189), (b"B'1000100001000100000101001100011100010011001110001101101111
011110'", 724950), (b"B'1011011100000100010001111110101111001011111011010110101111011010'", 103289), (b"B'1001101100010100010001100100001101011011001111011100
101111111010'", 90)]]
(Background on this error at: https://sqlalche.me/e/20/dbapi)
ERROR:__main__:Fatal error: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string length 536 d
oes not match type bit(64)
or
sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string length 528 does not match type bit(64)
[SQL: UPDATE messages SET message_hash=$1, updated_at=now() WHERE messages.id = $2::BIGINT]
[parameters: [(b"'0000100101000100000101001100001100011011010111000111101111111110'", 23189), (b"'1000100001000100000101001100011100010011001110001101101111011110'", 724950), (b"'1011011100000100010001111110101111001011111011010110101111011010'", 103289), (b"'1001101100010100010001100100001101011011001111011100101111111010'", 90)]]
(Background on this error at: https://sqlalche.me/e/20/dbapi)
What I Tried
- Just giving a plain string of bits (e.g.,
"010101"
). - Supplying the value in a
B'...'
literal form. - Supplying raw bytes, like
b"010101"
. - Using a
Bit
object frompgvector.utils.Bit
.
Nothing seems to work, and I get the mismatch length error.
Snippet of the Model
from pgvector.sqlalchemy import BIT
from pgvector.utils import Bit
class Messages(Base):
__tablename__ = "messages"
id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
# other columns...
message_hash: Mapped[np.ndarray | None] = mapped_column(BIT(64), nullable=True)
Code Snippet
assert embeddings_hash.ndim == 1, f"{embeddings_hash.shape=}"
assert embeddings_hash.shape[0] == len(batch), f"{embeddings_hash.shape=} {len(batch)=}"
assert embeddings_hash.dtype == np.uint64, f"{embeddings_hash.dtype=}"
embeddings_hash = embeddings_hash.view(np.uint8) # shape = (batch_size * 8,)
embeddings_hash = embeddings_hash.reshape(configuration["batch_size"], 8)
embeddings_hash = np.unpackbits(embeddings_hash, axis=1)
updates = [
{
"b_id": message.id,
"message_hash": f"B'{Bit(embedding_hash.reshape(-1)).to_text()}'".encode("ascii")
}
for message, embedding_hash in zip(batch, embeddings_hash)
]
await session.execute(
Messages.__table__.update()
.where(Messages.id == bindparam("b_id"))
.values({"message_hash": bindparam("message_hash")})
.execution_options(synchronize_session=False),
updates
)
Expected Behavior
I want to store a 64-bit value in the BIT(64)
column without errors.
Actual Behavior
PostgreSQL complains about the bit string length not matching BIT(64)
, or asking bytes-like object.
Environment
- Python 3.12.3
- PostgreSQL 17
- SQLAlchemy==2.0.36 (using async)
- asyncpg==0.30.0
- pgvector==0.3.5
Metadata
Metadata
Assignees
Labels
No labels