Skip to content

Cannot Insert or Update Data into a bit(64) in SQLAlchemy #112

@ivanstepanovftw

Description

@ivanstepanovftw

I have a table "messages" (model shown below) with a BIT(64) column in PostgreSQL, and I'm trying to insert or update data into that column using SQLAlchemy (via pgvector-python). However, I'm getting an error about the bit string length not matching BIT(64). I've tried all of the following formats for the data:

  • Bit(embedding) (expects bytes-like object)
  • "010101" (expects bytes-like object)
  • b"010101" (bit string length 528 does not match type bit(64))
  • b'B"010101"' (bit string length 536 does not match type bit(64))

But no matter which approach I take, I see an error similar to:

ERROR:__main__:Error processing batch: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string l
ength 536 does not match type bit(64)                                                                                                                         
[SQL: UPDATE messages SET message_hash=$1, updated_at=now() WHERE messages.id = $2::BIGINT]                                                 
[parameters: [(b"B'0000100101000100000101001100001100011011010111000111101111111110'", 23189), (b"B'1000100001000100000101001100011100010011001110001101101111
011110'", 724950), (b"B'1011011100000100010001111110101111001011111011010110101111011010'", 103289), (b"B'1001101100010100010001100100001101011011001111011100
101111111010'", 90)]]                                                                                                                                         
(Background on this error at: https://sqlalche.me/e/20/dbapi)                                                                                                 
ERROR:__main__:Fatal error: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string length 536 d
oes not match type bit(64)

or

sqlalchemy.exc.DBAPIError: (sqlalchemy.dialects.postgresql.asyncpg.Error) <class 'asyncpg.exceptions.StringDataLengthMismatchError'>: bit string length 528 does not match type bit(64)
[SQL: UPDATE messages SET message_hash=$1, updated_at=now() WHERE messages.id = $2::BIGINT]
[parameters: [(b"'0000100101000100000101001100001100011011010111000111101111111110'", 23189), (b"'1000100001000100000101001100011100010011001110001101101111011110'", 724950), (b"'1011011100000100010001111110101111001011111011010110101111011010'", 103289), (b"'1001101100010100010001100100001101011011001111011100101111111010'", 90)]]
(Background on this error at: https://sqlalche.me/e/20/dbapi)

What I Tried

  1. Just giving a plain string of bits (e.g., "010101").
  2. Supplying the value in a B'...' literal form.
  3. Supplying raw bytes, like b"010101".
  4. Using a Bit object from pgvector.utils.Bit.

Nothing seems to work, and I get the mismatch length error.


Snippet of the Model

from pgvector.sqlalchemy import BIT
from pgvector.utils import Bit

class Messages(Base):
    __tablename__ = "messages"
    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)

    # other columns...

    message_hash: Mapped[np.ndarray | None] = mapped_column(BIT(64), nullable=True)

Code Snippet

assert embeddings_hash.ndim == 1, f"{embeddings_hash.shape=}"
assert embeddings_hash.shape[0] == len(batch), f"{embeddings_hash.shape=} {len(batch)=}"
assert embeddings_hash.dtype == np.uint64, f"{embeddings_hash.dtype=}"

embeddings_hash = embeddings_hash.view(np.uint8)  # shape = (batch_size * 8,)
embeddings_hash = embeddings_hash.reshape(configuration["batch_size"], 8)
embeddings_hash = np.unpackbits(embeddings_hash, axis=1) 

updates = [
    {
        "b_id": message.id,
        "message_hash": f"B'{Bit(embedding_hash.reshape(-1)).to_text()}'".encode("ascii")
    }
    for message, embedding_hash in zip(batch, embeddings_hash)
]

await session.execute(
    Messages.__table__.update()
    .where(Messages.id == bindparam("b_id"))
    .values({"message_hash": bindparam("message_hash")})
    .execution_options(synchronize_session=False),
    updates
)

Expected Behavior
I want to store a 64-bit value in the BIT(64) column without errors.


Actual Behavior
PostgreSQL complains about the bit string length not matching BIT(64), or asking bytes-like object.


Environment

  • Python 3.12.3
  • PostgreSQL 17
  • SQLAlchemy==2.0.36 (using async)
  • asyncpg==0.30.0
  • pgvector==0.3.5

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions