[bug] DenseVector bit type support for 8.16 #1946

Closed
pySilver opened this issue Nov 22, 2024 · 3 comments · Fixed by #1948
Labels: Area: Client (manually written code that fits in no other area); Category: Bug (something isn't right); Priority: High

Comments

@pySilver

Here is a valid example of a dense_vector field with element_type = 'bit', where the values are hex-encoded strings:

PUT /my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "element_type": "bit",
        "dims": 256
      }
    }
  }
}

PUT /my-index/_doc/c1.jpg
{"my_vector": "eb80b56a847f4a957fa0b56ac05fdaad16ac6b522d43952cc0de6ab53fa0894a"}

This type became available in the 8.16 release of ES, I believe.
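As a quick sanity check on the example above, each hex character carries 4 bits, so a 64-character string encodes 256 bits, which matches "dims": 256 in the mapping. A minimal sketch in plain Python; `bit_vector_dims` is a hypothetical helper, not part of any Elasticsearch client:

```python
def bit_vector_dims(hex_value: str) -> int:
    """Return the number of bits encoded by a hex string (hypothetical helper)."""
    # bytes.fromhex raises ValueError on odd-length or non-hex input,
    # so malformed values fail here rather than at index time.
    return len(bytes.fromhex(hex_value)) * 8


value = "eb80b56a847f4a957fa0b56ac05fdaad16ac6b522d43952cc0de6ab53fa0894a"
dims = bit_vector_dims(value)
print(dims)
```

By the same arithmetic, a field declared with dims=64 would expect a 16-character hex string.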

I'm getting a serialization error when I try to use that vector type:

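from elasticsearch_dsl import DenseVector, InnerDoc

class ImageFeatures(InnerDoc):
    # 64-bit perceptual hash stored as a bit vector
    phash_vector = DenseVector(
        dims=64,
        element_type="bit",
        required=True,
    )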

Error (shown when validation is enabled):

File "/Users/Silver/Projects/GitHub/mybaze/mybaze/feeds/services.py", line 705, in products_sync_to_elasticsearch
    return await ProductDocument.bulk(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/_async/document.py", line 521, in bulk
    return await async_bulk(es, Generate(actions), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch/_async/helpers.py", line 346, in async_bulk
    async for ok, item in async_streaming_bulk(
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch/_async/helpers.py", line 237, in async_streaming_bulk
    async for bulk_data, bulk_actions in _chunk_actions(
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch/_async/helpers.py", line 79, in _chunk_actions
    async for action, data in actions:
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch/_async/helpers.py", line 225, in map_actions
    async for item in aiter(actions):
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/_async/document.py", line 515, in __anext__
    doc.full_clean()
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/utils.py", line 642, in full_clean
    self.clean_fields(validate=False)
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/utils.py", line 628, in clean_fields
    data = field.clean(data)
           ^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 264, in clean
    data = super().clean(data)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 148, in clean
    data = self.deserialize(data)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 138, in deserialize
    None if d is None else self._deserialize(d)
                           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 249, in _deserialize
    return self._wrap(data)
           ^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 226, in _wrap
    return self._doc_class.from_es(data, data_only=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/document_base.py", line 379, in from_es
    return super().from_es(data)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/utils.py", line 561, in from_es
    doc._from_dict(data)
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/utils.py", line 568, in _from_dict
    v = f.deserialize(v)
        ^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 144, in deserialize
    return self._deserialize(data)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 249, in _deserialize
    return self._wrap(data)
           ^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 226, in _wrap
    return self._doc_class.from_es(data, data_only=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/document_base.py", line 379, in from_es
    return super().from_es(data)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/utils.py", line 561, in from_es
    doc._from_dict(data)
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/utils.py", line 568, in _from_dict
    v = f.deserialize(v)
        ^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 144, in deserialize
    return self._deserialize(data)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Silver/Projects/GitHub/mybaze/.venv/lib/python3.12/site-packages/elasticsearch_dsl/field.py", line 389, in _deserialize
    return float(data)
           ^^^^^^^^^^^
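The last frame of the traceback is `float(data)`. Assuming the value reaching that call is a hex string like the one in the mapping example, the failure can be reproduced in plain Python without elasticsearch_dsl. This is only a sketch of that one conversion, not of the library's code path:

```python
hex_value = "eb80b56a847f4a957fa0b56ac05fdaad16ac6b522d43952cc0de6ab53fa0894a"

try:
    # A hex-encoded bit vector is not a decimal number, so float() cannot parse it.
    float(hex_value)
except ValueError as exc:
    error = exc
    print(f"float() rejected the hex string: {exc}")
else:
    error = None
```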
@miguelgrinberg
Collaborator

Ah, yes, the DenseVector class in this package is designed to represent a list of floating-point numbers; it isn't going to work with anything else.

Let me think about how best to represent the new dense vector types. We may need to add a separate class for them, since the type definitions in this package aren't as flexible as the ones Elasticsearch uses server-side.

@miguelgrinberg self-assigned this Nov 22, 2024
@miguelgrinberg added the Category: Bug, Area: Client, and Priority: High labels Nov 22, 2024
@miguelgrinberg
Collaborator

This is now available in the 8.17.0 release. You can pass element_type="bit" or element_type="byte" when you declare a DenseVector field. For bit vectors the data type is a string with a hex representation. For byte vectors the data type is a list of integers.
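To illustrate the two value shapes described above, here is a minimal sketch in plain Python that turns raw hash bytes into a hex string (for element_type="bit") and into a list of integers (for element_type="byte"). The helper names are hypothetical, and the signed -128..127 range for byte vectors is an assumption based on Elasticsearch's server-side byte type rather than something stated in this thread:

```python
def to_bit_value(raw: bytes) -> str:
    """Hex string for a bit vector; dims would be 8 * len(raw)."""
    return raw.hex()


def to_byte_value(raw: bytes) -> list[int]:
    """List of integers for a byte vector; dims would be len(raw)."""
    # Assumption: map unsigned bytes (0..255) to the signed range (-128..127).
    return [b - 256 if b > 127 else b for b in raw]


# Example 64-bit hash (8 bytes), taken from the start of the hex value in the issue.
phash = bytes.fromhex("eb80b56a847f4a95")

bit_value = to_bit_value(phash)    # would be assigned to a field with dims=64, element_type="bit"
byte_value = to_byte_value(phash)  # would be assigned to a field with dims=8, element_type="byte"
print(bit_value)
print(byte_value)
```

Note that the same 8 bytes correspond to dims=64 for a bit vector but dims=8 for a byte vector.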

@pySilver
Author

@miguelgrinberg Thank you! Awesome work!
