Background
#532 introduced support for the Weaviate vector database in dlt. It lets users include specific fields in a vector index and has Weaviate generate embeddings for the data, but it does not handle large content: oversized data must be chunked before it is submitted to Weaviate for processing.
Objective
To provide a more seamless integration with Weaviate, we need to add a transformer that can chunk the data into manageable sizes. This transformer should be flexible, allowing users to define the chunking strategy based on specific heuristics.
Tasks
Develop a transformer function that accepts input data and returns it in chunked form.
The transformer should have an interface that allows it to accept a custom function, which will define the chunking strategy.
Integrate functionality similar to the text splitters from LangChain, which provide heuristic-based content splitting.
Include sample heuristics or functions that developers can use or customize for their chunking needs.
Update the docs to explain how to use the chunking transformer.
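To make the tasks above concrete, here is a minimal sketch of the intended shape, not a proposed implementation. The names `chunk_items`, `split_by_words`, `ChunkStrategy`, and the `chunk_index` field are hypothetical, and the code uses only the standard library rather than dlt or LangChain APIs. It assumes items are dicts and that one text field per item needs chunking.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical alias: a chunking strategy maps one text to a list of chunks.
ChunkStrategy = Callable[[str], List[str]]


def split_by_words(max_words: int = 100, overlap: int = 0) -> ChunkStrategy:
    """Sample heuristic: fixed-size word windows with optional overlap."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    step = max_words - overlap

    def strategy(text: str) -> List[str]:
        words = text.split()
        chunks: List[str] = []
        for start in range(0, len(words), step):
            chunks.append(" ".join(words[start:start + max_words]))
            # Stop once the current window already reaches the last word.
            if start + max_words >= len(words):
                break
        return chunks

    return strategy


def chunk_items(
    items: Iterable[Dict[str, Any]],
    field: str,
    strategy: ChunkStrategy,
) -> Iterator[Dict[str, Any]]:
    """Yield one item per chunk of `field`, using a user-supplied strategy."""
    for item in items:
        text = item.get(field)
        chunks = strategy(text) if isinstance(text, str) else []
        if not chunks:
            # Nothing to chunk: pass the item through unchanged.
            yield item
            continue
        for index, chunk in enumerate(chunks):
            # Copy the other fields so each chunk keeps its parent's metadata.
            yield {**item, field: chunk, "chunk_index": index}


if __name__ == "__main__":
    docs = [{"id": 1, "body": "a b c d e"}, {"id": 2}]
    for row in chunk_items(docs, "body", split_by_words(max_words=2)):
        print(row)
```

Because the strategy is a plain callable, a LangChain-style splitter or any custom heuristic could be passed in its place. Wiring the generator into a pipeline (for example through dlt's transformer decorator) is left out of this sketch.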
Tests
Unit tests for the transformer
Integration tests with the Weaviate destination