Embedding Transformation

This component generates embeddings for text data in tables from your Keboola Connection (KBC) project using OpenAI embedding models.

Configuration

Parameters:

AI Service Provider: OpenAI

Other options:

  • Column to Embed (embed_column): Specify the column that contains the text data to be embedded.
  • Embedding Model (model): The model that will generate the embeddings. Choose from:
    • text-embedding-3-small
    • text-embedding-3-large
    • text-embedding-ada-002
  • Output Format (output_format): Determines whether the embeddings are written to a zipped Lance file or to a Keboola table (CSV).
  • Incremental Load (incremental_load): If enabled, the output table is updated instead of being overwritten.
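  • Output Table Name (output_table_name): The name of the table that receives the embeddings.
  • Primary Keys (primary_keys): The column or columns that form the primary key of the output table.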
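
The options above can be checked before a run. The following is a minimal sketch, not the component's own validation code: the function name validate_parameters is hypothetical, the key names follow the parameter list above (the configuration examples use different spellings, so treat the exact keys as an assumption), and the output format values "lance" and "csv" are assumed.

```python
# Models listed in the "Embedding Model" option above.
ALLOWED_MODELS = {
    "text-embedding-3-small",
    "text-embedding-3-large",
    "text-embedding-ada-002",
}
# Assumed spellings for the two output formats described above.
ALLOWED_OUTPUT_FORMATS = {"lance", "csv"}


def validate_parameters(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if not params.get("embed_column"):
        errors.append("embed_column is required")
    if params.get("model") not in ALLOWED_MODELS:
        errors.append(f"unsupported model: {params.get('model')!r}")
    if params.get("output_format") not in ALLOWED_OUTPUT_FORMATS:
        errors.append(f"unsupported output_format: {params.get('output_format')!r}")
    return errors


if __name__ == "__main__":
    print(validate_parameters({
        "embed_column": "description",
        "model": "text-embedding-3-small",
        "output_format": "csv",
    }))
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfigured option at once.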

Component Configuration Example

Generic configuration

{
  "#apiKey": "your-openai-api-key",
  "model": "ada_002",
  "embedColumn": "description",
  "outputFormat": "csv"
}

This configuration uses the ada_002 model to embed the description column and outputs the result in CSV format.
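
To illustrate what such a configuration drives, here is a rough sketch of embedding one column and serializing the result as CSV. It is an illustration under stated assumptions, not the component's implementation: embed_rows, to_csv, and fake_embed are hypothetical names, the "embedding" output column name is assumed, and fake_embed is a stand-in for the real OpenAI embeddings request so the example runs offline.

```python
import csv
import io
import json
from typing import Callable

# The generic configuration from the example above (API key omitted).
config = {"model": "ada_002", "embedColumn": "description", "outputFormat": "csv"}


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Stand-in embedder (assumption): one-dimensional vector = text length."""
    return [[float(len(text))] for text in texts]


def embed_rows(rows: list[dict], config: dict,
               embed: Callable[[list[str]], list[list[float]]]) -> list[dict]:
    """Add an 'embedding' column (assumed name) holding a JSON-encoded vector."""
    column = config["embedColumn"]
    vectors = embed([row[column] for row in rows])
    return [{**row, "embedding": json.dumps(vector)}
            for row, vector in zip(rows, vectors)]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"id": "1", "description": "abc"}]
embedded = embed_rows(rows, config, fake_embed)
csv_text = to_csv(embedded)
```

Passing the embedder in as a function keeps the column handling testable without network access; a real run would substitute a call to the chosen OpenAI model.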

Row configuration

{
  "embedColumn": "title",
  "destination": {
    "output_table_name": "embedded_table",
    "incremental_load": true,
    "primary_keys": "id"
  }
}

In this example, the embedding column is set to title, and the results are stored in the embedded_table with incremental loading enabled. The primary key is the id column.
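
The effect of incremental loading with a primary key can be modeled as an upsert. The sketch below is a simplified model under assumptions, not code from the component: the function apply_load is hypothetical, and in practice the merge is expected to be carried out by Keboola Storage when the table is loaded.

```python
def apply_load(existing: list[dict], incoming: list[dict],
               primary_keys: list[str], incremental: bool) -> list[dict]:
    """Model a table load: full overwrite, or upsert keyed on primary_keys."""
    if not incremental:
        # Full load: the incoming rows replace the table contents.
        return list(incoming)

    def key(row: dict) -> tuple:
        return tuple(row[name] for name in primary_keys)

    # Dicts keep insertion order, so updated rows stay in place
    # and new rows are appended at the end.
    merged = {key(row): row for row in existing}
    for row in incoming:
        merged[key(row)] = row
    return list(merged.values())


existing = [{"id": "1", "title": "old"}, {"id": "2", "title": "keep"}]
incoming = [{"id": "1", "title": "new"}, {"id": "3", "title": "added"}]
result = apply_load(existing, incoming, ["id"], incremental=True)
```

With incremental loading, the row with id 1 is updated, id 2 is kept, and id 3 is added; without it, only the incoming rows would remain.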


Development

If required, point the local data folder at a custom path by replacing the CUSTOM_FOLDER placeholder in the docker-compose.yml file:

volumes:
  - ./:/code
  - ./CUSTOM_FOLDER:/data

Clone this repository, initialize the workspace, and run the component with the following commands:

git clone git@github.com:keboola/app-transformation-lanceDB-embeddings.git
cd app-transformation-lanceDB-embeddings
docker-compose build
docker-compose run --rm dev

Run the test suite and lint check using this command:

docker-compose run --rm test

Integration

For information about deployment and integration with KBC, please refer to the deployment section of the developer documentation.
