
Add OctoAI as embedding option in Vector CDK and specifically in Pinecone #4

Open
aaronsteers opened this issue May 16, 2024 · 5 comments

Comments

@aaronsteers

aaronsteers commented May 16, 2024

Summary

OctoAI is an embedding service focused on high-volume, high-throughput workloads.

Project Description

This feature would add OctoAI as a named embedding option, reusing the base OpenAI-compatible implementation and customizing the rate limit to match OctoAI's higher limits.
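One way to read "named option" is as a preset layered over the generic OpenAI-compatible settings: the OctoAI entry fixes the base URL and raises the throughput knobs, and everything else is inherited. The sketch below is a minimal illustration under stated assumptions. `OpenAICompatibleSettings`, `octoai_preset`, `min_interval_seconds` and `requests_per_minute` are hypothetical names, a dataclass stands in for the CDK's real config models, and the limit values are placeholders rather than OctoAI's documented limits. The base URL is the one proposed in this thread.

```python
from dataclasses import dataclass, replace


# Hypothetical settings object; the CDK's real config models are pydantic classes.
@dataclass(frozen=True)
class OpenAICompatibleSettings:
    base_url: str
    model_name: str
    api_key: str = ""
    chunk_size: int = 1000  # texts per embedding request
    max_retries: int = 15
    requests_per_minute: int = 60  # placeholder client-side rate limit


# Base URL as proposed in this issue thread.
OCTOAI_API_BASE = "https://text.octoai.run/v1"


def octoai_preset(model_name: str, api_key: str) -> OpenAICompatibleSettings:
    """Start from the generic defaults and override only OctoAI-specific values."""
    generic = OpenAICompatibleSettings(base_url="", model_name=model_name, api_key=api_key)
    # Placeholder limits: OctoAI's real limits would need to be confirmed.
    return replace(
        generic,
        base_url=OCTOAI_API_BASE,
        chunk_size=5000,
        requests_per_minute=600,
    )


def min_interval_seconds(settings: OpenAICompatibleSettings) -> float:
    """Minimum spacing between requests implied by the configured rate limit."""
    return 60.0 / settings.requests_per_minute
```

For example, `octoai_preset("example-embedding-model", "key")` keeps `max_retries=15` from the generic defaults while allowing one request every 0.1 seconds instead of one per second. The design keeps a single implementation and turns OctoAI support into data rather than a new code path.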

Definition of Done

Resources to Assist

@avirajsingh7

avirajsingh7 commented Jun 8, 2024

@aaronsteers, this is my first time working with this. OctoAI's maximum input is 8191 tokens (to be checked), and we can increase the `chunk_size` in `LocalAIEmbeddings` from its default of 1000 to 5000 or 6000.

I am not sure about the implementation. Is this approach right?

https://github.com/airbytehq/airbyte/blob/f0c85c7d398146654434643c5980adddec51ff69/airbyte-cdk/python/airbyte_cdk/destinations/vector_db_based/embedder.py#L164

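```python
from langchain.embeddings.localai import LocalAIEmbeddings

from airbyte_cdk.destinations.vector_db_based.embedder import OpenAICompatibleEmbedder

OCTA_AI_API_BASE = "https://text.octoai.run/v1"


class OctaAIEmbedder(OpenAICompatibleEmbedder):
    # OctaAICompatibleEmbeddingConfigModel is the proposed config model (not yet in the CDK);
    # assumed to expose the same fields as OpenAICompatibleEmbeddingConfigModel.
    def __init__(self, config: "OctaAICompatibleEmbeddingConfigModel"):
        # The parent constructor requires the config argument (it also sets self.config).
        super().__init__(config)
        # Replace the parent's client: fixed OctoAI base URL and a larger batch size.
        self.embeddings = LocalAIEmbeddings(
            model=config.model_name,
            openai_api_key=config.api_key or "dummy-api-key",
            openai_api_base=OCTA_AI_API_BASE,
            chunk_size=5000,
            max_retries=15,
            disallowed_special=(),
        )
```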
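A hedged caveat on the numbers above: in LangChain's OpenAI-style embedding classes, `chunk_size` is documented as the maximum number of texts sent per embedding request, while 8191 matches the default `embedding_ctx_length`, a separate per-text token limit. Raising `chunk_size` therefore changes how many requests are made, not how long each input may be. Whether `LocalAIEmbeddings` actually batches by `chunk_size` when embedding documents should be checked against the installed LangChain version. The stdlib sketch below shows only the batching arithmetic; `request_count` and `batched_requests` are hypothetical helpers, not library code.

```python
from typing import Iterator, List, Sequence


def request_count(num_texts: int, chunk_size: int) -> int:
    """Requests needed when each request carries at most chunk_size texts (ceiling division)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return -(-num_texts // chunk_size)


def batched_requests(texts: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most chunk_size texts, one group per request.

    chunk_size bounds how many texts go into one request; it does not change
    how long each individual text may be (that is the per-text token limit).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(texts), chunk_size):
        yield list(texts[start:start + chunk_size])
```

With 12,000 document chunks, `chunk_size=1000` means 12 requests and `chunk_size=5000` means 3; every individual text still has to fit within the per-text limit on its own.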

@marcosmarxm
Member

@avirajsingh7, do you want to be assigned to this issue?

@avirajsingh7

avirajsingh7 commented Jun 10, 2024

@marcosmarxm, I can work on this issue. Can somebody confirm the approach?

OctoAI has a maximum input of 8191 tokens (to be checked). We can change the `chunk_size` in `LocalAIEmbeddings` from the default 1000 to 5000 or 6000.

We can either add a new option for OctoAI, or add a `chunk_size` option to `OpenAICompatibleEmbedder` that users can set in the Airbyte UI. Alternatively, we can use Unstructured for embeddings.
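The option of exposing `chunk_size` on the OpenAI-compatible embedder so users can set it in the Airbyte UI could look roughly like this. It is a hypothetical sketch: `OpenAICompatibleEmbeddingOptions`, `DEFAULT_CHUNK_SIZE` and `openai_compatible_client_kwargs` are illustrative names, and a dataclass stands in for the pydantic config model the CDK actually uses (the real change would add an optional field there). The keyword arguments mirror the `LocalAIEmbeddings` call proposed earlier in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class OpenAICompatibleEmbeddingOptions:
    """Hypothetical stand-in for the connector's OpenAI-compatible embedding config."""

    base_url: str
    model_name: str
    dimensions: int
    api_key: str = ""
    # New optional user setting: texts per embedding request (None = keep the default).
    chunk_size: Optional[int] = None

    def __post_init__(self) -> None:
        if self.chunk_size is not None and self.chunk_size < 1:
            raise ValueError("chunk_size must be a positive integer")


DEFAULT_CHUNK_SIZE = 1000  # used when the user leaves the field empty


def openai_compatible_client_kwargs(options: OpenAICompatibleEmbeddingOptions) -> Dict[str, Any]:
    """Keyword arguments an embedder could pass to its LangChain embeddings client."""
    return {
        "model": options.model_name,
        # Same placeholder key used in the proposal when no key is configured.
        "openai_api_key": options.api_key or "dummy-api-key",
        "openai_api_base": options.base_url,
        "chunk_size": options.chunk_size or DEFAULT_CHUNK_SIZE,
        "max_retries": 15,
        "disallowed_special": (),
    }
```

Leaving the field optional with a default of 1000 keeps existing connections unchanged, while an OctoAI user could enter 5000 without a dedicated OctoAI code path.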

@btkcodedev

@marcosmarxm, @aaronsteers, I would like to take up this issue. I have raised a PR for #11.

@bindipankhudi
Contributor

@btkcodedev, we need to discuss internally whether we want this item worked on. Please feel free to pick another item while we figure this out; #5 is a similar item that could interest you.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants