How to use api createContext #30

Open
doanhieu9797 opened this issue Apr 3, 2023 · 10 comments

@doanhieu9797

I’m using your Embedbase library in Python. How can I use the createContext API like in the embedbase.xyz playground menu? By the way, Embedbase is very helpful to me. Thank you very much.

@louis030195
Contributor

louis030195 commented Apr 4, 2023

@doanhieu9797 thanks a lot!

Sure! Here's the "createContext" function from the JS SDK translated to Python:

import requests

def create_context(embedbase_api_url, embedbase_key, dataset, query, options=None):
    """Return the text of the documents most similar to `query` in `dataset`."""
    if options is None:
        options = {}
    limit = options.get("limit", 5)
    headers = {"Authorization": f"Bearer {embedbase_key}"}
    search_url = f"{embedbase_api_url}/{dataset}/search"

    # json= serializes the body and sets the Content-Type header for us
    response = requests.post(search_url, headers=headers, json={"query": query, "top_k": limit}, timeout=30)
    # Raise on HTTP errors (bad key, unknown dataset) instead of failing later with a KeyError
    response.raise_for_status()
    data = response.json()

    # Keep only the text of each match
    return [similarity["data"] for similarity in data["similarities"]]

# Usage
embedbase_api_url = "https://your_embedbase_url/v1"
embedbase_key = "your_embedbase_key"
dataset = "your_dataset_name"
query = "your_query"

context_data = create_context(embedbase_api_url, embedbase_key, dataset, query, options={"limit": 5})
print(context_data)

Just replace "your_embedbase_url", "your_embedbase_key", "your_dataset_name", and "your_query" with the appropriate values.

FYI, we plan to open source the playground soon so you can take a look at the code, and to turn it into a one-line installable React component.
Can you tell me how this code helps you, please?

If you run into prompt size issues, don't hesitate to follow up; I can help with that part too.
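
On the prompt size point, here is a minimal sketch of how the list returned by create_context could be trimmed before it goes into a prompt. The helper name build_prompt, the max_chars budget and the prompt wording are assumptions made for illustration, not part of Embedbase. A real app would probably count tokens with the model's tokenizer rather than characters.

```python
def build_prompt(question, context_chunks, max_chars=4000):
    """Join context chunks, most relevant first, within a rough character budget."""
    selected = []
    used = 0
    for chunk in context_chunks:
        # Stop at the first chunk that would overflow the budget.
        # The budget counts chunk text only, not the separators.
        if used + len(chunk) > max_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical chunks standing in for the output of create_context
    chunks = ["Embedbase stores embeddings.", "Datasets group documents."]
    print(build_prompt("What does Embedbase store?", chunks, max_chars=40))
```

With max_chars=40 only the first chunk fits, so the second one is left out of the prompt.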

@doanhieu9797
Author

Thanks. It would be great if you open source the playground soon. I think I'll make an AI app to read and analyze my own docs.

@doanhieu9797
Author

By the way, can you please tell me whether using splitText to shorten the context will affect ChatGPT's answer?

@louis030195
Contributor

@doanhieu9797 cool! We actually created documentation connected to GPT-4.
This is the script that sends all documentation files to Embedbase:
https://github.com/different-ai/embedbase-docs/blob/main/scripts/sync.ts
It runs at every git push on the main branch: https://github.com/different-ai/embedbase-docs/blob/main/.github/workflows/index.yaml

https://docs.embedbase.xyz/

@louis030195
Contributor

louis030195 commented Apr 5, 2023

We are experimenting with different ways of splitting the text. The biggest benefit of splitting is that it avoids going over the prompt size. How you split the text depends on the experience you want to create.

For example, if the user asks a documentation site "how can I run an eth node on raspberry pi?", it will search for similar information in the chosen dataset(s) and feed it to GPT. I recommend experimenting with different split sizes to see what works best for you. In the future, we want to create an easy numerical way to evaluate different strategies.
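
To make the "different split sizes" experiment concrete, here is a minimal sketch of a character-based splitter with overlap. It is a stand-in written for this thread, not the SDK's splitText; the name split_by_chars and its chunk_size and overlap parameters are assumptions.

```python
def split_by_chars(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in one of the two neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    document = "word " * 400  # hypothetical document, 2000 characters
    for size in (200, 500, 1000):
        print(size, len(split_by_chars(document, chunk_size=size, overlap=20)))
```

Smaller chunks give more precise matches but less surrounding context per match, so it is worth comparing a few sizes on your own questions.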

@doanhieu9797
Author

@louis030195 cool! Many thanks. I'm back after a few busy days. I'm wondering how to update the data of documents already imported into a dataset. Can you please let me know?

@benjaminshafii
Member

benjaminshafii commented Apr 12, 2023

@doanhieu9797

we don't support updating data at the moment - we're append only.

our recommendation is to create a new dataset with the updated data and then query only that new dataset. under the hood, we make sure that re-creating datasets is performant and efficient.

PS: this is because, unlike rows in SQL and NoSQL DBs, embeddings are rarely retrieved by ID, so we believe it doesn't make sense to retrieve a single embedding through an ID and update it.

PPS: we understand this can be a bit of a hassle, and are open to implementing this in the future if we hear a compelling use case.
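
One way to follow the "new dataset" recommendation is a naming convention: import the corrected documents into a fresh dataset and point your queries at it. Here is a minimal sketch. The "-vN" suffix and the next_dataset_name helper are hypothetical, nothing Embedbase requires.

```python
import re


def next_dataset_name(current):
    """Return the next versioned dataset name, e.g. 'docs-v3' -> 'docs-v4'."""
    match = re.fullmatch(r"(.+)-v(\d+)", current)
    if match is None:
        # Unversioned name: treat it as version 1 and move to version 2.
        return f"{current}-v2"
    base, version = match.group(1), int(match.group(2))
    return f"{base}-v{version + 1}"


if __name__ == "__main__":
    active = "my-docs"  # hypothetical name of the dataset currently queried
    active = next_dataset_name(active)
    print(active)  # import the updated documents here, then query only this dataset
```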

@doanhieu9797
Author

@hotkartoffel I know, but this is really necessary: when I import a lot of data into a dataset and only want to edit one entry, I have to delete the whole dataset and re-import it. That's not reasonable at all.

@benjaminshafii
Member

@doanhieu9797 we're ready to update our beliefs there.

just a few questions:
a) could you expand a bit on your use of the API (what are you storing, how do you use it, and would you mind sharing a sample entry that you store?)
b) how would you like to retrieve & update data, in pseudocode?

you can also reach me on Discord (hotkartoffel.eth#2160) or schedule a call if that makes it any easier

@louis030195
Contributor

@doanhieu9797 hey, just added an update endpoint, hope that helps :)

https://docs.embedbase.xyz/interface#updating-data
