Embedding Seems Unsuccessful for Chroma + OpenAI #7963
Comments
Just found out that this is expected behavior. I have a follow-up question: if no vectors are stored in the DB, how is the similarity calculated (I guess the query is not embedded either when doing the similarity search)? Are we using some default embedding provided by Chroma?
Answer generated by a 🤖

Answer

I understand that you're puzzled by the 'None' value for the 'embeddings' field when you print the Chroma database collection. You're also curious about how the similarity search works if the embeddings don't appear in that output. Let me clarify this for you.

The 'None' value you're seeing is expected behavior. Chroma does store the embeddings, but it does not return them by default when you fetch the collection, so the printed result shows 'None' for that field. To see them, request them explicitly, for example by passing include=["embeddings"] to the collection's get call.

As for the similarity search, the query text is embedded with the same embedding function that was used for the documents (OpenAI in this setup), and Chroma compares that query vector against the stored document vectors. So, even though you don't see the embeddings when you print the collection, they are stored and are used for similarity searches.

This is just one possible explanation based on the information you've provided and my understanding of the Chroma database. If you find this explanation accurate and feel it could be beneficial to others, I encourage you to make a pull request to update the documentation. This way, other users who encounter the same confusion can quickly find the answer.
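To make the follow-up question about similarity concrete, here is a minimal, library-free sketch of what a vector similarity search does in principle. It is not Chroma's implementation: `toy_embed`, the three-word vocabulary, and the sample documents are hypothetical stand-ins for a real embedding model such as OpenAI's, and squared L2 distance is used on the assumption that it matches Chroma's default metric.

```python
# Hypothetical stand-in for a real embedding model: counts a few fixed keywords.
VOCAB = ["cat", "dog", "car"]


def toy_embed(text: str) -> list[float]:
    """Map text to a small vector of keyword counts (illustration only)."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Documents are embedded once, at insert time, and the vectors are kept.
docs = ["the cat sat", "a dog barked at the dog", "the car is fast"]
stored = {doc: toy_embed(doc) for doc in docs}


def similarity_search(query: str, k: int = 1) -> list[str]:
    """Embed the query with the same function, then rank stored vectors by distance."""
    query_vector = toy_embed(query)
    ranked = sorted(stored, key=lambda doc: squared_l2(query_vector, stored[doc]))
    return ranked[:k]


print(similarity_search("my cat"))
```

The point of the sketch is that both the documents and the query pass through the same embedding function, so the stored vectors must exist for the search to work, whether or not they are printed.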
@kong75 Could you provide the dummy or masked data that was used here, so that the issue can be reproduced?
@rjarun8 Thank you for replying! I think the response above from dosu has answered my question. I just did not know that embeddings are not available when printing by default.
Embeddings are not available when printed by default.
See: https://docs.trychroma.com/troubleshooting. Embeddings are not returned by default.
System Info
openai==0.27.7
langchain==0.0.237
chromadb==0.4.2
Platform: Windows 11
Python Version: 3.10
Who can help?
No response
Information
Related Components
Reproduction
Within this file, I was expecting db_collection to have embeddings when it was printed. However, the output is like this:
The value for "embeddings" is empty.
Here is the code:
Expected behavior
The embeddings are computed successfully and can be shown in the logs. Thank you!