
🐛 Bug Report: Inconsistency in recorded data across different vector databases #1870

Open
LakshmiN5 opened this issue Aug 19, 2024 · 3 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@LakshmiN5

Which component is this bug for?

All Packages

📜 Description

Tried Traceloop version 0.26.4 with different vector databases while running a RAG application (watsonx + langchain) and observed inconsistent behaviour. I expected the span information to be uniform across all vector DBs. I tested Milvus, Pinecone and Chroma: Milvus and Chroma were both tested using the in-memory option with langchain, and for Pinecone I used the managed instance.
Observations:

  • Chroma - captures less information in the vector-db-related spans: it records the embedding count and a similarity value, but does not include all 4 retrieved chunks. It returns just one chunk, and specifying parameters in the as_retriever() method does not seem to have any effect on the span information collected. Reference for as_retriever: https://api.python.langchain.com/en/v0.1/vectorstores/langchain_astradb.vectorstores.AstraDBVectorStore.html#langchain_astradb.[…]Store.as_retriever
  • Pinecone - does not capture the embedding count or a similarity value, but I could see the top 4 retrieved documents as part of another span.
  • Milvus - does not capture the embedding count or a similarity value, and there also seems to be some problem with the retrieved context.
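The observations above can be summarised as a per-database gap analysis. This is a minimal stdlib sketch; the attribute names ("embedding_count", "similarity_score", "retrieved_chunks") are illustrative placeholders, not Traceloop's actual span attribute keys.

```python
# Hypothetical attribute names used only to illustrate the inconsistency;
# they do not correspond to Traceloop's real span attribute conventions.
EXPECTED = {"embedding_count", "similarity_score", "retrieved_chunks"}

OBSERVED = {
    "chroma":   {"embedding_count", "similarity_score"},  # only 1 of 4 chunks recorded
    "pinecone": {"retrieved_chunks"},                     # chunks appear in a separate span
    "milvus":   set(),                                    # retrieved context also problematic
}

def missing_attributes(observed):
    """Return, per vector DB, the expected span attributes that were not captured."""
    return {db: EXPECTED - attrs for db, attrs in observed.items()}

gaps = missing_attributes(OBSERVED)
```

For example, `gaps["milvus"]` contains all three expected attributes, while `gaps["chroma"]` contains only `"retrieved_chunks"` — which is the inconsistency this issue describes.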

👟 Reproduction steps

The issue can be reproduced by running the RAG sample from langchain with different vector databases. The LLM used is from watsonx via the langchain framework.

https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/

👍 Expected behavior

Ideally, the following information should be captured consistently across all vector DBs:

  1. Embedding details, such as the count (for the stored knowledge base) and any additional information.
  2. Query embeddings and other details.
  3. Retrieved context information: the number of chunks matched; all matched chunks should be returned as per the configuration parameters set for the retriever (see item 4).
  4. Retrieval parameters configured should influence the actual results generated, e.g. the similarity algorithm used to search the query against the stored docs, the number of documents to retrieve, the similarity threshold, etc.
  5. Any insights on the chunk(s) used for the final answer generation.
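The expectations above amount to a consistency check one could run against exported spans. Below is a minimal stdlib sketch of such a check; the attribute keys (including "embedding.model", reflecting the update in the first comment) are assumptions for illustration, not the actual OpenLLMetry/Traceloop semantic conventions.

```python
# Hypothetical required span attributes, mirroring expected-behaviour items 1-5.
REQUIRED_KEYS = [
    "embedding.model",    # embedding model information (see author's update below)
    "embedding.count",    # item 1: count for the stored knowledge base
    "query.embedding",    # item 2: query embeddings
    "retrieval.chunks",   # item 3: all matched chunks, not just one
    "retrieval.top_k",    # item 4: configured retrieval parameters
]

def check_span(attrs):
    """Return a list of problems found in a span's attribute dict."""
    problems = [k for k in REQUIRED_KEYS if k not in attrs]
    chunks = attrs.get("retrieval.chunks")
    top_k = attrs.get("retrieval.top_k")
    # Item 3/4: the number of recorded chunks should match the configured top_k.
    if chunks is not None and top_k is not None and len(chunks) != top_k:
        problems.append(f"expected {top_k} chunks, span recorded {len(chunks)}")
    return problems
```

For instance, a Chroma-like span that records only one of four configured chunks would be flagged with "expected 4 chunks, span recorded 1", while a Milvus-like empty span would be flagged as missing every required key.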

👎 Actual Behavior with Screenshots

Most of this information is missing, and the behaviour is definitely not consistent across vector databases.

🤖 Python Version

3.10

📃 Provide any additional context for the Bug.

No response

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find similar issue

Are you willing to submit PR?

None

@nirga nirga added good first issue Good for newcomers help wanted Extra attention is needed labels Aug 19, 2024
@LakshmiN5
Author

An update to Expected Behaviour point 1: we should be able to capture the embedding model information as well. Thank you.

@cu8code

cu8code commented Sep 22, 2024

@nirga is this issue still open? I would like to work on it.

@nirga
Member

nirga commented Sep 22, 2024

Yes @cu8code!
