[Bug]: KeyError in retriever.retrieve when sent index does not include all nodes in document store #7684
Comments
🤖 Hi Guy,

Thank you for bringing this issue to our attention. It seems you've encountered a KeyError because the retriever is trying to process nodes that are not covered by the specific index you've provided. This is a known issue, and a similar problem was addressed in issue #1769. The solution suggested there was to pass the list of node ids from the vector index to the retriever, so it knows which subset of nodes to process. Here's the code snippet from that solution:

```python
def as_retriever(self, **kwargs: Any) -> BaseRetriever:
    # NOTE: lazy import
    from llama_index.indices.vector_store.retrievers import VectorIndexRetriever

    return VectorIndexRetriever(
        self, doc_ids=list(self.index_struct.nodes_dict.values()), **kwargs
    )
```

This ensures that the retriever only processes nodes included in the provided index, which should prevent the KeyError you're experiencing. However, please note that this is more of a workaround than a permanent solution: the underlying issue is that the retrieve function does not verify that only nodes covered by the provided index are processed. We will consider addressing this in a future update.

As for your question about changes made in the LlamaIndex codebase after version 0.8.5.post2 that could address this issue, I wasn't able to find any. However, I recommend updating to the latest version (v0.8.27) to benefit from the most recent bug fixes and improvements.

I hope this helps! Let me know if you have any other questions.

Best,
The bot is right! If you aren't using …
@logan-markewich I am facing this issue and I added the code snippet provided by the bot in retriever.py, but it's still giving me this error. This is my code:

```python
from llama_index.llms import OpenAI

documents = SimpleDirectoryReader(
embed_model = TextEmbeddingsInference(
llm = OpenAI(temperature=0, model="gpt-3.5-turbo", max_tokens=1024)
```
Bug Description
Hi,
I have created multiple indexes, each covering a different set of nodes. I am then using one of the indexes to retrieve nodes based on a query. However, it seems that the retriever tries to process nodes that are outside the scope of the specific index I provided, causing a KeyError exception in this line of llama_index/indices/vector_store/retrievers/retriever.py:

```python
self._index.index_struct.nodes_dict[idx] for idx in query_result.ids
```

I noticed that it is possible to send a list of node_ids to retrieve in order to constrain it. I haven't tried it yet, but I guess it will work. However, I believe it would be better if the retrieve function itself verified that only nodes covered by the provided index are processed.
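The check suggested here can be sketched as a plain function rather than a patch to retriever.py. `map_query_ids` is a hypothetical helper name, not part of LlamaIndex; it stands in for the guarded version of the lookup in the line quoted above.

```python
def map_query_ids(nodes_dict, query_ids):
    """Map vector-store result ids through nodes_dict, silently skipping
    ids that the current index does not cover instead of raising KeyError.
    """
    return [nodes_dict[idx] for idx in query_ids if idx in nodes_dict]


nodes_dict = {"v1": "n1", "v2": "n2"}            # this index covers v1, v2
print(map_query_ids(nodes_dict, ["v1", "v3"]))   # ['n1']; 'v3' is ignored
```

Whether skipping is the right behavior (versus logging or raising a clearer error) is a design choice for the maintainers; the sketch only shows that the membership check removes the crash.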
Thanks!
Guy
Version
0.8.5.post2
Steps to Reproduce
If the query returns nodes that are excluded from the provided index, a KeyError exception occurs.
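The failure mode can be reproduced in miniature without LlamaIndex at all; the ids below are made up, but the shape of the bug is the same: a dict covering a subset of ids, indexed with ids from outside that subset.

```python
# Minimal stand-alone model of the bug: the index's nodes_dict covers a
# subset of the document store, but the vector query can return ids from
# outside that subset.
nodes_dict = {"id-1": "node-1"}        # this index only knows id-1
query_result_ids = ["id-1", "id-2"]    # id-2 belongs to a different index

try:
    [nodes_dict[idx] for idx in query_result_ids]
except KeyError as exc:
    print("KeyError:", exc)            # prints: KeyError: 'id-2'
```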
Relevant Logs/Tracebacks
No response