RAG with GPT-4o: Calculated available context size -271 was not non-negative LlamaIndex exception. #1372

Closed
Trawczynski opened this issue Jun 28, 2024 · 5 comments


@Trawczynski

Bug description
Hi, I have been struggling to run RAG with GPT-4o in MetaGPT v0.8.1.
When I run the first code example, the following error occurs:

{
	"name": "ValueError",
	"message": "Calculated available context size -271 was not non-negative.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 response = engine.query(\"What does Bob like?\")
      2 response

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/base/base_query_engine.py:40, in BaseQueryEngine.query(self, str_or_query_bundle)
     38 if isinstance(str_or_query_bundle, str):
     39     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 40 return self._query(str_or_query_bundle)

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:187, in RetrieverQueryEngine._query(self, query_bundle)
    183 with self.callback_manager.event(
    184     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    185 ) as query_event:
    186     nodes = self.retrieve(query_bundle)
--> 187     response = self._response_synthesizer.synthesize(
    188         query=query_bundle,
    189         nodes=nodes,
    190     )
    192     query_event.on_end(payload={EventPayload.RESPONSE: response})
    194 return response

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/base.py:188, in BaseSynthesizer.synthesize(self, query, nodes, additional_source_nodes, **response_kwargs)
    183     query = QueryBundle(query_str=query)
    185 with self._callback_manager.event(
    186     CBEventType.SYNTHESIZE, payload={EventPayload.QUERY_STR: query.query_str}
    187 ) as event:
--> 188     response_str = self.get_response(
    189         query_str=query.query_str,
    190         text_chunks=[
    191             n.node.get_content(metadata_mode=MetadataMode.LLM) for n in nodes
    192         ],
    193         **response_kwargs,
    194     )
    196     additional_source_nodes = additional_source_nodes or []
    197     source_nodes = list(nodes) + list(additional_source_nodes)

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py:37, in CompactAndRefine.get_response(self, query_str, text_chunks, prev_response, **response_kwargs)
     33 \"\"\"Get compact response.\"\"\"
     34 # use prompt helper to fix compact text_chunks under the prompt limitation
     35 # TODO: This is a temporary fix - reason it's temporary is that
     36 # the refine template does not account for size of previous answer.
---> 37 new_texts = self._make_compact_text_chunks(query_str, text_chunks)
     38 return super().get_response(
     39     query_str=query_str,
     40     text_chunks=new_texts,
     41     prev_response=prev_response,
     42     **response_kwargs,
     43 )

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py:52, in CompactAndRefine._make_compact_text_chunks(self, query_str, text_chunks)
     49 refine_template = self._refine_template.partial_format(query_str=query_str)
     51 max_prompt = get_biggest_prompt([text_qa_template, refine_template])
---> 52 return self._prompt_helper.repack(max_prompt, text_chunks)

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:276, in PromptHelper.repack(self, prompt, text_chunks, padding, llm)
    263 def repack(
    264     self,
    265     prompt: BasePromptTemplate,
   (...)
    268     llm: Optional[LLM] = None,
    269 ) -> List[str]:
    270     \"\"\"Repack text chunks to fit available context window.
    271 
    272     This will combine text chunks into consolidated chunks
    273     that more fully \"pack\" the prompt template given the context_window.
    274 
    275     \"\"\"
--> 276     text_splitter = self.get_text_splitter_given_prompt(
    277         prompt, padding=padding, llm=llm
    278     )
    279     combined_str = \"\\n\\n\".join([c.strip() for c in text_chunks if c.strip()])
    280     return text_splitter.split_text(combined_str)

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:234, in PromptHelper.get_text_splitter_given_prompt(self, prompt, num_chunks, padding, llm)
    224 def get_text_splitter_given_prompt(
    225     self,
    226     prompt: BasePromptTemplate,
   (...)
    229     llm: Optional[LLM] = None,
    230 ) -> TokenTextSplitter:
    231     \"\"\"Get text splitter configured to maximally pack available context window,
    232     taking into account of given prompt, and desired number of chunks.
    233     \"\"\"
--> 234     chunk_size = self._get_available_chunk_size(
    235         prompt, num_chunks, padding=padding, llm=llm
    236     )
    237     if chunk_size <= 0:
    238         raise ValueError(f\"Chunk size {chunk_size} is not positive.\")

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:218, in PromptHelper._get_available_chunk_size(self, prompt, num_chunks, padding, llm)
    215     prompt_str = get_empty_prompt_txt(prompt)
    216     num_prompt_tokens = self._token_counter.get_string_tokens(prompt_str)
--> 218 available_context_size = self._get_available_context_size(num_prompt_tokens)
    219 result = available_context_size // num_chunks - padding
    220 if self.chunk_size_limit is not None:

File /mnt/c/Users/alant/Downloads/agents_tests/MetaGPT/.venv/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py:150, in PromptHelper._get_available_context_size(self, num_prompt_tokens)
    148 context_size_tokens = self.context_window - num_prompt_tokens - self.num_output
    149 if context_size_tokens < 0:
--> 150     raise ValueError(
    151         f\"Calculated available context size {context_size_tokens} was\"
    152         \" not non-negative.\"
    153     )
    154 return context_size_tokens

ValueError: Calculated available context size -271 was not non-negative."
}

This is my configuration file:

llm:
  api_type: "openai"  # or azure / ollama / open_llm etc. Check LLMType for more options
  model: "gpt-4o"  # or gpt-3.5-turbo-1106 / gpt-4-1106-preview
  base_url: "https://api.openai.com/v1"  # or forward url / other llm url
  api_key: "..."

embedding:
  api_type: "openai" # openai / azure / gemini / ollama etc. Check EmbeddingType for more options.
  base_url: "https://api.openai.com/v1"  # or forward url / other llm url
  api_key: "..."
  model: "text-embedding-3-small"
  dimensions: 1536 # output dimension of embedding model

Bug solved method
I checked the code and found that this happens because the context size of gpt-4o is not defined in the metagpt/utils/token_counter.py file (the same happens with gpt-4-turbo, which is not that recent). As a result, the default context size of 3900 tokens is used, and once the prompt tokens and the tokens reserved for the output are subtracted from it, the available context becomes negative, which raises this error.
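The failing check can be reproduced in plain Python. This sketch copies the formula visible in the traceback (`context_window - num_prompt_tokens - num_output` in `PromptHelper._get_available_context_size`) into a standalone function; it is not LlamaIndex's API. It assumes `num_output` is 4096 (assumed to come from MetaGPT's default `max_token`, which this thread does not state) and back-solves the prompt size implied by the reported -271.

```python
def available_context_size(context_window: int, num_prompt_tokens: int, num_output: int) -> int:
    """Standalone copy of the check shown in the traceback (not LlamaIndex's API)."""
    context_size_tokens = context_window - num_prompt_tokens - num_output
    if context_size_tokens < 0:
        raise ValueError(
            f"Calculated available context size {context_size_tokens} was not non-negative."
        )
    return context_size_tokens


DEFAULT_CONTEXT_WINDOW = 3900  # fallback used when the model's window is unknown
ASSUMED_NUM_OUTPUT = 4096      # assumption: output tokens reserved via max_token
GPT_4O_CONTEXT_WINDOW = 128_000

# Solve -271 = 3900 - prompt_tokens - 4096 for the prompt size implied by the log.
implied_prompt_tokens = DEFAULT_CONTEXT_WINDOW - ASSUMED_NUM_OUTPUT + 271

try:
    available_context_size(DEFAULT_CONTEXT_WINDOW, implied_prompt_tokens, ASSUMED_NUM_OUTPUT)
except ValueError as err:
    print(err)  # Calculated available context size -271 was not non-negative.

# With gpt-4o's real window, the same prompt and output reservation fit easily.
print(available_context_size(GPT_4O_CONTEXT_WINDOW, implied_prompt_tokens, ASSUMED_NUM_OUTPUT))
```

Under that assumption the failure does not depend on the retrieved text at all: reserving 4096 output tokens already exceeds a 3900-token window, while gpt-4o's 128,000-token window leaves plenty of room.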

The exception is thrown by LlamaIndex, and its message is not informative enough to understand what is going on.
This problem should be handled internally by MetaGPT. Adding a context_size field to the configuration file may be useful: it would let users run models that are not yet supported, and also limit the length of requests sent to the LLM provider if there is a reason to do so.
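A minimal sketch of what handling this internally could look like, assuming a config shaped like the `llm:` section above plus the proposed `context_size` field. `KNOWN_CONTEXT_WINDOWS`, `resolve_context_window`, and `check_output_budget` are hypothetical names, not MetaGPT functions, and the 4096 default for `max_token` is an assumption. The idea is to fail early with a message that names the config keys, instead of letting LlamaIndex report a negative budget.

```python
from typing import Mapping

# Hypothetical lookup table; gpt-4o and gpt-4-turbo both have 128k-token windows.
KNOWN_CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-4-turbo": 128_000,
}


def resolve_context_window(llm_config: Mapping[str, object]) -> int:
    """Prefer an explicit context_size from the config, then the known-model table."""
    override = llm_config.get("context_size")
    if override is not None:
        if not isinstance(override, int) or override <= 0:
            raise ValueError(f"context_size must be a positive int, got {override!r}")
        return override
    model = str(llm_config.get("model", ""))
    if model not in KNOWN_CONTEXT_WINDOWS:
        raise ValueError(
            f"Unknown context window for model {model!r}; "
            "set llm.context_size in the configuration file."
        )
    return KNOWN_CONTEXT_WINDOWS[model]


def check_output_budget(llm_config: Mapping[str, object]) -> int:
    """Fail early with a readable message instead of a negative budget deep in RAG."""
    window = resolve_context_window(llm_config)
    max_token = llm_config.get("max_token", 4096)  # assumption: 4096 when unset
    if not isinstance(max_token, int):
        raise ValueError(f"max_token must be an int, got {max_token!r}")
    if max_token >= window:
        raise ValueError(
            f"max_token ({max_token}) leaves no room for the prompt in the "
            f"{window}-token context window of {llm_config.get('model')!r}."
        )
    return window - max_token  # tokens left for the prompt template and retrieved text


print(check_output_budget({"model": "gpt-4o"}))
print(check_output_budget({"model": "my-local-model", "context_size": 8192, "max_token": 2048}))
```

Here "my-local-model" stands in for any model missing from the table; the explicit `context_size` is what makes it usable.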

@byang1981

Try adding "max_token: 2048" to the config2.yaml file as follows. Note: 2048 is an int, not a string.
llm:
  api_type: xxx
  model: xxx
  ...
  max_token: 2048
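Lowering max_token helps because it shrinks the output reservation subtracted from the context window. As a sketch, assuming the 3900-token fallback window and a 75-token prompt (the size implied by the reported -271 if max_token was 4096; both are assumptions), the hypothetical helper `max_safe_max_token` computes the largest max_token that still leaves an arbitrary minimum of 1024 tokens for retrieved text.

```python
def max_safe_max_token(context_window: int, prompt_tokens: int, min_chunk_tokens: int) -> int:
    """Largest output reservation that keeps at least min_chunk_tokens for retrieved text.

    Hypothetical helper based on the budget context_window - prompt_tokens - num_output.
    """
    limit = context_window - prompt_tokens - min_chunk_tokens
    if limit <= 0:
        raise ValueError("The prompt plus min_chunk_tokens does not fit the context window.")
    return limit


FALLBACK_WINDOW = 3900   # default window used when the model is unknown
PROMPT_TOKENS = 75       # assumption: template size implied by the -271 in the report
MIN_CHUNK_TOKENS = 1024  # assumption: room wanted for retrieved chunks

limit = max_safe_max_token(FALLBACK_WINDOW, PROMPT_TOKENS, MIN_CHUNK_TOKENS)
print(limit)  # 2801
for max_token in (4096, 2048):
    print(max_token, "fits" if max_token <= limit else "too large")
```

This is a workaround rather than a fix: the model is still treated as having a 3900-token window, so far less retrieved text fits than gpt-4o could actually accept.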

@Thoams0211

Try to add "max_token:2048" in config2.yaml file as following. Note: 2048 is int not string. llm: api_type: xxx model: xxx ... .... max_token: 2048

Useful answer!!

@better629
Collaborator

Since no further responses are needed, we will close it. Please reopen it if necessary.

@AryanSakhala

This issue can also occur when the max_token value used while creating an index (with the embedding model) differs from the max_token value used when reloading the index from the persist directory (with the LLM).

Make sure max_token has the same value both when creating a persistent index and when loading it for use as query_agent.
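One way to enforce this is to record the token settings next to the persisted index and compare them on reload. This is a hypothetical, standard-library-only sketch: `save_index_settings`, `check_index_settings`, and the `index_settings.json` file name are not part of MetaGPT or LlamaIndex.

```python
import json
import tempfile
from pathlib import Path

SETTINGS_FILE = "index_settings.json"  # hypothetical sidecar file name


def save_index_settings(persist_dir: str, settings: dict) -> None:
    """Write the settings used to build the index next to the persisted index."""
    path = Path(persist_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / SETTINGS_FILE).write_text(json.dumps(settings, sort_keys=True))


def check_index_settings(persist_dir: str, settings: dict) -> None:
    """Raise if the settings used to load differ from those used to build."""
    saved = json.loads((Path(persist_dir) / SETTINGS_FILE).read_text())
    mismatched = {
        key: (saved.get(key), settings.get(key))
        for key in saved.keys() | settings.keys()
        if saved.get(key) != settings.get(key)
    }
    if mismatched:
        raise ValueError(f"Index settings differ between build and load: {mismatched}")


with tempfile.TemporaryDirectory() as persist_dir:
    save_index_settings(persist_dir, {"max_token": 2048})
    check_index_settings(persist_dir, {"max_token": 2048})  # passes silently
    try:
        check_index_settings(persist_dir, {"max_token": 4096})
    except ValueError as err:
        print(err)
```

Running the check right before loading the index turns a silent mismatch into an explicit error that names the offending setting.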

@devanshsaini11

For me, the problem was in PromptHelper. I fixed it with Settings._prompt_helper = PromptHelper(context_window=6000); the default context window is 3900 tokens.
