
fix: max_tokens_per_request error when pushing many documents at once #482

Open · wants to merge 4 commits into main from fix/halve-documents-when-too-many-tokens

Conversation

kolaente (Contributor)

Resolves #481

This is the easiest solution I could come up with, but it might not be the most elegant. Ideally, we would know the maximum number of tokens in advance and make sure the batch does not exceed it. I haven't found a way to get that information from OpenAI's API though, but we could think about parsing the error response and then using that in subsequent requests.
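
For illustration, the general shape of the halve-and-retry approach looks roughly like this (hypothetical helper names, not the exact code in this PR):

```python
# Rough sketch of the halve-and-retry idea (hypothetical names, not the exact
# code in this PR): if the embeddings API rejects a request because the batch
# exceeds max_tokens_per_request, split the batch in half and retry each half.
from typing import Awaitable, Callable

import openai


async def embed_with_split(
    documents: list[str],
    call_embed_api: Callable[[list[str]], Awaitable[list[list[float]]]],
) -> list[list[float]]:
    try:
        return await call_embed_api(documents)
    except openai.BadRequestError as e:
        # Only handle the "too many tokens per request" case; anything else bubbles up.
        if "max_tokens_per_request" not in str(e) or len(documents) <= 1:
            raise
        middle = len(documents) // 2
        first_half = await embed_with_split(documents[:middle], call_embed_api)
        second_half = await embed_with_split(documents[middle:], call_embed_api)
        return first_half + second_half
```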

kolaente (Contributor, Author)

Another option would be to add a new config parameter max_tokens_per_batch and change this line

https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/embeddings.py#L72

to use it.

kolaente force-pushed the fix/halve-documents-when-too-many-tokens branch from 1e83b35 to 6eabc74 on February 14, 2025 at 12:16
kolaente force-pushed the fix/halve-documents-when-too-many-tokens branch from 6eabc74 to 08c7a1e on February 14, 2025 at 12:19
cevian (Collaborator) commented Feb 14, 2025

@kolaente what do you think of hardcoding a 600000 limit for now?

kolaente (Contributor, Author)

A hardcoded limit would help me a lot! I just thought there had to be something more elegant.

I was unable to find what determines that limit, whether it depends on the OpenAI tier, is a general limit, or is something else entirely.

cevian (Collaborator) commented Feb 14, 2025

Yeah, I couldn't find it documented either. But it's a high enough limit that I don't mind hardcoding it for now.

The way I'd implement it is to add an optional max_tokens_per_batch parameter to BatchApiCaller and change the logic to obey both max_chunks_per_batch AND max_tokens_per_batch when the latter is set. Then for OpenAI I'd set max_tokens_per_batch to 600,000.
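
Roughly this shape, sketched with illustrative names and signatures (not the final BatchApiCaller code):

```python
# Illustrative sketch of batching that obeys both limits: greedily fill each
# batch until adding the next chunk would exceed either the chunk-count limit
# or the optional token limit.
def make_batches(
    chunks: list[str],
    token_counts: list[int],
    max_chunks_per_batch: int,
    max_tokens_per_batch: int | None = None,
) -> list[list[str]]:
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for chunk, tokens in zip(chunks, token_counts):
        over_chunks = len(current) >= max_chunks_per_batch
        over_tokens = (
            max_tokens_per_batch is not None
            and current
            and current_tokens + tokens > max_tokens_per_batch
        )
        if over_chunks or over_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(chunk)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```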

kolaente (Contributor, Author)

@cevian Implemented the limit, please have a look

```diff
-return BatchApiCaller(self._max_chunks_per_batch(), self.call_embed_api)
+return BatchApiCaller(
+    self._max_chunks_per_batch(),
+    600_000,  # See https://github.com/timescale/pgai/pull/482#issuecomment-2659799567
```

Contributor

WDYT about making this part of the Embedder abstract class, like max_chunks_per_batch is? A function like max_tokens_per_batch could work, so we could additionally add such a limit for other providers (if at some point we find one that needs it).

kolaente (Contributor, Author)

Sounds like a good idea, will do that. I can't make the function abstract though, since that would mean implementing it everywhere, which at this point does not make sense.
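
For illustration, the agreed shape could look roughly like this (hypothetical names; the real Embedder class has more members):

```python
# Sketch of the non-abstract default discussed above: only providers with a
# known per-request token limit override _max_tokens_per_batch.
from abc import ABC, abstractmethod


class Embedder(ABC):
    @abstractmethod
    def _max_chunks_per_batch(self) -> int: ...

    def _max_tokens_per_batch(self) -> int | None:
        # No token limit by default, so other providers don't have to implement it.
        return None


class OpenAI(Embedder):
    def _max_chunks_per_batch(self) -> int:
        return 2048  # illustrative value

    def _max_tokens_per_batch(self) -> int | None:
        # See https://github.com/timescale/pgai/pull/482#issuecomment-2659799567
        return 600_000
```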

kolaente (Contributor, Author)

Done, please check again.

```python
api_callable: Callable[[list[T]], Awaitable[EmbeddingResponse]]
encoder: tiktoken.Encoding | None
```

Contributor

tiktoken is exclusive to OpenAI. What about creating an encoder interface instead? Other embedders could implement it if necessary.
In the end, the signature could be just a function that takes a string and returns an array of tokens, something like Encoder = Callable[[str], list[int]], or Callable[[str, ...], list[int]] if we want to support kwargs.

That way, you wouldn't call self.encoder.encode_ordinary(chunk) later but self.encoder.encode(chunk) instead.
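
For illustration only (helper names are hypothetical):

```python
# Sketch of such an encoder interface: the batch caller only depends on a
# text -> token-ids callable, and the OpenAI embedder supplies one backed by
# tiktoken.
from typing import Callable

import tiktoken

# Anything that turns a chunk of text into a list of token ids.
Encoder = Callable[[str], list[int]]


def openai_encoder(model: str) -> Encoder:
    encoding = tiktoken.encoding_for_model(model)
    # encode_ordinary already matches the Encoder signature, so hand it back directly.
    return encoding.encode_ordinary


def count_tokens(chunk: str, encoder: Encoder) -> int:
    return len(encoder(chunk))
```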

kolaente (Contributor, Author)

Refactored it, please check again.

I'm unsure if that's the way to go; Python isn't my strong suit 🙃
