fix: max_tokens_per_request error when pushing many documents at once #482
base: main
Conversation
Another option would be to add a new config parameter to use that.
Force-pushed from 1e83b35 to 6eabc74
Force-pushed from 6eabc74 to 08c7a1e
@kolaente what do you think of hardcoding a 600000 limit for now?
A hardcoded limit would help me a lot! I just thought there had to be something more elegant. I was unable to find out what determines that limit: whether it depends on the OpenAI tier, is a general limit, or is something else entirely.
Yeah, I couldn't find it documented either. But it's a high enough limit that I don't mind hardcoding it for now. The way I'd implement it is to add an optional parameter to
@cevian Implemented the limit, please have a look.
```diff
-        return BatchApiCaller(self._max_chunks_per_batch(), self.call_embed_api)
+        return BatchApiCaller(
+            self._max_chunks_per_batch(),
+            600_000,  # See https://github.com/timescale/pgai/pull/482#issuecomment-2659799567
```
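For illustration, a minimal sketch of how such a token budget could be enforced when packing chunks into batches. Only `BatchApiCaller`, `call_embed_api`, and `_max_chunks_per_batch` appear in the diff above; the `split_by_budget` helper, the constants, and the tiktoken wiring are assumptions for the sketch, not the actual pgai implementation.

```python
# Sketch only: greedily pack chunks into batches that stay below both a
# max-chunks and a max-tokens budget before calling the embedding API.
import tiktoken

MAX_CHUNKS_PER_BATCH = 2048      # illustrative value
MAX_TOKENS_PER_BATCH = 600_000   # limit discussed in this PR


def split_by_budget(
    chunks: list[str],
    max_chunks: int,
    max_tokens: int,
    encoding: tiktoken.Encoding,
) -> list[list[str]]:
    """Greedily pack chunks into batches without exceeding either budget."""
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for chunk in chunks:
        chunk_tokens = len(encoding.encode_ordinary(chunk))
        over_tokens = current_tokens + chunk_tokens > max_tokens
        over_chunks = len(current) >= max_chunks
        if current and (over_tokens or over_chunks):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(chunk)
        current_tokens += chunk_tokens
    if current:
        batches.append(current)
    return batches


# Usage sketch (call_embed_api stands in for the embeddings call):
# encoding = tiktoken.encoding_for_model("text-embedding-3-small")
# for batch in split_by_budget(chunks, MAX_CHUNKS_PER_BATCH, MAX_TOKENS_PER_BATCH, encoding):
#     response = await call_embed_api(batch)
```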
WDYT about making this part of the `Embedder` abstract class, as is done for `max_chunks_per_batch`? A function like `max_tokens_per_batch` could work, so we can additionally add such a limit in other providers (if at some point we find one).
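Roughly what that hook could look like. This is a sketch under assumptions: the `OpenAI` subclass name, the chunk limit of 2048, and the non-abstract default are illustrative, not the real pgai classes.

```python
# Sketch: expose the token budget next to the existing chunk budget on the
# Embedder base class, with a permissive default so other providers are
# not forced to implement it.
from abc import ABC, abstractmethod


class Embedder(ABC):
    @abstractmethod
    def _max_chunks_per_batch(self) -> int: ...

    def _max_tokens_per_batch(self) -> int | None:
        # No token limit by default; providers override this when they have one.
        return None


class OpenAI(Embedder):
    def _max_chunks_per_batch(self) -> int:
        return 2048  # illustrative value

    def _max_tokens_per_batch(self) -> int | None:
        # See https://github.com/timescale/pgai/pull/482#issuecomment-2659799567
        return 600_000
```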
Sounds like a good idea, will do that. I can't make the function abstract though, since that would mean implementing it everywhere, which at this point does not make sense.
Done, please check again.
```python
    api_callable: Callable[[list[T]], Awaitable[EmbeddingResponse]]
    encoder: tiktoken.Encoding | None
```
tiktoken is exclusively from OpenAI. What about creating an encoder interface instead? Other embedders could implement it if necessary. In the end, the signature could be just a function that takes a string and returns an array of tokens, something like `Encoder = Callable[[str], list[int]]`, or `Callable[[str, ...], list[int]]` if we want to support kwargs. That way, you wouldn't call `self.encoder.encode_ordinary(chunk)` later but `self.encoder.encode(chunk)` instead.
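A sketch of what that decoupling could look like. The `Encoder` alias is the suggestion above; the `openai_encoder` and `count_tokens` helpers are hypothetical names used only for illustration.

```python
# Sketch: the batcher depends on a plain callable rather than on tiktoken,
# so other providers can plug in their own tokenizer.
from collections.abc import Callable

import tiktoken

Encoder = Callable[[str], list[int]]


def openai_encoder(model: str) -> Encoder:
    # OpenAI-specific implementation of the generic Encoder interface.
    encoding = tiktoken.encoding_for_model(model)
    return encoding.encode  # a str -> list[int] callable


def count_tokens(chunk: str, encode: Encoder) -> int:
    # The caller only sees the generic callable, never tiktoken itself.
    return len(encode(chunk))
```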
Refactored it, please check again. I'm unsure if that's the way to go; Python isn't my strong suit 🙃
Resolves #481
This is the easiest solution I could come up with, but it might not be the most elegant. Ideally, we would know the maximum number of tokens in advance and make sure each batch does not exceed it. I haven't found a way to get that information through OpenAI's API though, but we could think about parsing the error response and then using that limit in subsequent requests.
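As a sketch of the error-parsing idea mentioned above: the wording of the max_tokens_per_request error message is an assumption here, not something confirmed in this PR, so the pattern would need to be checked against the real response.

```python
# Hypothetical sketch: pull the token limit out of a max_tokens_per_request
# error so later batches can be sized below it. The message format is assumed.
import re

import openai


def extract_token_limit(error: openai.BadRequestError) -> int | None:
    # Assumed message shape: "... max 600000 tokens per request ..."
    match = re.search(r"max\D*?(\d[\d,]*)\s*tokens per request", str(error), re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```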