
fix: max_tokens_per_request error when pushing many documents at once #482

Open · wants to merge 4 commits into main from fix/halve-documents-when-too-many-tokens

Conversation

kolaente (Contributor)

Resolves #481

This is the easiest solution I could come up with, but it might not be the most elegant. Ideally, we would know the maximum number of tokens in advance and make sure the batch does not exceed it. I haven't found a way to get that information from OpenAI's API though, but we could think about parsing the error response and then using that in subsequent requests.
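
For illustration, the general shape of the halve-and-retry approach looks roughly like this (hypothetical helper names, not the exact code in this PR):

```python
# Rough sketch of the halve-and-retry idea (hypothetical names, not the exact
# code in this PR): if the embeddings API rejects a request because the batch
# exceeds max_tokens_per_request, split the batch in half and retry each half.
from typing import Awaitable, Callable

import openai


async def embed_with_split(
    documents: list[str],
    call_embed_api: Callable[[list[str]], Awaitable[list[list[float]]]],
) -> list[list[float]]:
    try:
        return await call_embed_api(documents)
    except openai.BadRequestError as e:
        # Only handle the "too many tokens per request" case; anything else bubbles up.
        if "max_tokens_per_request" not in str(e) or len(documents) <= 1:
            raise
        middle = len(documents) // 2
        first_half = await embed_with_split(documents[:middle], call_embed_api)
        second_half = await embed_with_split(documents[middle:], call_embed_api)
        return first_half + second_half
```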

kolaente (Contributor, Author)

Another option would be to add a new config parameter max_tokens_per_batch and change this line

https://github.com/timescale/pgai/blob/main/projects/pgai/pgai/vectorizer/embeddings.py#L72

to use it.

kolaente force-pushed the fix/halve-documents-when-too-many-tokens branch from 1e83b35 to 6eabc74 on February 14, 2025 at 12:16
kolaente force-pushed the fix/halve-documents-when-too-many-tokens branch from 6eabc74 to 08c7a1e on February 14, 2025 at 12:19
cevian (Collaborator) commented Feb 14, 2025

@kolaente what do you think of hardcoding a 600000 limit for now?

kolaente (Contributor, Author)

A hardcoded limit would help me a lot! I just thought there had to be something more elegant.

I was unable to find what determines that limit, whether it depends on the OpenAI tier, is a general limit, or is something else entirely.

cevian (Collaborator) commented Feb 14, 2025

Yeah, I couldn't find it documented either. But it's a high enough limit that I don't mind hardcoding it for now.

The way I'd implement it is to add an optional max_tokens_per_batch parameter to BatchApiCaller and change the logic to obey both max_chunks_per_batch AND max_tokens_per_batch when the latter is set. Then for OpenAI I'd set max_tokens_per_batch to 600,000.
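
Roughly this shape, sketched with illustrative names and signatures (not the final BatchApiCaller code):

```python
# Illustrative sketch of batching that obeys both limits: greedily fill each
# batch until adding the next chunk would exceed either the chunk-count limit
# or the optional token limit.
def make_batches(
    chunks: list[str],
    token_counts: list[int],
    max_chunks_per_batch: int,
    max_tokens_per_batch: int | None = None,
) -> list[list[str]]:
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for chunk, tokens in zip(chunks, token_counts):
        over_chunks = len(current) >= max_chunks_per_batch
        over_tokens = (
            max_tokens_per_batch is not None
            and current
            and current_tokens + tokens > max_tokens_per_batch
        )
        if over_chunks or over_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(chunk)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```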

kolaente (Contributor, Author)

@cevian Implemented the limit, please have a look

```diff
-return BatchApiCaller(self._max_chunks_per_batch(), self.call_embed_api)
+return BatchApiCaller(
+    self._max_chunks_per_batch(),
+    600_000,  # See https://github.com/timescale/pgai/pull/482#issuecomment-2659799567
```

Contributor

WDYT about making this part of the Embedder abstract class, like max_chunks_per_batch is? A function like max_tokens_per_batch could work, so we could additionally add such a limit for other providers (if at some point we find one that needs it).

kolaente (Contributor, Author)

Sounds like a good idea, will do that. I can't make the function abstract though, since that would mean implementing it everywhere, which at this point does not make sense.
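
For illustration, the agreed shape could look roughly like this (hypothetical names; the real Embedder class has more members):

```python
# Sketch of the non-abstract default discussed above: only providers with a
# known per-request token limit override _max_tokens_per_batch.
from abc import ABC, abstractmethod


class Embedder(ABC):
    @abstractmethod
    def _max_chunks_per_batch(self) -> int: ...

    def _max_tokens_per_batch(self) -> int | None:
        # No token limit by default, so other providers don't have to implement it.
        return None


class OpenAI(Embedder):
    def _max_chunks_per_batch(self) -> int:
        return 2048  # illustrative value

    def _max_tokens_per_batch(self) -> int | None:
        # See https://github.com/timescale/pgai/pull/482#issuecomment-2659799567
        return 600_000
```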

kolaente (Contributor, Author)

Done, please check again.

```python
api_callable: Callable[[list[T]], Awaitable[EmbeddingResponse]]
encoder: tiktoken.Encoding | None
```

Contributor

tiktoken is exclusive to OpenAI. What about creating an encoder interface instead? Other embedders could implement it if necessary.
In the end, the signature could be just a function that takes a string and returns an array of tokens, something like Encoder = Callable[[str], list[int]], or Callable[[str, ...], list[int]] if we want to support kwargs.

That way, you wouldn't call self.encoder.encode_ordinary(chunk) later but self.encoder.encode(chunk) instead.
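
For illustration only (helper names are hypothetical):

```python
# Sketch of such an encoder interface: the batch caller only depends on a
# text -> token-ids callable, and the OpenAI embedder supplies one backed by
# tiktoken.
from typing import Callable

import tiktoken

# Anything that turns a chunk of text into a list of token ids.
Encoder = Callable[[str], list[int]]


def openai_encoder(model: str) -> Encoder:
    encoding = tiktoken.encoding_for_model(model)
    # encode_ordinary already matches the Encoder signature, so hand it back directly.
    return encoding.encode_ordinary


def count_tokens(chunk: str, encoder: Encoder) -> int:
    return len(encoder(chunk))
```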

kolaente (Contributor, Author)

Refactored it, please check again.

I'm unsure if that's the way to go; Python isn't my strong suit 🙃
