Midsize PDF file yields Error 400: "The request's total referenced files bytes are too large to be read" #308

rnckp · 2025-02-11T14:04:15Z

Environment details

Programming language: Python
OS: macOS 15.3
Language runtime version: 3.10
Package version: 1.1.0

Steps to reproduce

I have a midsize PDF file (160MB, 127 pages). I can upload it successfully with client.files.upload(file=pdf_path).

However, when I try to use the uploaded file in client.models.generate_content(), I get this error:

ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': "The request's total referenced files bytes are too large to be read", 'status': 'INVALID_ARGUMENT'}}

My code works on the same PDF when I shorten it to, say, 10 pages.

How can I fix this error and use the full PDF?

Thanks in advance for any help in this matter.


jamg7 · 2025-02-13T02:47:29Z

Hi, @rnckp

I succeeded with a PDF of around 20MB. Could this be related to the input token limit? Can you try getting the token count of your prompt?

I used the script below to get the token count:

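    from google import genai

    client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment
    # `document` is the File object returned by client.files.upload()
    document = client.files.upload(file="document.pdf")  # hypothetical path

    response = client.models.count_tokens(
        model='gemini-2.0-flash-001',
        contents=[
            "summary the document",
            document,
        ],
    )
    print(response)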

Also, which model are you using in your test? Each model has a different input_token_limit, which you can look up by calling client.models.list().
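For reference, a minimal sketch of that lookup. It assumes the objects yielded by client.models.list() carry name and input_token_limit attributes (as the types.Model class in google-genai 1.x does). The helper input_token_limits is a hypothetical name; it accepts any iterable of such objects, so it can be tried offline with stand-ins.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, Optional


def input_token_limits(models: Iterable) -> Dict[str, Optional[int]]:
    """Map each model name to its input_token_limit (None if not reported).

    `models` is assumed to be what client.models.list() returns; any object
    with `name` and (optionally) `input_token_limit` attributes works.
    """
    return {m.name: getattr(m, "input_token_limit", None) for m in models}


if __name__ == "__main__":
    # Offline stand-ins for Model objects; names and numbers are made up.
    fake_models = [
        SimpleNamespace(name="models/example-a", input_token_limit=1000),
        SimpleNamespace(name="models/example-b"),
    ]
    print(input_token_limits(fake_models))
```

Against the live API this would presumably be called as input_token_limits(client.models.list()).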

rnckp · 2025-02-13T10:16:05Z

Hi @jamg7

Thanks for your help.

I was using gemini-2.0-flash.

To count tokens, I tried your exact code. It yields the error below. How can I fix this?

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
Cell In[12], line 1
----> 1 response = client.models.count_tokens(
      2     model="gemini-2.0-flash-001",
      3     contents=["summary the document", sample_file],
      4 )
      5 print(response)

File ~/miniconda3/envs/google/lib/python3.10/site-packages/google/genai/models.py:4507, in Models.count_tokens(self, model, contents, config)
   4504 request_dict = _common.convert_to_dict(request_dict)
   4505 request_dict = _common.encode_unserializable_types(request_dict)
-> 4507 response_dict = self._api_client.request(
   4508     'post', path, request_dict, http_options
   4509 )
   4511 if self._api_client.vertexai:
   4512   response_dict = _CountTokensResponse_from_vertex(
   4513       self._api_client, response_dict
   4514   )

File ~/miniconda3/envs/google/lib/python3.10/site-packages/google/genai/_api_client.py:449, in ApiClient.request(self, http_method, path, request_dict, http_options)
    439 def request(
    440     self,
    441     http_method: str,
   (...)
    444     http_options: HttpOptionsOrDict = None,
    445 ):
    446   http_request = self._build_request(
    447       http_method, path, request_dict, http_options
    448   )
--> 449   response = self._request(http_request, stream=False)
    450   json_response = response.json
    451   if not json_response:

File ~/miniconda3/envs/google/lib/python3.10/site-packages/google/genai/_api_client.py:384, in ApiClient._request(self, http_request, stream)
    380   return HttpResponse(
    381       response.headers, response if stream else [response.text]
    382   )
    383 else:
--> 384   return self._request_unauthorized(http_request, stream)

File ~/miniconda3/envs/google/lib/python3.10/site-packages/google/genai/_api_client.py:407, in ApiClient._request_unauthorized(self, http_request, stream)
    398 http_session = requests.Session()
    399 response = http_session.request(
    400     method=http_request.method,
    401     url=http_request.url,
   (...)
    405     stream=stream,
    406 )
--> 407 errors.APIError.raise_for_response(response)
    408 return HttpResponse(
    409     response.headers, response if stream else [response.text]
    410 )

File ~/miniconda3/envs/google/lib/python3.10/site-packages/google/genai/errors.py:100, in APIError.raise_for_response(cls, response)
     98 status_code = response.status_code
     99 if 400 <= status_code < 500:
--> 100   raise ClientError(status_code, response)
    101 elif 500 <= status_code < 600:
    102   raise ServerError(status_code, response)

ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Request contains an invalid argument.', 'status': 'INVALID_ARGUMENT'}}

To debug, I also tried this other example of yours, which works fine:

response = client.models.count_tokens(
    model="gemini-2.0-flash-001",
    contents="why is the sky blue?",
)
print(response)

Output

total_tokens=7 cached_content_token_count=None

Btw: What I find confusing in your example code is that in one place the sample file object is passed in the contents list, and in another it is the file's name. Can either be used? In any case, neither works for me.

jamg7 · 2025-02-13T17:06:10Z

Hi, @rnckp

I think the problem might be caused by the type of the sample_file object. I was using the return value of the files.upload() API, as shown in the code below:

sample_file = client.files.upload(file=doc_data, config={"mime_type":'application/pdf'})

rnckp · 2025-02-13T17:59:27Z

@jamg7 I did exactly the same, to no avail. It does not work and yields the error. What can I do to fix this?

Again, to avoid misunderstandings: everything works fine if I use the same PDF cut down to 10 pages. Not only can I process the smaller PDF, but I can also count its tokens successfully. It only fails with the larger PDF, which produces the error mentioned above.

To give you more details about the full PDF that does not work, this is the (slightly redacted) printout of the value returned after uploading the file:

File(name='files/iul834XXXXXX', display_name=None, mime_type='application/pdf', size_bytes=161704743, create_time=datetime.datetime(2025, 2, 13, 10, 11, 21, 517380, tzinfo=TzInfo(UTC)), expiration_time=datetime.datetime(2025, 2, 15, 10, 11, 21, 458824, tzinfo=TzInfo(UTC)), update_time=datetime.datetime(2025, 2, 13, 10, 11, 21, 517380, tzinfo=TzInfo(UTC)), sha256_hash='NGZjMmM5NzI2MGIxNjdmMjNkZmEzNWVjNWY3NzA5ODAzN2U1YTI5ZTM4OTViZTc0ZWU3MGJhOGI1XXXXXXXXXX==', uri='https://generativelanguage.googleapis.com/v1beta/files/iul834XXXXXX', download_uri=None, state=<FileState.ACTIVE: 'ACTIVE'>, source=<FileSource.UPLOADED: 'UPLOADED'>, video_metadata=None, error=None)

PS: I have corrected the example code and output in my previous comment. They now show the correct code snippet and the actual output.

jamg7 · 2025-02-14T05:39:49Z

Thanks for the quick response, @rnckp. I double-checked my test script: my earlier test used a PDF of 20MB, not 200MB. Sorry about that!

I just created a 160MB file and got a similar error message:

google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Request contains an invalid argument.', 'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.DebugInfo', 'detail': '[ORIGINAL ERROR] generic::invalid_argument: Document size exceeds supported limit: 166877608 v.s 52428800'}]}}

jamg7 · 2025-02-14T05:41:35Z

The error message says that the maximum document size is 52428800 bytes (50 MiB). Can you try a PDF under that limit and see if it works?
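To make that error detail easier to read, here is a minimal sketch that extracts the two byte counts from the message and converts them to MiB. It assumes the detail text keeps the exact form quoted above ("Document size exceeds supported limit: <actual> v.s <limit>"); that wording is not a documented contract and may change. The function name parse_size_limit_detail is hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the DebugInfo detail quoted in this thread; the
# service does not document this format, so treat it as fragile.
_SIZE_RE = re.compile(
    r"Document size exceeds supported limit:\s*(\d+)\s*v\.s\s*(\d+)"
)

MIB = 1024 * 1024


def parse_size_limit_detail(detail: str) -> Optional[Tuple[int, int]]:
    """Return (actual_bytes, limit_bytes), or None if the text does not match."""
    match = _SIZE_RE.search(detail)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    detail = (
        "[ORIGINAL ERROR] generic::invalid_argument: "
        "Document size exceeds supported limit: 166877608 v.s 52428800"
    )
    actual, limit = parse_size_limit_detail(detail)
    print(f"document: {actual / MIB:.1f} MiB, limit: {limit / MIB:.1f} MiB")
    # prints: document: 159.1 MiB, limit: 50.0 MiB
```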

rnckp · 2025-02-14T09:03:03Z

I can confirm that this works. I created a PDF of 51231211 bytes, and I can now count its tokens and OCR the file.

This finding seems to contradict your documentation here, which states that files can be up to 2GB:

You can use the File API to upload a document of any size. Always use the File API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB.

Note: The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB.

Is ~52MB really the size limit that I can use? How can I process PDFs (or other documents) that are larger than this?

EDIT: I tried submitting a larger PDF in parts within a single prompt, each part below the size limit. This doesn't work either: the OCR stops somewhere in the second PDF part.
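One possible approach is to split the document by page range and send each part in its own request; the thread does not confirm this works, and sending several parts within one prompt did not work above. A minimal sketch of planning such ranges, assuming bytes are spread roughly evenly across pages and using the 50MB (52428800 bytes) and 300-page limits stated by the service team later in this thread. plan_page_ranges and its 0.9 safety margin are hypothetical choices; the actual splitting would need a PDF library such as pypdf, which is not shown.

```python
import math
from typing import List, Tuple

# Limits stated by the service team in this thread (Feb 2025).
MAX_BYTES = 52_428_800  # 50 MiB, the figure from the error detail
MAX_PAGES = 300


def plan_page_ranges(
    total_bytes: int,
    total_pages: int,
    max_bytes: int = MAX_BYTES,
    max_pages: int = MAX_PAGES,
    safety: float = 0.9,  # hypothetical margin: real page sizes vary
) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (first, last) ranges.

    Each range is estimated to stay under max_bytes * safety and max_pages,
    assuming bytes are spread evenly across pages. A single page larger
    than the byte budget still gets its own one-page range.
    """
    if total_pages <= 0:
        return []
    bytes_per_page = total_bytes / total_pages
    if bytes_per_page > 0:
        by_size = max(1, math.floor(max_bytes * safety / bytes_per_page))
    else:
        by_size = max_pages
    pages_per_part = min(max_pages, by_size)
    return [
        (first, min(first + pages_per_part - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_part)
    ]


if __name__ == "__main__":
    # Size and page count of the original PDF from this thread.
    print(plan_page_ranges(161_704_743, 127))
    # prints [(1, 37), (38, 74), (75, 111), (112, 127)]
```

The even-distribution assumption is the weak point: a PDF whose images are concentrated in a few pages can still produce a part over the limit, so each part's real size should be checked after splitting.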

jamg7 · 2025-02-14T17:28:34Z

Thanks for confirming that a PDF smaller than 52428800 bytes works, @rnckp.

The document you linked describes the storage limits, which are different from the per-file size limit that each model can handle.

I couldn't find a good public source documenting the PDF size limit for the Gemini API, so there may be a documentation gap here.

Meanwhile, closing this ticket as the direct issue has been addressed.

rnckp · 2025-02-14T21:31:25Z

Hi @jamg7

Thanks. However, I respectfully disagree with your assessment. The documentation is about file size too, and it says explicitly, as cited above:

Note: The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB.

I'd like to ask again: Is ~52MB really the size limit that I can use? How can I process PDFs (or other documents) that are larger than this?

pamorgan · 2025-02-18T20:52:50Z

Reopening - Thank you for reporting. The service team is investigating the root cause of the issue.

pamorgan · 2025-02-20T19:38:30Z

The service currently only supports PDF files of 50MB or less and 300 pages or less.
We will treat this issue as missing documentation and will open an internal feature request to increase the supported PDF file size.
Thank you for raising this issue.
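Given those limits, a client can check a PDF before sending it to a model. A minimal sketch, assuming "50MB" means the 52428800 bytes quoted in the earlier error detail; the helper name pdf_limit_problems is hypothetical. The size could come from os.path.getsize() before uploading or from the size_bytes field of the returned File object; the page count would need a PDF library, which is not shown.

```python
from typing import List

# Limits stated by the service team in this thread (Feb 2025); "50MB" is
# taken here as 52428800 bytes, the figure quoted in the error detail.
MAX_PDF_BYTES = 52_428_800
MAX_PDF_PAGES = 300


def pdf_limit_problems(size_bytes: int, num_pages: int) -> List[str]:
    """Return the reasons a PDF would exceed the stated limits (empty if none)."""
    problems: List[str] = []
    if size_bytes > MAX_PDF_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds {MAX_PDF_BYTES} bytes")
    if num_pages > MAX_PDF_PAGES:
        problems.append(f"page count {num_pages} exceeds {MAX_PDF_PAGES}")
    return problems


if __name__ == "__main__":
    # size_bytes of the uploaded File and page count from this thread.
    print(pdf_limit_problems(161_704_743, 127))
    # prints ['size 161704743 bytes exceeds 52428800 bytes']
```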

rnckp · 2025-02-20T20:04:12Z

@pamorgan Thanks very much, Peter. I appreciate this clarification. 👍

rnckp added the priority: p2 (Moderately-important priority. Fix may not be included in next release.) and type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.) labels Feb 11, 2025

sasha-gitg added the api: gemini-api label Feb 11, 2025

sasha-gitg assigned jamg7 Feb 11, 2025

jamg7 closed this as completed Feb 14, 2025

rnckp mentioned this issue Feb 15, 2025

Documented vs. Actual File Size Limit for Large Files (PDFs) in Gemini Flash 2.0 #353 (closed)

pamorgan reopened this Feb 18, 2025

pamorgan added the documentation (Improvements or additions to documentation) label Feb 20, 2025
