Midsize PDF file yields Error 400: "The request's total referenced files bytes are too large to be read" #308
Comments
Hi @rnckp, I succeeded with a PDF of around 20MB. Could this be related to the input token limit? Can you try to get the token count of your prompt? I used the script below to get the token count: response = client.models.count_tokens( Also, which model are you using in your test? Each model has a different input_token_limit, which you can get by calling client.models.list()
Hi @jamg7 Thanks for your help. I was using In order to count tokens I tried your exact code. This yields the error below. How can I fix this?
I tried this example code or yours to debug. This works fine:
Output
Btw: What I find confusing in your example code is that one time the sample file is given as an argument in the |
Hi @rnckp, I think the problem might be caused by the type of the sample_file object. I was using the return value of the files.upload() API, as shown in the code below: sample_file = client.files.upload(file=doc_data, config={"mime_type":'application/pdf'})
@jamg7 I did exactly the same, to no avail. It does not work and yields the error. What can I do to fix this? Again, to avoid misunderstandings: everything works fine if I use the same PDF cut down to 10 pages. Not only can I process the smaller PDF, I can also successfully count the tokens. It does not work with the larger PDF; then I get the error mentioned above. To give you more details about the full PDF that does not work, this is the printout (slightly redacted) of the value returned after uploading the file:
PS: I corrected my previous comment in regard to the example code and output above. Now it is the correct code snippet and the proper output. |
Thanks for the quick response, @rnckp. I just double-checked my test script. My previous test was for a PDF of 20MB, not 200MB. Sorry about that! I just produced a 160MB file and got a similar error message: google.genai.errors.ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Request contains an invalid argument.', 'status': 'INVALID_ARGUMENT', 'details': [{'@type': 'type.googleapis.com/google.rpc.DebugInfo', 'detail': '[ORIGINAL ERROR] generic::invalid_argument: Document size exceeds supported limit: 166877608 v.s 52428800'}]}}
The error message says that the maximum file size is 52428800 bytes. Can you try a PDF file under that limit and see if it works?
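For anyone who wants to surface that limit programmatically, here is a minimal sketch that pulls the two byte counts out of the DebugInfo detail string. It assumes the wording quoted in this thread; the message format is not a documented contract and may change, and `parse_size_limit` is a hypothetical helper name, not part of the SDK.

```python
import re

# Pattern derived only from the error text quoted in this thread.
# Assumption: the service keeps this wording; it is not a documented format.
_LIMIT_RE = re.compile(r"Document size exceeds supported limit: (\d+) v\.s (\d+)")


def parse_size_limit(message):
    """Return (actual_bytes, limit_bytes) if the message matches, else None."""
    match = _LIMIT_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


detail = (
    "[ORIGINAL ERROR] generic::invalid_argument: "
    "Document size exceeds supported limit: 166877608 v.s 52428800"
)
actual, limit = parse_size_limit(detail)
print(f"{actual / 2**20:.1f} MiB submitted, limit {limit / 2**20:.0f} MiB")
```

The limit in the message works out to exactly 50 * 1024 * 1024 bytes.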
I can confirm that this works. I have created a PDF with 51231211 bytes. I now can count the tokens and OCR the file. This finding seems to contradict your documentation here where you state that files can be up to 2GB.
Is ~52MB really the size limit that I can use? How can I process PDFs (or other documents) that are larger than this? EDIT: I tried to submit a larger PDF in parts in one prompt, each part having a file size below the limit. This doesn't work either. The OCR stops somewhere in the second PDF part.
Thanks for confirming that a PDF smaller than 52428800 bytes works, @rnckp. The document you linked describes the storage limit, which is different from the file size that each model can handle. I couldn't find any public documentation of the PDF size limit for the Gemini API, so there might be a documentation gap here. Meanwhile, I am closing this ticket as the direct issue has been addressed.
Hi @jamg7 Thanks. However, I respectfully disagree with your assessment. The documentation is about file size too and says explicitly, as cited above:
I'd like to ask again: Is ~52MB really the size limit that I can use? How can I process PDFs (or other documents) that are larger than this? |
Reopening - Thank you for reporting. The service team is investigating the root cause of the issue. |
The service currently only supports PDF files of 50MB or less and 300 pages or less.
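Given those two limits, a pre-flight check before calling generate_content() can fail fast with a clearer message than the 400 error. The following is a minimal sketch, not an official check: the byte limit is taken from the error message earlier in this thread and the page limit from the comment above, both may change, and `check_pdf_limits` and `check_pdf_file` are hypothetical helper names. The page count has to come from the caller (for example from a PDF library), since the standard library cannot read it.

```python
from __future__ import annotations

import os

# Limits as reported in this thread; assumptions that may change over time.
MAX_PDF_BYTES = 52_428_800  # 50 * 1024 * 1024, from the DebugInfo error message
MAX_PDF_PAGES = 300         # from the maintainer's comment


def check_pdf_limits(size_bytes: int, page_count: int) -> list[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if size_bytes > MAX_PDF_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, limit is {MAX_PDF_BYTES} bytes"
        )
    if page_count > MAX_PDF_PAGES:
        problems.append(
            f"file has {page_count} pages, limit is {MAX_PDF_PAGES} pages"
        )
    return problems


def check_pdf_file(path: str, page_count: int) -> list[str]:
    """Same check, reading the size from disk; page_count is caller-supplied."""
    return check_pdf_limits(os.path.getsize(path), page_count)


# The two files discussed in this thread (both have at most 127 pages):
print(check_pdf_limits(166_877_608, 127))  # the 160MB file that failed
print(check_pdf_limits(51_231_211, 127))   # the reduced file that worked
```

The first call reports a size violation and the second returns an empty list, which matches what was observed above.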
@pamorgan Thanks very much, Peter. I appreciate this clarification. 👍 |
Environment details
Steps to reproduce
I have a midsize PDF file with 160MB and 127 pages. I can successfully upload the PDF with client.files.upload(file=pdf_path). However, when I try to use the uploaded file in client.models.generate_content(), I get this error:
ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': "The request's total referenced files bytes are too large to be read", 'status': 'INVALID_ARGUMENT'}}
My code works on the same PDF, when I shorten it to say 10 pages.
How can I fix this error and use the full PDF?
Thanks in advance for any help in this matter.