MathpixPDFLoader never finishes #11249
Comments
🤖 Hello, Thank you for reaching out and providing detailed information about your issue. Based on the context provided, it seems like there was a similar issue reported in the past where the MathpixPDFLoader was not working as expected. The issue was due to a typo in the … However, since you're using LangChain version 0.0.305, this should not be the problem. Another possible reason could be a timeout error. The … If none of the above solutions work, could you please provide more information about the document you're trying to process? For example, the size of the document, the complexity of the content, etc. This information could help us better understand the issue and find a solution. Best, |
I am parsing this pdf. The timeout is not an issue: the default is 500, and the result is visible in the Mathpix backend.
|
Something changed between v0.0.300 and v0.0.305 in pdf.py that is causing a TimeoutError:
I messaged Mathpix support and they indicated that nothing changed on their end:
This is not due to jreinjr's fix, since that solution was working in v0.0.300; the cause must be somewhere else. |
Okay, I had to do some testing to determine which version change caused the … I thought it might have something to do with langsmith, since at some point upgrading langchain forces an upgrade to langsmith, but it still runs okay at langchain v0.0.288 and langsmith v0.0.43, so it apparently has nothing to do with langsmith. Using jreinjr's fix, the last version in which this worked was actually 0.0.288. Something changed between 0.0.288 and 0.0.290:
Please look at what changed from 0.0.288 to 0.0.289 that might have caused this timeout. Thanks |
My observation is that the request is not really timing out; as you pointed out, the PDF does get processed. But in this section of the PDF loader code, we look for … So I think there are two issues here:
I'm working on a fix for this, but I just wanted to drop my notes here first. |
I've got a fix working at #13949. |
- **Description:** Fixes the Mathpix PDF loader API integration. Specifically, ensures that Mathpix auth headers are provided for every request, and ensures that we recognize all errors that can occur during a request. Also, the option to provide API keys as kwargs never actually worked before, but now that's fixed too.
- **Issue:** #11249
- **Dependencies:** None
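For readers following along, here is a minimal sketch of the kind of polling loop the description above refers to. It is not the code from the PR. The function name `wait_for_processing`, the injected `fetch_status` callable, and the response keys `"error"` and `"status"` (with the values `"completed"` and `"error"`) are assumptions made for illustration. In a real loader, `fetch_status` would issue the HTTP request and would need to send the auth headers on every call. The point of the sketch is that a response without a `"status"` key is reported as `Status: None` on every iteration, so request-level errors have to be checked explicitly.

```python
import time
from typing import Any, Callable, Dict


def wait_for_processing(
    fetch_status: Callable[[], Dict[str, Any]],
    max_wait_seconds: int = 500,
    poll_interval: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the job completes, fails, or the wait budget runs out.

    `fetch_status` is a hypothetical callable returning the decoded JSON
    body of one status request. The key names used below are assumptions.
    """
    for _ in range(0, max_wait_seconds, poll_interval):
        data = fetch_status()

        # A request-level failure (for example, missing auth headers) may
        # come back as an error body with no "status" key at all. Without
        # this check the loop would keep printing "Status: None".
        error = data.get("error")
        if error is not None:
            raise ValueError(f"Status request failed: {error}")

        status = data.get("status")
        if status == "completed":
            return
        if status == "error":
            raise ValueError("Processing failed on the server side")

        print(f"Status: {status}, waiting for processing to complete")
        sleep(poll_interval)

    raise TimeoutError(f"Gave up after {max_wait_seconds} seconds")


if __name__ == "__main__":
    # Simulated responses, so the sketch runs without network access.
    fake = iter([{"status": "split"}, {"status": "completed"}])
    wait_for_processing(lambda: next(fake), sleep=lambda _: None)
```

Passing the fetch and sleep functions in as parameters keeps the loop testable without contacting any service.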
System Info
langchain==0.0.305
MacOS M1 Silicon
Who can help?
No response
Information
Related Components
Reproduction
I have a basic method for loading the PDF via Mathpix
but if I run this, I get only this:
Status: None, waiting for processing to complete
Status: None, waiting for processing to complete
Status: None, waiting for processing to complete
Status: None, waiting for processing to complete
Status: None, waiting for processing to complete
Status: None, waiting for processing to complete
Status: None, waiting for processing to complete
...
In the Mathpix backend I see the document already parsed.
Any idea on this?
Expected behavior
I expect the loader to return the document.