This client does indeed take a directory as input and output, as documented, because it is designed for batch processing of many files.
For me this client is a basis that can be adapted to different usage scenarios, so I tried to keep it simple, with zero external dependencies. You can use the client as a package and then call process_batch() or process_pdf() as is convenient for your set of files and your pipeline.
You can probably start sending a file to the Grobid server while it is still downloading, but Grobid will only start processing a file once it is entirely uploaded (for stability/robustness and technical reasons). So the easiest approach for your scenario is probably to download a file, add it to an executor, and then delete the file when the result is ready.
In my experience, if no citation consolidation is used, Grobid processes a file faster than it takes to download a typical Unpaywall file.
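The download, process, delete loop suggested above could look roughly like the sketch below. It does not use this client at all: it posts each PDF straight to Grobid's REST endpoint `/api/processFulltextDocument` as the multipart field `input`. The server address, the worker count, and the helper names (`download`, `send_to_grobid`, `process_url`, `process_urls`) are assumptions made for illustration, and `requests` is an extra dependency the client itself does not need.

```python
import os
import tempfile
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a Grobid server is reachable here (8070 is Grobid's default port).
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"


def download(url, path):
    """Save the PDF found at `url` to the local file `path`."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


def send_to_grobid(path):
    """POST a local PDF to Grobid and return the TEI XML as text."""
    # Third-party package, imported lazily so the rest stays stdlib-only.
    import requests

    with open(path, "rb") as pdf:
        reply = requests.post(GROBID_URL, files={"input": pdf}, timeout=300)
    reply.raise_for_status()
    return reply.text


def process_url(url, fetch=download, process=send_to_grobid):
    """Download one PDF to a temporary file, process it, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    try:
        fetch(url, path)
        return process(path)
    finally:
        # The PDF is removed whether processing succeeded or failed.
        os.remove(path)


def process_urls(urls, workers=4, **kwargs):
    """Run process_url over many URLs concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: process_url(u, **kwargs), urls))
```

Calling `process_urls(list_of_pdf_urls)` returns one TEI string per URL. Because each file is deleted in the `finally` clause, at most `workers` PDFs sit on disk at any moment, which keeps local storage small even for large batches.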
Hey Grobid team,
Thanks again for these incredible tools. I've been testing out the Python client - and encountered an issue when passing a PDF as an argument both while using the CLI and Python. I didn't receive any output.
Sample code below
grobid_client --input ./resource/my.PDF --output ./out processFulltextDocument
While debugging, I realized that L122 of the grobid_client.py file implies passing in a directory and not the file itself, as in the request below.
grobid_client --input ./resource/mypdfdir --output ./out processFulltextDocument
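Since the client expects a directory, one possible workaround for a single PDF is to stage it in a temporary directory and point the CLI at that directory instead. The sketch below only wraps the command line shown in this issue; `build_command`, `run_on_path`, and the `runner` parameter are hypothetical names introduced here, not part of the client.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_command(input_dir, output_dir, service="processFulltextDocument"):
    """Build the grobid_client command line in the form used in this issue."""
    return ["grobid_client", "--input", str(input_dir),
            "--output", str(output_dir), service]


def run_on_path(pdf_or_dir, output_dir, runner=subprocess.run):
    """Run the CLI on a directory, or on a single PDF staged in a temp directory."""
    source = Path(pdf_or_dir)
    if source.is_dir():
        # Directories are passed through unchanged, as the client expects.
        return runner(build_command(source, output_dir), check=True)
    with tempfile.TemporaryDirectory() as staging:
        # Copy the single file so the client sees a directory containing it.
        shutil.copy(source, staging)
        return runner(build_command(staging, output_dir), check=True)
```

For example, `run_on_path("./resource/my.PDF", "./out")` would run the CLI against a throwaway directory holding only that file; the staging directory is removed once the command returns.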
On GCP, I was trying to pass files directly to Grobid without downloading them first, which I would have to do with the current setup. Is there any way to stream PDFs to Grobid, or to send them as file objects? If not, I'll try to see if I can pull something together quickly and test it.