Grobid Server Configuration - memory #530

Closed
Little-Student opened this issue Jan 3, 2020 · 2 comments

@Little-Student

Little-Student commented Jan 3, 2020

Hi Grobid Team,

Thanks for the great technical document parsing tools.

I think this is a configuration question rather than an issue. Is there a recommended Grobid server configuration for production?

What we found is that the Grobid server (0.5.6), out of the box, uses too much memory: my Linux server has 128G of memory, and the Grobid server runs using 80G.

grobid-0.5.6\grobid-0.5.6\grobid-home\config\grobid.properties: grobid.3rdparty.pdf2xml.memory.limit.mb=6096

Can I control the memory by modifying the configuration?

Thank you!

@kermitt2
Owner

kermitt2 commented Jan 6, 2020

Hello @Little-Student !

You can have a look at the production config description here: #443 (comment)

For managing parallel requests/multithreading, you can use one of the following clients:
https://grobid.readthedocs.io/en/latest/Grobid-service/#clients-for-grobid-web-services

For processing scientific papers, I've never seen GROBID take much more than ~20GB, and that was on a 16-core machine handling 24 parallel requests (org.grobid.max.connections=24 in grobid.properties) with 32GB in total on the machine (far from being entirely used). In general we are lower than that.
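
The clients linked above already take care of parallel requests. Purely as an illustration of the idea, here is a minimal Python sketch that keeps the number of in-flight requests at or below the server's connection limit. The names `process_all` and `process_one` are hypothetical and not part of GROBID or its clients; `process_one` stands in for whatever function sends one PDF to the server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Assumption: mirrors org.grobid.max.connections=24 quoted in the comment above.
MAX_CONNECTIONS = 24


def process_all(paths: Iterable[str],
                process_one: Callable[[str], str],
                max_workers: int = MAX_CONNECTIONS) -> List[str]:
    """Apply process_one to every path with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order; the pool size caps concurrency.
        return list(pool.map(process_one, paths))
```

Keeping the client-side pool no larger than the server-side limit means requests are not sent faster than the server is configured to accept them.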

I don't know what kind of documents you are processing to reach 80GB of memory, but using that amount of memory is not normal behaviour for a GROBID server processing typical scientific papers (20 pages on average). Even for documents of several hundred pages processed in parallel, Grobid should not use so much memory.

Normally grobid.3rdparty.pdf2xml.memory.limit.mb does not need to be changed; it's a safeguard for exceptional cases where the input PDF has some unusual embedded objects or corruptions.
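
To check which values a given grobid.properties actually sets, a small script can help. The following is a minimal sketch, assuming the file uses plain `key=value` lines; it does not handle the full Java properties syntax (line continuations, `:` separators, escapes). The function name `parse_properties` and the sample excerpt are illustrative only, with the two values taken from this thread.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


# Hypothetical excerpt; the two values are the ones quoted in this thread.
sample = """\
# memory safeguard for the PDF conversion step
grobid.3rdparty.pdf2xml.memory.limit.mb=6096
org.grobid.max.connections=24
"""

props = parse_properties(sample)
limit_mb = int(props["grobid.3rdparty.pdf2xml.memory.limit.mb"])
connections = int(props["org.grobid.max.connections"])
print(limit_mb, connections)
```

To inspect a real installation, read the text of its grobid.properties file and pass it to `parse_properties` instead of `sample`.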

@Little-Student
Author

Thank you very much, I understand. I looked again and found that the memory is shared, not used. Thank you for your guidance.
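
For anyone who runs into the same confusion, here is a minimal sketch of how shared memory can be told apart from memory in use on Linux by reading the text of /proc/meminfo. The helper names are hypothetical, the sample numbers are made up for illustration, and "total minus MemAvailable" is only a rough estimate of memory in use, which may differ from what a particular version of `free` prints as "used".

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return /proc/meminfo fields as {name: value}; sizes are in kB."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields: Dict[str, int]) -> Dict[str, int]:
    """Separate a rough in-use estimate from shared memory (Shmem)."""
    total = fields["MemTotal"]
    return {
        "total_kb": total,
        # Assumption: total minus MemAvailable as a rough in-use estimate.
        "in_use_kb": total - fields["MemAvailable"],
        "shared_kb": fields.get("Shmem", 0),
    }


# Made-up sample in the /proc/meminfo layout, for illustration only.
sample_meminfo = """\
MemTotal:       1000000 kB
MemFree:         200000 kB
MemAvailable:    700000 kB
Shmem:           150000 kB
HugePages_Total:       0
"""

summary = summarize(parse_meminfo(sample_meminfo))
print(summary)
```

On a real machine, pass the contents of /proc/meminfo to `parse_meminfo` in place of `sample_meminfo`.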
