The code comments state that ProcessPoolExecutor is used in favour of ThreadPoolExecutor, citing the Python GIL as one of the reasons. I would like to argue that ThreadPoolExecutor is perfectly fine for this use case.
First of all, the GIL is only a problem for threads when they execute Python code. The GIL allows only one thread at a time to run in the Python interpreter, so CPU-bound threads cannot run in parallel. For blocking I/O, however, Python releases the GIL while the OS handles the request and the calling thread waits, so other threads can continue in the meantime.
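To illustrate the point about blocking calls, here is a minimal sketch. It uses `time.sleep` as a stand-in for a network request, since it also releases the GIL while waiting. The names `fake_io_request` and `run_in_threads` are made up for this example and are not part of the grobid client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_request(delay: float) -> float:
    """Hypothetical stand-in for a blocking HTTP call."""
    # time.sleep releases the GIL while waiting, as blocking socket I/O does.
    time.sleep(delay)
    return delay


def run_in_threads(n_tasks: int, delay: float, workers: int) -> float:
    """Run n_tasks blocking calls in a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so every task has finished before timing stops.
        results = list(pool.map(fake_io_request, [delay] * n_tasks))
    assert len(results) == n_tasks
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_in_threads(n_tasks=8, delay=0.2, workers=8)
    print(f"8 blocking calls of 0.2 s each took {elapsed:.2f} s in total")
```

Run sequentially, the eight calls would need about 1.6 s. With eight threads the waits overlap, so the total should stay close to the duration of a single call.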
The grobid client is simply a wrapper that sends a batch of POST requests. No heavy computation is done on the Python side, so using ThreadPoolExecutor is perfectly fine, has much less overhead, and is much less troublesome across different operating systems. Would it be possible to make ThreadPoolExecutor the default?
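As a rough sketch of what such a default could look like (this is not the client's actual code): both executors share the `concurrent.futures.Executor` interface, so the executor class can be a parameter that defaults to threads. The names `process_batch`, `send` and `executor_cls` are hypothetical, and `send` stands for whatever function performs one POST request.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Type


def process_batch(
    paths: Iterable[str],
    send: Callable[[str], Any],
    workers: int = 4,
    executor_cls: Type[Executor] = ThreadPoolExecutor,
) -> Dict[str, Any]:
    """Call send(path) for every path concurrently and collect the outcomes.

    The result maps each path to the return value of send, or to the
    exception it raised, so one failed request does not abort the batch.
    """
    results: Dict[str, Any] = {}
    with executor_cls(max_workers=workers) as pool:
        # Remember which future belongs to which input path.
        futures = {pool.submit(send, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                results[path] = exc
    return results
```

Passing `executor_cls=ProcessPoolExecutor` would still work with the same call, but then `send` and its arguments must be picklable, and on platforms that start workers with the spawn method (Windows, and macOS by default) the calling script needs an `if __name__ == "__main__":` guard. Threads avoid both constraints.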
Thank you for the issue, you're absolutely right. I am actually using ThreadPoolExecutor in my more recent python clients for I/O intensive tasks. I think at the time I wrote this client (4 years ago), I was a bit confused by this aspect and I didn't come back to it afterwards.
I pushed an update replacing ProcessPoolExecutor; see e7710c2.