When using `GoogleAzureOCR` and calling `ocr` from multiple concurrent threads (e.g. via `asyncio.to_thread`), the program sometimes hangs while creating a new `GoogleOCR` instance.
Setting breakpoints before creating the `GoogleOCR` instance makes the problem disappear; it's a heisenbug.
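One possible workaround, assuming the hang is triggered by two threads running the `GoogleOCR` constructor at the same time (this report does not confirm that), is to serialize construction behind a process-wide `threading.Lock`. The sketch uses a hypothetical `FakeClient` stand-in instead of `GoogleOCR`, so it runs without `ocr_wrapper` or credentials; `construct_serialized` is likewise a hypothetical helper, not part of `ocr_wrapper`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One process-wide lock: only one thread at a time may run a guarded constructor.
_construct_lock = threading.Lock()


def construct_serialized(factory, *args, **kwargs):
    """Call factory(*args, **kwargs) while holding the global lock.

    Hypothetical helper, e.g. construct_serialized(GoogleOCR, auto_rotate=True, ...).
    """
    with _construct_lock:
        return factory(*args, **kwargs)


class FakeClient:
    """Hypothetical stand-in for GoogleOCR that records overlapping constructors."""

    active = 0
    max_active = 0
    _counter_lock = threading.Lock()

    def __init__(self, delay=0.01):
        with FakeClient._counter_lock:
            FakeClient.active += 1
            FakeClient.max_active = max(FakeClient.max_active, FakeClient.active)
        time.sleep(delay)  # simulate slow initialization
        with FakeClient._counter_lock:
            FakeClient.active -= 1


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        clients = list(pool.map(lambda _: construct_serialized(FakeClient), range(8)))
    # With the lock held, constructors never overlap.
    print(len(clients), FakeClient.max_active)  # 8 1
```

The lock only helps if the problem really is concurrent construction; if the hang happens later (e.g. during the OCR call itself), it will not.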
Python 3.9.19
ocr_wrapper 0.0.24
MWE
(the ocr function is the relevant part of GoogleAzureOCR.ocr)
```python
import asyncio
import time

from ocr_wrapper import GoogleOCR, AzureOCR

# Requirements to reproduce the issue:
# - Async function running in asyncio.run()
# - In that function, asyncio.gather() awaitables under a semaphore (>1)
# - The awaitables are synchronous functions wrapped in asyncio.to_thread()


async def preprocess():
    semaphore = asyncio.Semaphore(4)

    async def _sem_wrapper():
        async with semaphore:
            return await asyncio.to_thread(ocr)

    awaitables = [_sem_wrapper() for _ in range(2)]
    page_texts = await asyncio.gather(*awaitables)
    return page_texts


def ocr():
    google_ocr = GoogleOCR(
        auto_rotate=True,
        correct_tilt=False,
        ocr_samples=1,
        max_size=4096,
        verbose=False,
    )
    azure_ocr = AzureOCR(
        auto_rotate=False,
        correct_tilt=False,
        ocr_samples=1,
        max_size=4096,
        verbose=False,
    )
    return (google_ocr, azure_ocr)


if __name__ == "__main__":
    start = time.perf_counter()
    res = asyncio.run(preprocess())
    print(res)
    print(f"Time taken: {time.perf_counter() - start:.2f}s")
```
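Another workaround to try, again assuming construction is the trigger, is to build the clients once on the event-loop thread and pass them into the worker threads instead of constructing new ones per call. This assumes a `GoogleOCR`/`AzureOCR` instance is safe to share across threads, which this report does not establish. `FakeOCR`, its `ocr(page)` method and `ocr_page` are hypothetical stand-ins, not the `ocr_wrapper` API.

```python
import asyncio


class FakeOCR:
    """Hypothetical stand-in for GoogleOCR/AzureOCR; not the ocr_wrapper API."""

    def __init__(self, name):
        self.name = name

    def ocr(self, page):
        # Placeholder for the blocking OCR call.
        return f"{self.name}:{page}"


def ocr_page(google_ocr, azure_ocr, page):
    # Runs in a worker thread and reuses the already-built instances.
    return (google_ocr.ocr(page), azure_ocr.ocr(page))


async def preprocess(pages):
    # Construct the clients once, on the event-loop thread, before any worker starts.
    google_ocr = FakeOCR("google")
    azure_ocr = FakeOCR("azure")
    semaphore = asyncio.Semaphore(4)

    async def _sem_wrapper(page):
        async with semaphore:
            return await asyncio.to_thread(ocr_page, google_ocr, azure_ocr, page)

    # gather() returns results in the same order as the input pages.
    return await asyncio.gather(*(_sem_wrapper(p) for p in pages))


if __name__ == "__main__":
    print(asyncio.run(preprocess([1, 2])))
    # [('google:1', 'azure:1'), ('google:2', 'azure:2')]
```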
Expected behavior
Runs and exits without error in <<1s
Actual behavior
Hangs forever; the process cannot be killed with Ctrl+C.
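Because Ctrl+C does not get through, one way to see where the threads are stuck is the standard-library `faulthandler` module: `dump_traceback_later` starts a watchdog thread that writes every thread's stack after a timeout, even while the main thread is blocked. `hang_watchdog` is a hypothetical helper name, and the 10-second timeout is arbitrary.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout=30.0, file=sys.stderr):
    """Dump all thread stacks to `file` every `timeout` seconds until the block exits."""
    # file must have a real file descriptor (e.g. sys.stderr or an open file).
    faulthandler.dump_traceback_later(timeout, repeat=True, file=file)
    try:
        yield
    finally:
        # Stop the watchdog once the block finishes normally.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Hypothetical usage with the MWE above:
    #     with hang_watchdog(timeout=10):
    #         res = asyncio.run(preprocess())
    with hang_watchdog(timeout=10):
        print("work finished before the watchdog fired")
```

On Linux/macOS, `faulthandler.register(signal.SIGUSR1)` additionally lets you dump the stacks on demand with `kill -USR1 <pid>`.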