Deadlock when using GoogleAzureOCR with parallel threads #17

Closed
phschoepf opened this issue Apr 22, 2024 · 3 comments
Labels
0.x bug Something isn't working

Comments

phschoepf commented Apr 22, 2024

When using GoogleAzureOCR and calling ocr from multiple concurrent threads (e.g. via asyncio.to_thread), the program will sometimes hang on the creation of a new GoogleOCR instance.

Setting a breakpoint before the GoogleOCR instance is created makes the hang go away, so it's a heisenbug.

  • Python 3.9.19
  • ocr_wrapper 0.0.24

MWE

(the ocr function is the relevant part of GoogleAzureOCR.ocr)

import asyncio
import time

from ocr_wrapper import GoogleOCR, AzureOCR

# Requirements to reproduce the issue:
# - Async function running in asyncio.run()
# - In that function, asyncio.gather() awaitables under a semaphore (>1)
# - The awaitables are synchronous functions wrapped in asyncio.to_thread()


async def preprocess():
    semaphore = asyncio.Semaphore(4)

    async def _sem_wrapper():
        async with semaphore:
            return await asyncio.to_thread(ocr)

    awaitables = [_sem_wrapper() for _ in range(2)]
    page_texts = await asyncio.gather(*awaitables)
    return page_texts


def ocr():
    google_ocr = GoogleOCR(
        auto_rotate=True,
        correct_tilt=False,
        ocr_samples=1,
        max_size=4096,
        verbose=False,
    )
    azure_ocr = AzureOCR(
        auto_rotate=False,
        correct_tilt=False,
        ocr_samples=1,
        max_size=4096,
        verbose=False,
    )
    return (google_ocr, azure_ocr)


if __name__ == "__main__":
    start = time.perf_counter()
    res = asyncio.run(preprocess())

    print(res)
    print(f"Time taken: {time.perf_counter() - start:.2f}s")

Expected behavior

Runs and exits without error in well under 1 s.

Actual behavior

Hangs forever; the process cannot be killed with Ctrl+C.
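
Because the hang is intermittent and the process ignores Ctrl+C, it can help to run the reproducer in a child process with a hard timeout. The following is a minimal sketch, not part of the original report: the helper name `hangs` and the file name `mwe.py` are hypothetical, and the timeout value is an arbitrary choice. It relies on `subprocess.run` killing the child when the timeout expires.

```python
import subprocess
import sys


def hangs(argv, timeout_s):
    """Run argv as a child process; return True if it exceeds timeout_s.

    subprocess.run kills the child on timeout, so a deadlocked
    interpreter does not block the calling shell.
    """
    try:
        subprocess.run(argv, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        return True
    return False


if __name__ == "__main__":
    # "mwe.py" is a hypothetical file containing the MWE above.
    # The bug is intermittent, so repeat the run a few times.
    results = [hangs([sys.executable, "mwe.py"], timeout_s=30) for _ in range(5)]
    print(f"hung in {sum(results)} of {len(results)} runs")
```

A run that returns normally, even with a non-zero exit code, counts as "did not hang" here; only the timeout is treated as a deadlock.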

@phschoepf
Contributor Author

Solved by manually upgrading grpcio (a dependency of google-cloud-vision) to 1.62.2.

I did a quick grid search; it seems that all versions grpcio>=1.59,<=1.62.1 have the bug.
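
To check whether an environment falls in that range, something like the sketch below may be useful. It assumes the bounds reported by the grid search above (1.59 through 1.62.1 inclusive), which are an empirical observation rather than an official statement from gRPC. The helper names are made up for illustration, and the version parsing is deliberately simple: it looks only at the first three numeric components.

```python
import re
from importlib import metadata

# Bounds taken from the grid search in this issue (an observation, not official).
AFFECTED_MIN = (1, 59, 0)
AFFECTED_MAX = (1, 62, 1)


def parse_version(version):
    """Turn '1.62.1' (or '1.62.0rc1') into a 3-tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version):
    """True if the version lies in the range that showed the hang."""
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX


def installed_grpcio_version():
    """Return the installed grpcio version string, or None if absent."""
    try:
        return metadata.version("grpcio")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_grpcio_version()
    if version is None:
        print("grpcio is not installed")
    elif is_affected(version):
        print(f"grpcio {version} is in the affected range; consider >=1.62.2")
    else:
        print(f"grpcio {version} is outside the affected range")
```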

@phschoepf
Contributor Author

Relevant gRPC issue: grpc/grpc#36376

@phschoepf
Contributor Author

Fixed in #18.
