Memory Leak in docTR API Integration with FastAPI #1889

Open
volkanncicek opened this issue Mar 6, 2025 · 10 comments
Labels: awaiting response (Waiting for feedback), ext: api (Related to api folder), topic: docker (Docker-related), type: bug (Something isn't working)

Comments

@volkanncicek commented Mar 6, 2025

Bug description

I have integrated docTR into an API using the FastAPI framework to test its performance under heavy load. The API is deployed locally and is designed to handle OCR requests. However, I observed a significant memory issue during testing. When sending 100 consecutive OCR requests, the memory usage spikes and remains high even after all requests have been processed. I expected the memory usage to decrease back to its initial state once the processing was complete, but it did not. This behavior suggests a potential memory leak, which could lead to the application crashing due to excessive memory consumption.

Furthermore, I noticed that the docTR library's own API, as provided in their official documentation, exhibits a similar memory leak issue. Running their API integration template under the same conditions also results in elevated memory usage that does not return to the baseline after completing the requests.

Below is a graph illustrating the high memory consumption using the official docTR Dockerfile. This graph demonstrates how the memory usage increases during the requests and fails to decrease afterward, indicating a possible memory leak.

[Image: memory usage graph during and after 100 requests against the official docTR Dockerfile deployment]

Code snippet to reproduce the bug

import requests

params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}
files = [("files", ("doc.jpg", open('/path/to/your/doc.jpg', 'rb').read(), "image/jpeg"))]

for _ in range(100):
    response = requests.post("http://localhost:8002/ocr", params=params, files=files)
    print(response.json())

Error traceback

There is no specific error traceback as the issue is related to memory usage rather than an explicit error. However, the application eventually crashes when the system runs out of memory.

Environment

The environment was set up using the official docTR Dockerfile.

root@f229a0a64f16:/app# python collect_env.py
Collecting environment information...

DocTR version: 0.11.1a0
TensorFlow version: N/A
PyTorch version: 2.6.0+cu124 (torchvision 0.21.0+cu124)
OpenCV version: 4.11.0
OS: Debian GNU/Linux 12 (bookworm)
Python version: 3.10.16
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): No
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Deep Learning backend

The backend setup is based on the official docTR Dockerfile configuration.

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True
volkanncicek added the type: bug label Mar 6, 2025
@felixdittrich92 (Contributor)

Hi @volkanncicek 👋,

Thanks for reporting, I will have a look 👍

Btw.

files = [("files", ("doc.jpg", open('/path/to/your/doc.jpg', 'rb').read(), "image/jpeg"))]

You should wrap the open call in a context manager (with open(...) as f:) <-- this alone could already be the reason for the leak if you process lots of files.
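For illustration, a minimal adjustment of the reproduction snippet along those lines (the path is a placeholder):

# Read the file once inside a context manager so the handle is closed promptly,
# instead of keeping an unclosed file object around for every request.
with open('/path/to/your/doc.jpg', 'rb') as f:
    file_content = f.read()

files = [("files", ("doc.jpg", file_content, "image/jpeg"))]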

Additionally, our provided API code is only a reference; it's nothing you should use in a production system as is!

In particular, the ocr_predictor initialization should live in a lifespan handler and be initialized once, keeping it in RAM, instead of being created dynamically in the request route (like in our template).
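A minimal sketch of that pattern, assuming a FastAPI app with a single /ocr route (the route signature and the models dict are illustrative, not the shipped template):

from contextlib import asynccontextmanager

from fastapi import FastAPI, UploadFile
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

models = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the predictor once at startup and keep it in RAM for the process lifetime
    models["predictor"] = ocr_predictor(
        det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True
    )
    yield
    models.clear()

app = FastAPI(lifespan=lifespan)

@app.post("/ocr")
async def perform_ocr(files: list[UploadFile]):
    # Reuse the already-loaded predictor instead of building a new one per request
    docs = DocumentFile.from_images([await f.read() for f in files])
    return models["predictor"](docs).export()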

@felixdittrich92 (Contributor) commented Mar 7, 2025

I wasn't able to see any leak here:

🚀 Starting API Stress Test...

Sending Requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [33:24<00:00,  4.01s/it]

🕒 Waiting 10 seconds for memory stabilization...


📊 [ Docker Container Memory Usage ]
🔹 Max Memory Used: 2040.51 MB
🔹 Avg Memory Used: 1849.99 MB
🔹 Initial Memory: 1857.31 MB
🔹 Final Memory After 10s: 1906.29 MB

⏱️ Average Response Time: 2.347 sec
🚀 Fastest Response Time: 2.217 sec
🐢 Slowest Response Time: 2.504 sec

[Image: memory usage over time plot for the api_web container, produced by the script below]

to reproduce:

cd doctr/api
make run

then run:

import docker
import requests
import time
import matplotlib.pyplot as plt
from tqdm import tqdm

API_URL = "http://localhost:8080/ocr"

headers = {"accept": "application/json"}
params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/home/felix/Desktop/20250301_123152657.jpg', 'rb') as f:
    file_content = f.read()

files = [("files", ("20250301_123152657.jpg", file_content, "image/jpeg"))]

def get_docker_memory(container_name="api_web"):
    """Fetch live memory usage of a Docker container."""
    client = docker.DockerClient(base_url='unix://var/run/docker.sock')
    container = client.containers.get(container_name)
    stats = container.stats(stream=False)

    mem_usage = stats["memory_stats"]["usage"] / (1024 * 1024)  # Convert to MB
    return round(mem_usage, 2)

def send_requests(n_requests=50, container_name="api_web"):
    """Send multiple requests and monitor Docker container memory over time."""
    session = requests.Session()

    response_times = []
    memory_usage = []
    timestamps = []

    initial_mem_usage = get_docker_memory(container_name)
    memory_usage.append(initial_mem_usage)
    timestamps.append(time.time())

    print("🚀 Starting API Stress Test...\n")

    for _ in tqdm(range(n_requests), desc="Sending Requests"):
        mem_usage = get_docker_memory(container_name)
        memory_usage.append(mem_usage)
        timestamps.append(time.time())

        start_time = time.time()
        response = session.post(API_URL, headers=headers, params=params, files=files)
        response_times.append(time.time() - start_time)

        if response.status_code != 200:
            print(f"Error: {response.status_code}, Response: {response.text}")

    print("\n🕒 Waiting 10 seconds for memory stabilization...\n")
    time.sleep(10)

    final_mem_usage = get_docker_memory(container_name)
    memory_usage.append(final_mem_usage)
    timestamps.append(time.time())

    print("\n📊 [ Docker Container Memory Usage ]")
    print(f"🔹 Max Memory Used: {max(memory_usage):.2f} MB")
    print(f"🔹 Avg Memory Used: {sum(memory_usage) / len(memory_usage):.2f} MB")
    print(f"🔹 Initial Memory: {initial_mem_usage:.2f} MB")
    print(f"🔹 Final Memory After 10s: {final_mem_usage:.2f} MB")

    print(f"\n⏱️ Average Response Time: {sum(response_times) / len(response_times):.3f} sec")
    print(f"🚀 Fastest Response Time: {min(response_times):.3f} sec")
    print(f"🐢 Slowest Response Time: {max(response_times):.3f} sec")

    plt.figure(figsize=(10, 5))
    plt.plot(timestamps, memory_usage, marker='o', linestyle='-', color='b', label="Memory Usage (MB)")

    plt.scatter([timestamps[0], timestamps[-1]], [memory_usage[0], memory_usage[-1]],
                color='red', s=100, label="Start & End Points")

    plt.xlabel("Time (s)")
    plt.ylabel("Memory Usage (MB)")
    plt.title(f"Memory Usage Over Time - {container_name}")
    plt.legend()
    plt.grid(True)
    plt.show()

send_requests(n_requests=500, container_name="api_web")

Besides, I tracked the docker stats live:

docker ps
docker stats $(docker ps -q --filter "publish=8080")

felixdittrich92 added the topic: docker, ext: api, and awaiting response labels Mar 7, 2025
@felixdittrich92 (Contributor)

Btw. for prod scenarios I would suggest using: https://github.com/felixdittrich92/OnnxTR
It's more hardware optimized, requires fewer resources, and is much faster, especially on CPU.
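For reference, a hedged sketch of what the swap might look like, assuming OnnxTR mirrors the docTR-style predictor interface (import paths and arguments should be verified against its README; the image path is a placeholder):

from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

# ONNX Runtime backed predictor; model weights are fetched automatically on first use
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn")

doc = DocumentFile.from_images("/path/to/your/doc.jpg")
result = predictor(doc)
print(result.export())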

@volkanncicek (Author) commented Mar 7, 2025

Thanks, @felixdittrich92! I really appreciate the quick response.

I'll take a look at ONNX.

The initial memory usage starts at 381 MB, but it keeps increasing as requests are processed. Ideally, the memory should be released at some point, but it isn't. In a Kubernetes environment, this leads to the container hitting the memory limit and restarting. I don't have infinite memory. Did you observe a similar behavior on your side?

The service starts with low memory usage, as expected.
[Image: memory usage graph at startup]

After 500 requests, the memory usage increases but does not return to its initial state. This suggests that allocated memory is not being released properly.
[Image: memory usage graph after 500 requests]

After waiting, I sent another 500 requests, and memory usage kept increasing, showing that the application is accumulating memory instead of freeing it.
[Image: memory usage graph after a further 500 requests]

Additionally, in our service, we load the model into RAM only once and use a context manager when opening images.

Test Environment:
I'm running Docker on Windows, which might be a factor.

@felixdittrich92 (Contributor) commented Mar 7, 2025

Hey @volkanncicek :)

Yeah, I see, and no, while profiling there was nothing similar (the max run was 1000 requests) ...and yes, I run on Linux (wouldn't be the first time that something is strange on Windows 😅)

Yeah, give it a try with OnnxTR, I would be happy to get feedback 👍

@volkanncicek (Author)

Hey @felixdittrich92,

Thanks for your insights! Out of curiosity, which OS and version are you running your tests on? Since I’m using Docker on Windows, I wonder if the difference in behavior might be OS-related.

Looking forward to your thoughts! 👍

@felixdittrich92 (Contributor)

Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.2 LTS
Release:	24.04
Codename:	noble

:)

@felixdittrich92 (Contributor)

@volkanncicek Have you had the chance to test if the same happens with OnnxTR on your machine?

@volkanncicek (Author)

Yes @felixdittrich92, I noticed the same behavior with OnnxTR: memory usage increases as the number of requests grows. I suspect that if you’re using a GPU, inference is handled there, reducing the load on main memory. However, in our case, since we’re processing OCR on the CPU, memory usage keeps increasing indefinitely and never stops. It seems there is a memory leak somewhere because the memory isn’t being released properly.

@felixdittrich92 (Contributor)

Mh, the profiling was also done using only the CPU (i7-14770K in my case) ... any chance that you could profile the code with memray and send me the created .bin file so that I can generate the flamegraph?
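For reference, recording such a profile with memray could look roughly like this (the entrypoint name is a placeholder; adjust to however the API process is started):

# Record allocations into a .bin file, then render an HTML flamegraph from it
python -m memray run -o ocr_profile.bin your_api_entrypoint.py
python -m memray flamegraph ocr_profile.bin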
