Memory Leak in docTR API Integration with FastAPI #1889

Open
volkanncicek opened this issue Mar 6, 2025 · 10 comments
Labels: awaiting response (Waiting for feedback), ext: api (Related to api folder), topic: docker (Docker-related), type: bug (Something isn't working)

Comments

@volkanncicek commented Mar 6, 2025

Bug description

I have integrated docTR into an API using the FastAPI framework to test its performance under heavy load. The API is deployed locally and is designed to handle OCR requests. However, I observed a significant memory issue during testing. When sending 100 consecutive OCR requests, the memory usage spikes and remains high even after all requests have been processed. I expected the memory usage to decrease back to its initial state once the processing was complete, but it did not. This behavior suggests a potential memory leak, which could lead to the application crashing due to excessive memory consumption.

Furthermore, I noticed that the docTR library's own API, as provided in their official documentation, exhibits a similar memory leak issue. Running their API integration template under the same conditions also results in elevated memory usage that does not return to the baseline after completing the requests.

Below is a graph illustrating the high memory consumption using the official docTR Dockerfile. This graph demonstrates how the memory usage increases during the requests and fails to decrease afterward, indicating a possible memory leak.

[Image: memory usage graph during and after 100 requests against the official docTR Dockerfile deployment]

Code snippet to reproduce the bug

import requests

params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}
files = [("files", ("doc.jpg", open('/path/to/your/doc.jpg', 'rb').read(), "image/jpeg"))]

for _ in range(100):
    response = requests.post("http://localhost:8002/ocr", params=params, files=files)
    print(response.json())

Error traceback

There is no specific error traceback as the issue is related to memory usage rather than an explicit error. However, the application eventually crashes when the system runs out of memory.

Environment

The environment was set up using the official docTR Dockerfile.

root@f229a0a64f16:/app# python collect_env.py
Collecting environment information...

DocTR version: 0.11.1a0
TensorFlow version: N/A
PyTorch version: 2.6.0+cu124 (torchvision 0.21.0+cu124)
OpenCV version: 4.11.0
OS: Debian GNU/Linux 12 (bookworm)
Python version: 3.10.16
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): No
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Deep Learning backend

The backend setup is based on the official docTR Dockerfile configuration.

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True
volkanncicek added the type: bug label Mar 6, 2025
@felixdittrich92 (Contributor)

Hi @volkanncicek 👋,

Thanks for reporting, I will have a look 👍

Btw.

files = [("files", ("doc.jpg", open('/path/to/your/doc.jpg', 'rb').read(), "image/jpeg"))]

You should wrap the open call in a context manager (with open(...) as f:) <-- this alone could already be the reason for the leak if you process lots of files.
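For illustration, a minimal adjustment of the reproduction snippet along those lines (the path is a placeholder):

# Read the file once inside a context manager so the handle is closed promptly,
# instead of keeping an unclosed file object around for every request.
with open('/path/to/your/doc.jpg', 'rb') as f:
    file_content = f.read()

files = [("files", ("doc.jpg", file_content, "image/jpeg"))]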

Additionally, our provided API code is only a reference; it's nothing you should use in a production system as is!

In particular, the ocr_predictor initialization should live in a lifespan handler and be initialized once, keeping it in RAM, instead of being created dynamically in the request route (like in our template).
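A minimal sketch of that pattern, assuming a FastAPI app with a single /ocr route (the route signature and the models dict are illustrative, not the shipped template):

from contextlib import asynccontextmanager

from fastapi import FastAPI, UploadFile
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

models = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the predictor once at startup and keep it in RAM for the process lifetime
    models["predictor"] = ocr_predictor(
        det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True
    )
    yield
    models.clear()

app = FastAPI(lifespan=lifespan)

@app.post("/ocr")
async def perform_ocr(files: list[UploadFile]):
    # Reuse the already-loaded predictor instead of building a new one per request
    docs = DocumentFile.from_images([await f.read() for f in files])
    return models["predictor"](docs).export()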

@felixdittrich92 (Contributor) commented Mar 7, 2025

I wasn't able to see any leak here:

🚀 Starting API Stress Test...

Sending Requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [33:24<00:00,  4.01s/it]

🕒 Waiting 10 seconds for memory stabilization...


📊 [ Docker Container Memory Usage ]
🔹 Max Memory Used: 2040.51 MB
🔹 Avg Memory Used: 1849.99 MB
🔹 Initial Memory: 1857.31 MB
🔹 Final Memory After 10s: 1906.29 MB

⏱️ Average Response Time: 2.347 sec
🚀 Fastest Response Time: 2.217 sec
🐢 Slowest Response Time: 2.504 sec

[Image: memory usage over time plot for the api_web container, produced by the script below]

to reproduce:

cd doctr/api
make run

then run:

import docker
import requests
import time
import matplotlib.pyplot as plt
from tqdm import tqdm

API_URL = "http://localhost:8080/ocr"

headers = {"accept": "application/json"}
params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/home/felix/Desktop/20250301_123152657.jpg', 'rb') as f:
    file_content = f.read()

files = [("files", ("20250301_123152657.jpg", file_content, "image/jpeg"))]

def get_docker_memory(container_name="api_web"):
    """Fetch live memory usage of a Docker container."""
    client = docker.DockerClient(base_url='unix://var/run/docker.sock')
    container = client.containers.get(container_name)
    stats = container.stats(stream=False)

    mem_usage = stats["memory_stats"]["usage"] / (1024 * 1024)  # Convert to MB
    return round(mem_usage, 2)

def send_requests(n_requests=50, container_name="api_web"):
    """Send multiple requests and monitor Docker container memory over time."""
    session = requests.Session()

    response_times = []
    memory_usage = []
    timestamps = []

    initial_mem_usage = get_docker_memory(container_name)
    memory_usage.append(initial_mem_usage)
    timestamps.append(time.time())

    print("🚀 Starting API Stress Test...\n")

    for _ in tqdm(range(n_requests), desc="Sending Requests"):
        mem_usage = get_docker_memory(container_name)
        memory_usage.append(mem_usage)
        timestamps.append(time.time())

        start_time = time.time()
        response = session.post(API_URL, headers=headers, params=params, files=files)
        response_times.append(time.time() - start_time)

        if response.status_code != 200:
            print(f"Error: {response.status_code}, Response: {response.text}")

    print("\n🕒 Waiting 10 seconds for memory stabilization...\n")
    time.sleep(10)

    final_mem_usage = get_docker_memory(container_name)
    memory_usage.append(final_mem_usage)
    timestamps.append(time.time())

    print("\n📊 [ Docker Container Memory Usage ]")
    print(f"🔹 Max Memory Used: {max(memory_usage):.2f} MB")
    print(f"🔹 Avg Memory Used: {sum(memory_usage) / len(memory_usage):.2f} MB")
    print(f"🔹 Initial Memory: {initial_mem_usage:.2f} MB")
    print(f"🔹 Final Memory After 10s: {final_mem_usage:.2f} MB")

    print(f"\n⏱️ Average Response Time: {sum(response_times) / len(response_times):.3f} sec")
    print(f"🚀 Fastest Response Time: {min(response_times):.3f} sec")
    print(f"🐢 Slowest Response Time: {max(response_times):.3f} sec")

    plt.figure(figsize=(10, 5))
    plt.plot(timestamps, memory_usage, marker='o', linestyle='-', color='b', label="Memory Usage (MB)")

    plt.scatter([timestamps[0], timestamps[-1]], [memory_usage[0], memory_usage[-1]],
                color='red', s=100, label="Start & End Points")

    plt.xlabel("Time (s)")
    plt.ylabel("Memory Usage (MB)")
    plt.title(f"Memory Usage Over Time - {container_name}")
    plt.legend()
    plt.grid(True)
    plt.show()

send_requests(n_requests=500, container_name="api_web")

Besides, I tracked the docker stats live:

docker ps
docker stats $(docker ps -q --filter "publish=8080")

felixdittrich92 added the topic: docker, ext: api, and awaiting response labels Mar 7, 2025
@felixdittrich92 (Contributor)

Btw. for prod scenarios I would suggest using: https://github.com/felixdittrich92/OnnxTR
It's more hardware optimized, requires fewer resources, and is much faster, especially on CPU.
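For reference, a hedged sketch of what the swap might look like, assuming OnnxTR mirrors the docTR-style predictor interface (import paths and arguments should be verified against its README; the image path is a placeholder):

from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

# ONNX Runtime backed predictor; model weights are fetched automatically on first use
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn")

doc = DocumentFile.from_images("/path/to/your/doc.jpg")
result = predictor(doc)
print(result.export())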

@volkanncicek (Author) commented Mar 7, 2025

Thanks, @felixdittrich92! I really appreciate the quick response.

I'll take a look at ONNX.

The initial memory usage starts at 381 MB, but it keeps increasing as requests are processed. Ideally, the memory should be released at some point, but it isn't. In a Kubernetes environment, this leads to the container hitting the memory limit and restarting. I don't have infinite memory. Did you observe a similar behavior on your side?

The service starts with low memory usage, as expected.
[Image: memory usage graph at startup]

After 500 requests, the memory usage increases but does not return to its initial state. This suggests that allocated memory is not being released properly.
[Image: memory usage graph after 500 requests]

After waiting, I sent another 500 requests, and memory usage kept increasing, showing that the application is accumulating memory instead of freeing it.
[Image: memory usage graph after a further 500 requests]

Additionally, in our service, we load the model into RAM only once and use a context manager when opening images.

Test Environment:
I'm running Docker on Windows, which might be a factor.

@felixdittrich92 (Contributor) commented Mar 7, 2025

Hey @volkanncicek :)

Yeah, I see, and no, while profiling there was nothing similar (the max run was 1000 requests) ...and yes, I run on Linux (wouldn't be the first time that something is strange on Windows 😅)

Yeah, give it a try with OnnxTR, I would be happy to get feedback 👍

@volkanncicek (Author)

Hey @felixdittrich92,

Thanks for your insights! Out of curiosity, which OS and version are you running your tests on? Since I’m using Docker on Windows, I wonder if the difference in behavior might be OS-related.

Looking forward to your thoughts! 👍

@felixdittrich92 (Contributor)

Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.2 LTS
Release:	24.04
Codename:	noble

:)

@felixdittrich92 (Contributor)

@volkanncicek Have you had the chance to test if the same happens with OnnxTR on your machine?

@volkanncicek (Author)

Yes @felixdittrich92, I noticed the same behavior with OnnxTR: memory usage increases as the number of requests grows. I suspect that if you’re using a GPU, inference is handled there, reducing the load on main memory. However, in our case, since we’re processing OCR on the CPU, memory usage keeps increasing indefinitely and never stops. It seems there is a memory leak somewhere because the memory isn’t being released properly.

@felixdittrich92 (Contributor)

Mh, the profiling was also done using only the CPU (i7-14770K in my case) ... any chance that you could profile the code with memray and send me the created .bin file so that I can generate the flamegraph?
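For reference, recording such a profile with memray could look roughly like this (the entrypoint name is a placeholder; adjust to however the API process is started):

# Record allocations into a .bin file, then render an HTML flamegraph from it
python -m memray run -o ocr_profile.bin your_api_entrypoint.py
python -m memray flamegraph ocr_profile.bin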
