Memory Leak in docTR API Integration with FastAPI #1889
Hi @volkanncicek 👋, thanks for reporting, I will have a look 👍
Btw. please note that our provided API code is only a reference; it's nothing you should use in a production system as is!
I wasn't able to see any leak here:
To reproduce:
then run:

```python
import docker
import requests
import time
import matplotlib.pyplot as plt
from tqdm import tqdm

API_URL = "http://localhost:8080/ocr"
headers = {"accept": "application/json"}
params = {"det_arch": "db_resnet50", "reco_arch": "crnn_vgg16_bn"}

with open('/home/felix/Desktop/20250301_123152657.jpg', 'rb') as f:
    file_content = f.read()
files = [("files", ("20250301_123152657.jpg", file_content, "image/jpeg"))]


def get_docker_memory(container_name="api_web"):
    """Fetch live memory usage of a Docker container."""
    client = docker.DockerClient(base_url='unix://var/run/docker.sock')
    container = client.containers.get(container_name)
    stats = container.stats(stream=False)
    mem_usage = stats["memory_stats"]["usage"] / (1024 * 1024)  # Convert to MB
    return round(mem_usage, 2)


def send_requests(n_requests=50, container_name="api_web"):
    """Send multiple requests and monitor Docker container memory over time."""
    session = requests.Session()
    response_times = []
    memory_usage = []
    timestamps = []

    initial_mem_usage = get_docker_memory(container_name)
    memory_usage.append(initial_mem_usage)
    timestamps.append(time.time())

    print("🚀 Starting API Stress Test...\n")
    for _ in tqdm(range(n_requests), desc="Sending Requests"):
        mem_usage = get_docker_memory(container_name)
        memory_usage.append(mem_usage)
        timestamps.append(time.time())

        start_time = time.time()
        response = session.post(API_URL, headers=headers, params=params, files=files)
        response_times.append(time.time() - start_time)

        if response.status_code != 200:
            print(f"Error: {response.status_code}, Response: {response.text}")

    print("\n🕒 Waiting 10 seconds for memory stabilization...\n")
    time.sleep(10)

    final_mem_usage = get_docker_memory(container_name)
    memory_usage.append(final_mem_usage)
    timestamps.append(time.time())

    print("\n📊 [ Docker Container Memory Usage ]")
    print(f"🔹 Max Memory Used: {max(memory_usage):.2f} MB")
    print(f"🔹 Avg Memory Used: {sum(memory_usage) / len(memory_usage):.2f} MB")
    print(f"🔹 Initial Memory: {initial_mem_usage:.2f} MB")
    print(f"🔹 Final Memory After 10s: {final_mem_usage:.2f} MB")

    print(f"\n⏱️ Average Response Time: {sum(response_times) / len(response_times):.3f} sec")
    print(f"🚀 Fastest Response Time: {min(response_times):.3f} sec")
    print(f"🐢 Slowest Response Time: {max(response_times):.3f} sec")

    plt.figure(figsize=(10, 5))
    plt.plot(timestamps, memory_usage, marker='o', linestyle='-', color='b', label="Memory Usage (MB)")
    plt.scatter([timestamps[0], timestamps[-1]], [memory_usage[0], memory_usage[-1]],
                color='red', s=100, label="Start & End Points")
    plt.xlabel("Time (s)")
    plt.ylabel("Memory Usage (MB)")
    plt.title(f"Memory Usage Over Time - {container_name}")
    plt.legend()
    plt.grid(True)
    plt.show()


send_requests(n_requests=500, container_name="api_web")
```

Besides, I tracked the docker stats live:
Btw. for prod scenarios I would suggest using OnnxTR: https://github.com/felixdittrich92/OnnxTR
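For what it's worth, basic usage looks roughly like this (a sketch assuming OnnxTR's predictor API mirrors docTR's; the image path is a placeholder):

```python
# Sketch of OnnxTR usage; inference runs via onnxruntime (CPU by default).
from onnxtr.io import DocumentFile
from onnxtr.models import ocr_predictor

# Load the ONNX-based predictor once and reuse it across requests.
model = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn")

doc = DocumentFile.from_images("sample.jpg")  # placeholder image path
result = model(doc)
print(result.render())
```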
Thanks, @felixdittrich92! I really appreciate the quick response. I'll take a look at OnnxTR. The initial memory usage starts at 381 MB, but it keeps increasing as requests are processed. Ideally, the memory should be released at some point, but it isn't. In a Kubernetes environment, this leads to the container hitting the memory limit and restarting; I don't have infinite memory. Did you observe similar behavior on your side? The service starts with low memory usage, as expected. After 500 requests, memory usage increases but does not return to its initial state, which suggests that allocated memory is not being released properly. After waiting, I sent another 500 requests, and memory usage kept increasing, showing that the application is accumulating memory instead of freeing it. Additionally, in our service, we load the model into RAM only once and use a context manager when opening images. Test Environment:
Hey @volkanncicek :) Yeah, I see, and no, while profiling there was nothing similar (max run was 1000 requests). And yes, I run on Linux (wouldn't be the first time that something is strange on Windows 😅). Give it a try with OnnxTR, I would be happy to get feedback 👍
Hey @felixdittrich92, thanks for your insights! Out of curiosity, which OS and version are you running your tests on? Since I'm using Docker on Windows, I wonder if the difference in behavior might be OS-related. Looking forward to your thoughts! 👍
:)
@volkanncicek Have you had the chance to test if the same happens with OnnxTR on your machine?
Yes @felixdittrich92, I noticed the same behavior with OnnxTR: memory usage increases as the number of requests grows. I suspect that if you're using a GPU, inference is handled there, reducing the load on main memory. However, in our case, since we're processing OCR on the CPU, memory usage keeps increasing indefinitely and never stops. It seems there is a memory leak somewhere because the memory isn't being released properly.
Mh, the profiling was also done using only the CPU (i7-14770K in my case)... any chance that you could profile the code with memray and send me the created file?
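For reference, one possible way to capture such a profile with memray's Python API (a rough sketch; the image path, output file name, and iteration count are placeholders):

```python
# Sketch: profile repeated docTR inference with memray (paths are placeholders).
import memray

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)
doc = DocumentFile.from_images("sample.jpg")

# All allocations made inside this block are recorded to the capture file.
with memray.Tracker("doctr_profile.bin"):
    for _ in range(100):  # enough iterations to make a potential leak visible
        _ = predictor(doc)

# The capture can then be inspected, e.g. with `memray flamegraph doctr_profile.bin`.
```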
Bug description
I have integrated docTR into an API using the FastAPI framework to test its performance under heavy load. The API is deployed locally and is designed to handle OCR requests. However, I observed a significant memory issue during testing. When sending 100 consecutive OCR requests, the memory usage spikes and remains high even after all requests have been processed. I expected the memory usage to decrease back to its initial state once the processing was complete, but it did not. This behavior suggests a potential memory leak, which could lead to the application crashing due to excessive memory consumption.
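For illustration, a minimal sketch of that kind of integration (not our actual service code; the route name, parameters, and helper usage are illustrative): the predictor is loaded once at startup and each request runs OCR on the uploaded images.

```python
# Illustrative FastAPI + docTR integration sketch (not the actual service code).
from typing import List

from fastapi import FastAPI, File, UploadFile
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

app = FastAPI()

# Load the model into memory once at startup, not per request.
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn", pretrained=True)


@app.post("/ocr")
async def ocr(files: List[UploadFile] = File(...)):
    results = []
    for file in files:
        content = await file.read()
        # Assumption: DocumentFile.from_images accepts raw image bytes as well as paths.
        doc = DocumentFile.from_images(content)
        results.append(predictor(doc).export())
    return results
```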
Furthermore, I noticed that the docTR library's own API, as provided in their official documentation, exhibits a similar memory leak issue. Running their API integration template under the same conditions also results in elevated memory usage that does not return to the baseline after completing the requests.
Below is a graph illustrating the high memory consumption using the official docTR Dockerfile. This graph demonstrates how the memory usage increases during the requests and fails to decrease afterward, indicating a possible memory leak.
Code snippet to reproduce the bug
Error traceback
There is no specific error traceback as the issue is related to memory usage rather than an explicit error. However, the application eventually crashes when the system runs out of memory.
Environment
The environment was set up using the official docTR Dockerfile.
Deep Learning backend
The backend setup is based on the official docTR Dockerfile configuration.