
[Bug] Memory leak occurring when using lmdb.open #354

Open
tczhangzhi opened this issue Dec 24, 2023 · 1 comment

tczhangzhi commented Dec 24, 2023

Affected Operating Systems

Linux

Affected py-lmdb Version

1.4.1

py-lmdb Installation Method

pip install lmdb

Using bundled or distribution-provided LMDB library?

Bundled

Distribution name and LMDB library version

(0, 9, 29)

Machine "free -m" output

              total        used        free      shared  buff/cache   available
Mem:         385609       55278       22059        5864      308271      322580
Swap:             0           0           0

Other important machine info

Not applicable.

Describe Your Problem

There is a memory leak when using lmdb.open: memory usage keeps increasing until the program finishes running. Here is a reproduction case:

import numpy as np
import lmdb
import os
import psutil
import shutil

def load_array():
    """ Store and load a numpy array using LMDB. """
    # with lmdb.open('lmdb_data', map_size=1000000000) as env:

    #     with env.begin(write=False) as txn:
    #         # Retrieve the bytes and convert back to numpy array
    #         data_bytes = txn.get(b'array')
    #         retrieved_data = np.frombuffer(data_bytes, dtype=np.float64).reshape(100, 100)
    env = lmdb.open('lmdb_data', map_size=1000000000) # 1GB size

    with env.begin(write=False) as txn:
        # Retrieve the bytes and convert back to numpy array
        data_bytes = txn.get(b'array')
        retrieved_data = np.frombuffer(data_bytes, dtype=np.float64).reshape(100, 100)

    env.close()
    # del env
    return retrieved_data

def store_array():
    # Generate a random numpy array
    array_size = (100, 100) # Example size
    array = np.random.rand(*array_size)

    # Set up the LMDB environment
    env = lmdb.open('lmdb_data', map_size=1000000000) # 1GB size

    with env.begin(write=True) as txn:
        # Convert the numpy array to bytes and store it
        txn.put(b'array', array.tobytes())

print(lmdb.__version__)
print(lmdb.version())

store_array()

# Monitor memory usage
process = psutil.Process(os.getpid())
initial_memory = process.memory_info().rss

# Repeat the process 1000 times
for _ in range(1000):
    load_array()
    current_memory = process.memory_info().rss
    print(current_memory)
    if current_memory > initial_memory * 1.01: # 1% increase threshold
        print("Potential memory leak detected.")
        break

# # Cleanup
# if os.path.exists('lmdb_data'):
#     shutil.rmtree('lmdb_data')

# print("Task completed without detecting memory leak.")

Errors/exceptions Encountered

Potential memory leak detected.

Describe What You Expected To Happen

Task completed without detecting memory leak.

Describe What Happened Instead

Potential memory leak detected.

mikkelfo commented Feb 5, 2025

@jnwatson Any updates on this issue? I'm seeing the same behaviour in 1.5.1; this is my repro:

import lmdb

default_val = list(range(500))
with lmdb.open("lmdb_data", map_size=2_000_000_000) as env:
    with env.begin(write=True) as txn:
        # Write 500,000 keys, each mapped to the same string-encoded list
        for i in range(500_000):
            txn.put(str(i).encode(), str(default_val).encode())

Followed by a read pass:

import lmdb
import psutil

process = psutil.Process()
with lmdb.open("lmdb_data") as env:
    with env.begin() as txn:
        # Read every key back and report RSS (in MB) every 100,000 reads
        for i in range(500_000):
            value = txn.get(str(i).encode())
            if i % 100_000 == 0:
                print(process.memory_info().rss / 1024 / 1024)
print(process.memory_info().rss / 1024 / 1024) # Back to normal

The memory usage starts at ~200 MB and ends at ~2 GB, a full order of magnitude larger. It only drops back to normal when the env is closed again; neither forced garbage collection nor del value clears up the memory.
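
A minimal sketch of that check, assuming the lmdb_data database written above already exists. Per the report, RSS stays elevated after del and gc.collect() while the environment is open and only drops once env.close() is called.

import gc
import lmdb
import psutil

process = psutil.Process()
env = lmdb.open("lmdb_data")
with env.begin() as txn:
    for i in range(500_000):
        value = txn.get(str(i).encode())

del value
gc.collect()
print(process.memory_info().rss / 1024 / 1024)  # per the report, still elevated

env.close()
print(process.memory_info().rss / 1024 / 1024)  # per the report, back to baseline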
