
Memory leaks (when early break from iterator) #393

Open

shekhovt opened this issue Dec 6, 2024 · 2 comments


shekhovt commented Dec 6, 2024

FFCV is eating up host memory indefinitely, consuming far more than the unpacked dataset would occupy.

MWE:

import torch
#------------------
import ffcv
from ffcv.loader import OrderOption
from ffcv.fields.decoders import IntDecoder
from ffcv.transforms import ToTensor, Squeeze, ToTorchImage
from ffcv.fields.rgb_image import CenterCropRGBImageDecoder
# -----------------

dev = torch.device('cpu')

train_set = '/local/temporary/data/imagenet/imagenet_pytorch/train-res=256-cls=10.beton'
input_size = 128

def train_pipelines():
    image_pipeline = [
        CenterCropRGBImageDecoder((input_size, input_size), ratio=1),
        ToTensor(),
        ToTorchImage(),
    ]

    label_pipeline = [
        IntDecoder(),
        ToTensor(),
        Squeeze(),
    ]

    # Pipeline for each data field
    return {
        'image': image_pipeline,
        'label': label_pipeline,
    }

num_workers = 4
batch_size = 128
train_loader = ffcv.loader.Loader(train_set, batch_size=batch_size, num_workers=num_workers, order=OrderOption.RANDOM, pipelines=train_pipelines(), drop_last=True)

epochs = 1000

print('single loader')
for e in range(epochs):
    x = 0
    for data, target in train_loader:
        x += data.to(dev).sum().cpu().detach().item()
        break
    print(e, x)

The problem is apparently triggered by breaking out of the loader loop early. For some intermediate tasks we need only a few batches. There must be a way to clean up?

@shekhovt shekhovt changed the title Leaking host memory FFCV leaks host memory Dec 6, 2024

shekhovt commented Dec 6, 2024

I found a workaround: replace the loop part with this:

    it = iter(train_loader)
    for data, target in it:
        x += data.to(dev).sum().cpu().detach().item()
        break
    it.close()
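As a variant of the explicit `close()` above, `contextlib.closing` (from the standard library) guarantees that `close()` runs even if the loop body raises. The `ToyEpochIterator` below is a hypothetical stand-in for FFCV's per-epoch iterator, included only to show the pattern self-contained:

```python
import contextlib

class ToyEpochIterator:
    """Hypothetical stand-in for an FFCV epoch iterator (not the real class)."""
    def __init__(self, num_batches):
        self.num_batches = num_batches
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.num_batches == 0:
            raise StopIteration
        self.num_batches -= 1
        return self.num_batches  # a real loader would yield (data, target)

    def close(self):
        # A real close() would shut down workers and free pinned buffers.
        self.closed = True

it = ToyEpochIterator(10)
with contextlib.closing(it):
    for batch in it:
        break  # early exit: __exit__ still calls it.close()

print(it.closed)  # True
```

With FFCV, the same pattern would wrap `iter(train_loader)`, so cleanup no longer depends on remembering to call `close()` on every exit path.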

I am not an expert in Python, but it appears that

    def __del__(self):
        self.close()

is not a safe way to free resources.


shekhovt commented Dec 6, 2024

If the image pipeline includes GPU processing, it eats up GPU memory as well.

Some augmented tensors (already consumed, or prefetched ahead of time) are not disposed of properly. This is in the context of the MWE above. The workaround stops that leak, too.

@shekhovt shekhovt changed the title FFCV leaks host memory Memory leaks (when early break from iterator) Dec 6, 2024