Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary files are never cleaned #2253

Closed
vincbon opened this issue Aug 25, 2022 · 5 comments
Closed

Temporary files are never cleaned #2253

vincbon opened this issue Aug 25, 2022 · 5 comments
Labels

Comments

@vincbon
Copy link

vincbon commented Aug 25, 2022

While doing cross-validation for hyperparameters tuning, I noticed the filesystem usage growing indefinitely.
During processing some directories are created in /tmp, but they are never cleaned.

Looking at the code I think the issue is in the fit method in models.py:

def fit(self, stan_init, stan_data, **kwargs):
        (stan_init, stan_data) = self.prepare_data(stan_init, stan_data)

        if 'inits' not in kwargs and 'init' in kwargs:
            kwargs['inits'] = self.prepare_data(kwargs['init'], stan_data)[0]

        args = dict(
            data=stan_data,
            inits=stan_init,
            algorithm='Newton' if stan_data['T'] < 100 else 'LBFGS',
            iter=int(1e4),
            output_dir = mkdtemp(),
        )
        args.update(kwargs)

        try:
            self.stan_fit = self.model.optimize(**args)

The mkdtemp function requires the user to manually delete the directory once it is no longer needed (see the doc) but I don't see where it is cleaned here.

The cmdstanpy doc also says the following:

:param output_dir: Name of the directory to which CmdStan output
            files are written. If unspecified, output files will be written
            to a temporary directory which is deleted upon session exit.

So I'm wondering if output_dir = mkdtemp() is really necessary here ?

Thank you!

@vincbon
Copy link
Author

vincbon commented Aug 25, 2022

I just stumbled upon #2088. This line was added to avoid concurrency issues so it is indeed needed.
Like in #2160 TemporaryDirectory should be preferred to handle the cleaning process automatically.

@WardBrian
Copy link
Collaborator

Since stan-dev/cmdstanpy#569 (e.g. cmdstanpy 1.0.2+) cmdstanpy itself should avoid this problem. #2088 was before this release, when all cmdstan runs were saved in one temp folder, which could lead to collisions.

@tcuongd
Copy link
Collaborator

tcuongd commented Sep 4, 2022

Just double checking:

  1. cmdstanpy >= 1.0.2 would automatically create a different temp folder for each invocation of fit(), so that takes care of collision issues (different processes writing to the same file).
  2. However, cleanup is another issue. mkdtemp(), which is used currently in prophet's code and cmdstanpy's code, doesn't handle cleanup automatically, and we prefer TemporaryDirectory() instead. @WardBrian what do you think about making that change in cmdstanpy? Otherwise we can just change it in the Prophet code.

@WardBrian
Copy link
Collaborator

Cmdstanpy cleans up its own files using atexit

@tcuongd
Copy link
Collaborator

tcuongd commented Sep 4, 2022

Cmdstanpy cleans up its own files using atexit

Yep nice, just tested and it's working as expected for CV. Will do a quick PR to bump the minimum version for cmdstanpy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants