Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mktemp/dir: delete temp files on exit by default #32851

Merged
merged 1 commit into from
Aug 14, 2019

Conversation

StefanKarpinski
Copy link
Member

@StefanKarpinski StefanKarpinski commented Aug 9, 2019

An alternative to #32335. Still needs tests and NEWS, which can maybe be taken from there. Sorry to snipe your PR, @DilumAluthge 😬

Instead of putting lots of atexit handlers in the list, this uses a single one and keeps a list of temporary paths to cleanup. If the list gets long (≥ 1024) it will be scanned for already-deleted paths when adding a new path to the cleanup list. To avoid running too frequently, it doubles the "too many paths" threshold after this.

Needs the generate_precompile improvements in #32850 which in turn needs the count(regex, string) method added in #32849. A yak shave in chained pull request form.

@DilumAluthge
Copy link
Member

I like this a lot!

base/file.jl Show resolved Hide resolved
base/file.jl Show resolved Hide resolved
base/regex.jl Outdated Show resolved Hide resolved
@StefanKarpinski
Copy link
Member Author

Tests passed for all but last commit, now running those tests.

@StefanKarpinski
Copy link
Member Author

Win32 failure is the usual FileWatching failure.

@StefanKarpinski StefanKarpinski force-pushed the sk/tmp-cleanup branch 3 times, most recently from b3f3aeb to 2347489 Compare August 10, 2019 20:07
@StefanKarpinski StefanKarpinski changed the title [WIP] mktemp, mktempdir: delete temp files on exit by default mktemp, mktempdir: delete temp files on exit by default Aug 10, 2019
@StefanKarpinski StefanKarpinski changed the title mktemp, mktempdir: delete temp files on exit by default mktemp/dir: delete temp files on exit by default Aug 10, 2019
@StefanKarpinski
Copy link
Member Author

Ok, I'd say this is ready to go. There is 3x as much test code added as implementation 😂. But the tests did catch several bonehead mistakes I'd made, so yay.

@StefanKarpinski
Copy link
Member Author

So, on Windows, of course, problem child that it is, you cannot remove a file while anyone has it open. So when we open a temporary file in the child and then forget about it and let the exit handler clean it up that causes all sorts of problems since the "resource is busy". One option would be to call GC.gc() in the temp_cleanup_atexit handler if there are any files that need deleting.

@DilumAluthge
Copy link
Member

Classic Windows

@StefanKarpinski
Copy link
Member Author

So it seems that on Windows doing chmod(0o400) on the parent directory of a file doesn't make it impossible to delete the file. Any ideas on how to prevent deletion of a temp file on Windows?

@DilumAluthge
Copy link
Member

I wonder if it’s related to this: https://support.microsoft.com/en-hk/help/320246/inherited-permissions-are-not-automatically-updated-when-you-move-fold

Apparently depending on what API you use, permissions don’t always inherit.

You said that chmod 000 didn’t work on Windows, right?

Is this just for the tests, or for the implementation?

@StefanKarpinski
Copy link
Member Author

Just the tests. Since we know Windows can't delete open files, I decided to use that approach.

@StefanKarpinski StefanKarpinski force-pushed the sk/tmp-cleanup branch 5 times, most recently from 7132497 to 3c46252 Compare August 13, 2019 12:29
The main change here is that the non-higher-order versions of mktemp/dir will
now by default put the temp files/dirs that they create into a temp cleanup list
which will be deleted upon process exit. Instead of putting lots of atexit
handlers in the list, this uses a single handler and keeps a list of temporary
paths to cleanup. If the list gets long (> 1024) it will be scanned for
already-deleted paths when adding a new path to the cleanup list. To avoid
running too frequently, it doubles the "too many paths" threshold after this.

If cleanup fails for higher order `mktemp() do ... end` forms, they will also
put their temp files/dirs in the cleanup list but marked for "asap" deletion.
Every time the above mechanism for cleaning out the temp cleanup list scans for
already-deleted temp files, this will try again to delete these temp files.
@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Aug 13, 2019

All green! Have squashed into one commit and changed GC.gc() calls to GC.gc(true) calls. This is ready to go now. Asked Jeff, Jameson and Elliot what they thought about this and all seemed in favor. This is effectively a simple garbage collector for temp paths without needing to make a special (mutable) type for temp paths that the normal GC can finalize.

@ararslan
Copy link
Member

I'd be interested to know what @fredrikekre's objection is.

@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Aug 13, 2019

My guess is that it's arguably overly complex and seems not to be something most runtimes do.

@fredrikekre
Copy link
Member

fredrikekre commented Aug 13, 2019

What Stefan said, but more importantly that I find these functions very useful to create random directories/filenames in a user created non-temp directory. I use this quite often and I want to inspect the files after Julia exits. With this implementation it does not seem to be a way to opt out either.
I think mktempdir(...) do d; ... ; end, together with the opt-out from #32355 is sufficient alternatives for users that want to cleanup.

@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Aug 13, 2019

With this implementation it does not seem to be a way to opt out either.

Well, mktemp(cleanup=false) and mktempdir(cleanup=false) in this PR lets you opt out.

I find these functions very useful to create random directories/filenames in a user created non-temp directory. I use this quite often and I want to inspect the files after Julia exits.

I think this use case is unusual enough that it doesn't really warrant being the default. You can also always do path = joinpath(dir, randstring()) if you want to create a random path that you want to keep around after your process has exited.

@StefanKarpinski
Copy link
Member Author

Regarding complexity, note that the implementation of this is only 25 lines long. The tests, on the other hand... are almost 200 lines and kind of nasty, but I wanted to make sure it was correctly implemented and it's incidentally testing a fair amount of other mktemp/dir functionality that wasn't very well tested previously, so I don't think that should be held against this change.

@vtjnash
Copy link
Member

vtjnash commented Aug 13, 2019

Regarding complexity

Not that I'd ever use more tests as a bad thing, but I think lots of kind of nasty edge cases (and as a weak proxy, number of tests) is the sort of the thing that makes something complex—much more so than SLOC of the code itself.

@StefanKarpinski
Copy link
Member Author

The tests aren't complex because there are lots of nasty edge cases, they're complex because it's testing something that inherently has to do with external mutable state (temp files and directories) and because I'm a thorough motherfucker and wanted to get full coverage of all code paths (and found several bonehead bugs because of it).

@StefanKarpinski
Copy link
Member Author

I'm going to go ahead with this given that it has relatively broad support, limited opposition and on the premise that most people want their temporary files and directories to be temporary.

if (all || asap) && ispath(path)
need_gc && GC.gc(true)
need_gc = false
rm(path, recursive=true, force=true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn’t this need a try/catch? There’s a somewhat complex edge cases on Windows (deleting an open file will essentially just set the delete-on-close bit and cause some future operations, including rm, to error with file-not-found)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it? I thought that force=true made it never fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants