Conversation

Liozou
Member

@Liozou Liozou commented Sep 23, 2025

On Windows, during precompilation of package A, a DLL file is generated and may replace the existing one. However, if A is already loaded in the julia session, the corresponding soon-to-be-obsolete DLL cannot be removed while julia is running. Currently, this problem is solved by moving the old DLL into a special "julia_delayed_deletes" folder, which will be cleaned in a later julia session by Pkg.gc().
However, moving an in-use DLL is only permitted within the same drive, and the "julia_delayed_deletes" folder is located in the tempdir, i.e. on a fixed drive, say C:. Thus, if the DEPOT_PATH points to a ".julia" on another drive, say D:, any precompilation occurring on an already loaded package will fail. This is what happens in #59589, actually resulting in an infinite loop that bloats up both memory and disk. @staticfloat had actually predicted that such an issue could occur in #53456 (comment).

This PR fixes #59589 by changing the "delayed deleting" mechanism: instead of moving the old DLLs to a fixed folder, they are renamed in their initial folder and recorded in a list stored at a fixed location. Upon Pkg.gc(), the listed obsolete files will be deleted (JuliaLang/Pkg.jl#4392).

It also prevents any similar infinite loops by cutting the rm -> mv -> rm cycle down to rm -> rename. I also removed the private argument allow_delayed_delete from rm, which is only used by Pkg but does not appear to be useful.
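
For illustration only, here is a minimal sketch of the rename-and-record idea described above. The function name `delayed_delete`, the `.obsolete-` suffix, and the `refdir` argument are hypothetical and are not taken from the PR; the sketch assumes that renaming within the same folder is permitted for an in-use DLL, as stated above.

```julia
# Hypothetical sketch, not the code merged in this PR.
# Rename `path` to a unique name in its own folder (so the rename never
# crosses a drive boundary), then record the new name in `refdir`.
function delayed_delete(path::AbstractString, refdir::AbstractString)
    dir, name = splitdir(abspath(path))
    # The pid and a nanosecond timestamp make a name collision unlikely.
    newpath = joinpath(dir, string(name, ".obsolete-", getpid(), "-", time_ns()))
    mv(path, newpath)
    mkpath(refdir)
    # One record file per obsolete DLL; mktemp picks a unique file name,
    # so concurrent julia processes do not write to the same record.
    refpath, io = mktemp(refdir; cleanup=false)
    try
        print(io, newpath)
    finally
        close(io)
    end
    return newpath
end
```

Because the file never leaves its folder, the drive on which `tempdir()` lives no longer matters for the rename step; only the small record file is written to the fixed location.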

EDIT: current state is #59635 (comment)

@Liozou Liozou added system:windows Affects only Windows backport 1.12 Change should be backported to release-1.12 labels Sep 23, 2025
@KristofferC KristofferC mentioned this pull request Sep 24, 2025
24 tasks
@Liozou
Member Author

Liozou commented Sep 24, 2025

The CI failures come from the loading.jl tests that attempt to remove the depot at the end of the test. This fails because the DLL is still present (albeit renamed) within... So I'm reverting the design of this PR to something closer to the current state. In the current version of the PR:

  • When an in-use DLL is being rm-ed, rename it to a unique name in the same folder.
  • Move it to a special directory, which is either joinpath(tempdir(), "julia_delayed_deletes") if tempdir() is on the same drive as the DLL, or else joinpath(drive, "julia_delayed_deletes") with the DLL's drive. This way, the depot can be removed during the same julia session. (removed in 8badff9)
  • The path to the now obsolete DLL is written in a file stored in joinpath(tempdir(), "julia_delayed_deletes_ref"). This file has a unique name generated by mktemp, so that multiple julia processes can attempt to rm the same in-use DLL without corruption.
  • Upon Pkg.gc(), the files in the "julia_delayed_deletes_ref" folder are read, each containing a single line with the path to an obsolete DLL to remove. If all DLLs in a "julia_delayed_deletes" folder have been removed, the folder itself is removed; if all files referenced in the "julia_delayed_deletes_ref" folder are removed, that folder is also removed.
  • As is currently the case, the rm called by Pkg.gc() in this particular setting is the only one that unsets the allow_delayed_delete kwarg. By default, rm attempts this strategy upon trying to delete an in-use DLL.
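
The cleanup pass described in the list above could look roughly like the following sketch. The function name `purge_delayed_deletes` is hypothetical, and the sketch uses a plain `rm` rather than the private `allow_delayed_delete` kwarg, so it illustrates the bookkeeping only and is not the actual Pkg.gc() code.

```julia
# Hypothetical sketch of the cleanup pass, not the actual Pkg.gc() code.
# Each file in `refdir` holds a single line: the path of an obsolete DLL.
function purge_delayed_deletes(refdir::AbstractString)
    isdir(refdir) || return 0
    removed = 0
    for ref in readdir(refdir; join=true)
        target = readline(ref)
        try
            # force=true: a target that is already gone is not an error.
            rm(target; force=true)
            rm(ref; force=true)
            removed += 1
        catch err
            # A DLL that is still in use stays listed for a later session.
            err isa Base.IOError || rethrow()
        end
    end
    # Drop the record folder once every record has been processed.
    isempty(readdir(refdir)) && rm(refdir)
    return removed
end
```

Keeping the record file whenever the deletion fails means a later session can simply retry, without needing to know which process originally held the DLL.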

@vtjnash
Member

vtjnash commented Sep 24, 2025

Thanks for working on this!

@Liozou
Member Author

Liozou commented Sep 26, 2025

Thanks again for the reviews! In the current solution, I partially reverted the last changes so that the renamed obsolete DLLs remain in their folder. This required modifying the loading.jl tests: instead of making the test depots with mktempdir(), which inevitably fails at cleanup because it encounters in-use DLLs, they are now made with mktempdir(; cleanup=false). Then, in an atexit hook, a special julia process is spawned, whose sole job is to wait for the test process to die and then to clean up these depots. This only affects Windows; nothing changes on other systems.

@Liozou Liozou force-pushed the rmdll branch 3 times, most recently from 6e8770a to 2e10a27 Compare September 26, 2025 14:48
@vtjnash
Member

vtjnash commented Sep 26, 2025

The special julia process to be spawned is very clever. Nice job. We might even want to move this code into temp_cleanup_purge_all if the TEMP_CLEANUP set is non-empty after attempting in-process cleanup?

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@Liozou
Member Author

Liozou commented Sep 28, 2025

Thanks a lot for the review again, the eof(stdin) strategy is smart! I'm not familiar with the lower-level Pipe code and the solution you proposed errors (since Base.dup expects a RawFD) so I patched something which I believe does what is expected, but an external check is most welcome!
I also moved this logic into a dedicated function of Base.Filesystem to be called exactly once atexit (to avoid spawning multiple such processes).
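
As a rough illustration of the helper-process idea discussed in the comments above, the sketch below spawns a child julia process that blocks on eof(stdin) until the parent's end of the pipe closes, then removes the given paths. The function name `spawn_cleanup_helper` is hypothetical, and this is a simplified stand-in that assumes a plain pipe on the child's stdin is enough; it is not the function that was added to Base.Filesystem.

```julia
# Hypothetical sketch, not the function added to Base.Filesystem.
function spawn_cleanup_helper(paths::Vector{String})
    # The child blocks in eof(stdin) until the parent closes its end of the
    # pipe, which the OS also does when the parent process dies.
    code = raw"""
    eof(stdin)
    for p in ARGS
        try
            rm(p; force=true, recursive=true)
        catch
        end
    end
    """
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code $paths`
    # open(cmd, "w") starts the child with a pipe connected to its stdin;
    # detach keeps it out of the parent's process group.
    return open(detach(cmd), "w")
end
```

In a test file, this would be registered once, for instance with `atexit(() -> spawn_cleanup_helper(depots))`, where `depots` is an assumed list of directories created with mktempdir(; cleanup=false).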

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Sep 29, 2025
@vtjnash vtjnash requested a review from staticfloat September 29, 2025 20:48
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
@KristofferC KristofferC mentioned this pull request Sep 30, 2025
47 tasks
@adienes adienes merged commit 3e2a4ed into JuliaLang:master Oct 1, 2025
7 checks passed
@adienes adienes removed the merge me PR is reviewed. Merge when all tests are passing label Oct 1, 2025
@Liozou Liozou deleted the rmdll branch October 1, 2025 15:08
KristofferC pushed a commit that referenced this pull request Oct 10, 2025
… in-use DLL (#59635)

On Windows, during precompilation of package `A`, a DLL file is
generated and may replace the existing one. If `A` is already loaded in
the julia session however, the corresponding soon-to-be-obsolete DLL
cannot be removed while julia is running. Currently, this problem is
solved by moving the old DLL in a special "julia_delayed_deletes"
folder, which will be cleaned in a later julia session by `Pkg.gc()`.
However, moving an in-use DLL is only permitted on the same drive, and
the "julia_delayed_deletes" is located in the `tempdir`, a.k.a. on a
fixed drive, say `C:`. Thus, if the `DEPOT_PATH` points to a ".julia" in
another drive, say `D:`, any precompilation occurring on an already
loaded package will fail. This is what happens in #59589, actually
resulting in an infinite loop that bloats up both memory and disk.
@staticfloat had actually predicted that such an issue could occur in
#53456 (comment).

This PR fixes #59589 by changing the "delayed deleting" mechanism:
instead of moving the old DLLs to a fixed folder, they are renamed in
their initial folder and recorded in a list stored at a fixed location.
Upon `Pkg.gc()`, the listed obsolete files will be deleted
(JuliaLang/Pkg.jl#4392).

It also prevents any similar infinite loops by cutting the `rm -> mv ->
rm` cycle into `rm -> rename`. ~I also removed the private argument
`allow_delayed_delete` from `rm`, which is only called in by
[Pkg](https://github.com/JuliaLang/Pkg.jl/blob/7344e2656475261a83a6bd37d9d4cc1e7dcf5f0d/src/API.jl#L1127)
but does not appear to be useful.~

EDIT: current state is
#59635 (comment)

---------

Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Elliot Saba <staticfloat@gmail.com>
(cherry picked from commit 3e2a4ed)
KristofferC pushed a commit that referenced this pull request Oct 12, 2025
… in-use DLL (#59635)

KristofferC pushed a commit that referenced this pull request Oct 14, 2025
… in-use DLL (#59635)


Labels

backport 1.12 Change should be backported to release-1.12 system:windows Affects only Windows

Development

Successfully merging this pull request may close these issues.

rm of an in-use DLL on a separate drive from tempdir on Windows will infinitely recurse

6 participants