Conflicting package versions can cause cache files to be rejected #50070

Open
topolarity opened this issue Jun 5, 2023 · 26 comments
Labels
bug (Indicates an unexpected problem or unintended behavior), packages (Package management and loading)

Comments

@topolarity
Member

Create a fresh depot and initialize it with:

using Pkg
Pkg.add(name="Parsers", version="2.5.8");
Pkg.add(name="JSON", version="0.21.3");
Pkg.add(name="Preferences", version="1.3.0");

Then, try to pre-compile into a temporary environment before using Distributed:

using Parsers; # Might come from e.g. `startup.jl`

using Pkg;
Pkg.activate(; temp=true);
Pkg.add(["PrecompileTools"]);
Pkg.precompile();
using Distributed; ENV["JULIA_PROJECT"] = dirname(Pkg.project().path); # directory of the active Project.toml; works on any OS, unlike splitting on "\\"
addprocs(3);
@everywhere using PrecompileTools;

With JULIA_DEBUG=loading we see that we get unexpected pre-compilation in the worker processes:

      From worker 2:    ┌ Debug: Ignoring cache file testing/compiled/v1.9/Preferences/pWSk8_nVRvo.ji for Preferences [21216c6a-2e73-6563-6e65-726566657250] (fafbfcfd-0ccb-f095-0001-d3b7d541767a) since it is does not provide desired build_id (fafbfcfd-a0a4-d53b-0001-d39bed612a5b)
      From worker 2:    └ @ Base loading.jl:2724
      From worker 2:    ┌ Debug: Rejecting cache file testing/compiled/v1.9/PrecompileTools/AQ9Mk_nVRvo.ji because required dependency Preferences [21216c6a-2e73-6563-6e65-726566657250] with build ID fafbfcfd-a0a4-d53b-0001-d39bed612a5b is missing from the cache.
      From worker 2:    └ @ Base loading.jl:1440
      From worker 2:    ┌ Debug: Precompiling PrecompileTools [aea7be01-6a6a-4083-8856-8a6e6704d82a]
      From worker 2:    └ @ Base loading.jl:2140

A common real-world trigger is a startup.jl that contains "using PkgAuthentication", which immediately loads old, conflicting package versions like these before you can call Pkg.activate(; temp=true).

It's not obvious to me how much of this behavior is a bug, but at the very least it seems that the precompile/loading system should not repeatedly re-compile the same conflicting package N times across all of the workers.

@topolarity added the bug and packages labels Jun 5, 2023
@IanButterworth
Member

the precompile/loading system should not repeatedly re-compile the same conflicting package N times across all of the workers

Will be made more efficient by #49052 at least.

I'm not sure whether much can be done if conflicting package versions are present in the parent but not workers

@topolarity
Member Author

I'm not sure whether much can be done if conflicting package versions are present in the parent but not workers

Is it possible to make Pkg.precompile do the right thing up front, so that the workers don't re-compile extra pkgimages at all?

The parent process may be unable to load the new versions, but it seems like the precompile should still work

@IanButterworth
Member

Actually, that's what it should be doing. Those packages show up as yellow in Pkg.precompile, don't they? So perhaps something is broken here

@topolarity
Member Author

perhaps something is broken here

Yeah, I think so - The workaround for now is also pretty awkward:

What we're doing is running with --startup-file=no and manually copying around auth.toml to avoid loading any packages in the global environment, since in general those might be incompatible with their local versions and break pre-compilation.
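
A minimal sketch of the first half of that workaround, assuming the work is launched from a parent Julia session: the child process is started with --startup-file=no so nothing from startup.jl gets loaded, and JULIA_DEBUG=loading is set so cache decisions are logged. The -e payload is a placeholder, and the auth.toml copying is not shown.

```julia
# Sketch: launch a child Julia that skips startup.jl and logs cache-loading decisions.
# The `-e` payload below is a placeholder for the real work.
julia = Base.julia_cmd()   # command for the currently running Julia binary
child = `$julia --startup-file=no -e 'println("child started without startup.jl")'`
child = addenv(child, "JULIA_DEBUG" => "loading")   # keep the parent env, add debug logging

output = read(child, String)   # runs the child and captures its stdout
print(output)
```

Debug messages from the loading code go to stderr, so they show up in the terminal rather than in `output`.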

@staticfloat
Member

@IanButterworth I just briefly checked whether there was some issue like the ctx.env.manifest here using the wrong versions when precompiling, but as far as I can tell, it's all as expected. Any idea what could be going wrong here?

@KristofferC
Member

I am also surprised that loaded versions would have an effect here. I can imagine one possible scenario (on Windows) where things would go bad, though I am not sure it is applicable here:

  1. add Example@0.5.3 # precompiles, creates a dll
  2. using Example # loads the dll
  3. add Example@0.5.1 # precompiles, tries to overwrite the dll

Step 3 will try to overwrite the dll created in step 1, but that will probably fail because the dll is open and Windows doesn't let you overwrite open files. However, in this example the packages at different versions are in different environments, so they should be given different paths and be able to coexist peacefully.

@staticfloat
Member

First, this is reproducible on non-Windows, and second, different versions should get different slugs in their cache filename, so I don't think that's the issue.

@KristofferC
Member

different versions should get different slugs in their cache filename, so I don't think that's the issue.

Okay, but look at this then:

❯ rm -r ~/.julia/compiled/v1.9/Example

❯ julia -q
(@v1.9) pkg> activate --temp
  Activating new project at `/var/folders/tp/2p4x9ygx48sgsdl1ccg1mp_40000gn/T/jl_7XXWgH`

(jl_7XXWgH) pkg> add Example
 `/private/var/folders/tp/2p4x9ygx48sgsdl1ccg1mp_40000gn/T/jl_7XXWgH/Manifest.toml`
  [7876af07] + Example v0.5.3
Precompiling project...
  1 dependency successfully precompiled in 1 seconds

<<<< Precompiled Example 0.5.3 here >>>>>>>>

julia> using Example

(jl_7XXWgH) pkg> add Example@0.5.1
 `/private/var/folders/tp/2p4x9ygx48sgsdl1ccg1mp_40000gn/T/jl_7XXWgH/Manifest.toml`
⌃ [7876af07] ↓ Example v0.5.3 ⇒ v0.5.1

Precompiling project...
  ✓ Example
  1 dependency successfully precompiled in 1 seconds
  1 dependency precompiled but a different version is currently loaded. Restart julia to access the new version

<<<< Precompiled Example 0.5.1 here >>>>>>>>

❯ ls ~/.julia/compiled/v1.9/Example
lLvWP_7Axpy.dylib  lLvWP_7Axpy.dylib.dSYM  lLvWP_7Axpy.ji

<<<<<< Only one cache file here, should be two for both versions? >>>>>>>>
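
The ls check above can be repeated from inside Julia. This is a minimal sketch, assuming the depot layout visible in the transcript (<depot>/compiled/vMAJOR.MINOR/<Package>); cache_files is a hypothetical helper, not a Base or Pkg function.

```julia
# Hypothetical helper: list the .ji cache files stored for a package in one depot.
# Assumes the <depot>/compiled/vMAJOR.MINOR/<Package> layout seen in the transcript.
function cache_files(pkgname::AbstractString; depot::AbstractString = first(DEPOT_PATH))
    dir = joinpath(depot, "compiled", "v$(VERSION.major).$(VERSION.minor)", pkgname)
    return isdir(dir) ? filter(endswith(".ji"), readdir(dir)) : String[]
end

# One entry per cache file; two coexisting versions would be expected to give two entries.
println(cache_files("Example"))
```

Running it before and after the second add shows whether a new file appeared or the existing one was overwritten.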

@topolarity
Member Author

Looks like a smoking gun to me.

Before the add Example@0.5.1, the .ji file appears to be for /home/topolarity/.julia/packages/Example/aqsx3/src/Example.jl on my machine. Afterward, it's for /home/topolarity/.julia/packages/Example/kH44X/src/Example.jl

The slug for the .ji/.so appears to be independent from either package slug: ~/.julia/compiled/v1.9/Example/lLvWP_kASaS.ji

@KristofferC
Member

I am not sure it is a smoking gun but rather that

different versions should get different slugs in their cache filename

is not true and my example in #50070 (comment) is a correctly described problem but is not related to the issue here.

@IanButterworth
Member

So the problem is this in assuming that the package is coming from the active project, not one up the stack?

crc = _crc32c(something(Base.active_project(), ""))
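
What the quoted line implies can be shown with a toy model: a key derived only from the active project path cannot tell two versions of a package apart. This sketch uses the CRC32c standard library in place of Base's internal _crc32c; toy_project_key and both paths are hypothetical. It does not reproduce the real slug computation, and the thread does not establish that this line is the cause of the issue.

```julia
using CRC32c   # standard library; exports crc32c

# Hypothetical stand-in for the quoted line: the key depends on the active
# project path only, so the package version never enters it.
toy_project_key(project_path::String) = crc32c(project_path)

temp_project   = "/tmp/jl_demo/Project.toml"                          # placeholder path
global_project = "/home/user/.julia/environments/v1.9/Project.toml"   # placeholder path

key_for_v053 = toy_project_key(temp_project)   # Example v0.5.3 resolved in the temp project
key_for_v051 = toy_project_key(temp_project)   # Example v0.5.1 resolved in the same project

println(key_for_v053 == key_for_v051)                      # same project path, same key
println(toy_project_key(global_project) == key_for_v053)   # compares across projects
```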

@KristofferC
Member

Why would that be a problem in this case here?

@IanButterworth
Member

Ok yeah I can't see how. (That does seem wrong though)

@topolarity
Member Author

Is it possible this keep_loaded_modules logic is interfering with us?

julia/base/loading.jl

Lines 2264 to 2272 in 631d187

# build up the list of modules that we want the precompile process to preserve
concrete_deps = copy(_concrete_dependencies)
if keep_loaded_modules
    for mod in loaded_modules_array()
        if !(mod === Main || mod === Core || mod === Base)
            push!(concrete_deps, PkgId(mod) => module_build_id(mod))
        end
    end
end

What's its purpose?
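
To see what that branch would add in a given session, the same loop can be run by hand. This is a sketch that uses only the Base internals named in the snippet above; they are unexported and may change between Julia versions, and pinned is an illustrative name.

```julia
# Sketch: collect the (package, build id) pairs that the keep_loaded_modules
# branch would append, using the same Base internals as the quoted snippet.
pinned = [Base.PkgId(mod) => Base.module_build_id(mod)
          for mod in Base.loaded_modules_array()
          if !(mod === Main || mod === Core || mod === Base)]

for (pkg, build_id) in pinned
    println(pkg, " => ", build_id)
end
```

Running this after `using Parsers` in the reproducer would show which loaded modules, and which build ids, are in play when a later `using` triggers precompilation.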

@IanButterworth
Member

@topolarity
Member Author

I'm hitting that pathway anyway at @everywhere using PrecompileTools;

Which seems to cause:

[ Info: Precompiling PrecompileTools [aea7be01-6a6a-4083-8856-8a6e6704d82a]
┌ Debug: Rejecting cache file /home/topolarity/repos/julia/tmp_depot/compiled/v1.10/Preferences/pWSk8_8ipBP.ji because it provides the wrong build_id (got fafbfcfd-b38d-76ee-0002-ac1a045be40b) for Preferences [21216c6a-2e73-6563-6e65-726566657250] (want fafbfcfd-3d3c-727f-0002-ac149daecd42)
└ @ Base loading.jl:2908

@IanButterworth
Member

Yeah, code load precompilation will hit that.

AFAICT, distributed doesn't forward Base._concrete_dependencies from the parent to workers

@IanButterworth
Member

If you take Distributed out of the picture and start a julia process in the same environment do you see the same issue?

@topolarity
Member Author

AFAICT, distributed doesn't forward Base._concrete_dependencies from the parent to workers

That's true, but pre-compilation gets triggered unexpectedly in the parent process just as the workers are starting up:

using Pkg
Pkg.add(name="Parsers", version="2.5.8");
Pkg.add(name="JSON", version="0.21.3");
Pkg.add(name="Preferences", version="1.3.0");
using Parsers; # Might come from e.g. `startup.jl`

Pkg.activate(; temp=true);
Pkg.add(["PrecompileTools"]);
Pkg.precompile();      # does nothing
using PrecompileTools; # unexpectedly triggers pre-compilation

That pre-compilation in the parent process modifies PrecompileTools/AQ9Mk_Gruqa.ji to reference a different build_id for Preferences - Probably one which doesn't actually match the cache files saved for Preferences, so that it cannot be loaded?

@vchuravy changed the title from "Conflicting package versions can cause pkgimage cache files to be rejected" to "Conflicting package versions can cause cache files to be rejected" Jun 16, 2023
@topolarity
Member Author

Two thoughts for a resolution here:

  • Can we avoid cache file thrash by giving the using PrecompileTools cachefile a unique slug (that factors in its conflicting stacked environment) versus the Pkg.precompile() cachefile (which assumes a non-conflicting environment)?
  • Should we provide a Pkg.precompile() flag that also performs pre-compilation for the existing conflicted environment?

@IanButterworth
Member

I think the least we should do is the PR here plus my suggestion #44329 (comment)

@topolarity
Member Author

That sounds good to me, but we still need to fix the package thrash IMO if we want the Distributed case to work correctly

@IanButterworth
Member

Do people use stacked environments in Distributed workers? I can't imagine that's common?

@topolarity
Member Author

The use case this issue came from did exactly that - It doesn't seem strange to me that someone would try to Pkg.activate(; temp=true) in an effort to create a fresh environment to do Distributed work in

@topolarity
Member Author

Another common situation that I just ran into is:

  1. Mess around in the REPL for a bit, realize that you need to add/update a package
  2. Do a pkg> update which will pre-compile for the new environment (pre-compile # 1)
  3. Conflicting dependencies cause using Foo to blow out the pre-compilation work you just did (pre-compile # 2)
  4. Finally, when you restart your project later you have to pre-compile a third time (pre-compile # 3)

@cossio
Contributor

cossio commented Aug 30, 2023

So the problem is this in assuming that the package is coming from the active project, not one up the stack?

crc = _crc32c(something(Base.active_project(), ""))

We were hitting some issues related to loading Makie from the global environment, while working in a different active project. See:

Could your comment there be related to this, @IanButterworth?

5 participants