Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should use __precompile__(false) (at least for extensions) #53

Closed
PhilipFackler opened this issue Mar 8, 2024 · 16 comments
Closed

Should use __precompile__(false) (at least for extensions) #53

PhilipFackler opened this issue Mar 8, 2024 · 16 comments

Comments

@PhilipFackler
Copy link
Collaborator

When using the cuda backend I get the following message:

WARNING: Method definition parallel_for(I, F, Any...) where {I<:Integer, F<:Function} in module JACC at /home/4pf/.julia/packages/JACC/Crxi0/src/JACC.jl:11 overwritten in module JACCCUDA at /home/4pf/.julia/packages/JACC/Crxi0/ext/JACCCUDA/JACCCUDA.jl:5.
ERROR: Method overwriting is not permitted during Module precompilation. Use `__precompile__(false)` to opt-out of precompilation.

It's an error when I don't add __precompile__(false) to my own module definition.

@michel2323
Copy link
Contributor

This will be fixed with #52 .

@williamfgc
Copy link
Collaborator

@michel2323 @PhilipFackler is this resolved?

@PhilipFackler
Copy link
Collaborator Author

The solution merged with #52 is too restrictive. I'm using a previous revision to avoid it. So, I would say no.

@michel2323
Copy link
Contributor

@PhilipFackler In what sense too restrictive?

@PhilipFackler
Copy link
Collaborator Author

@michel2323 see #56

@williamfgc
Copy link
Collaborator

@PhilipFackler can you elaborate why this is an error or how it's trigerred? I'm trying to understand why it doesn't show on our GPU CI tests e.g. here

@michel2323
Copy link
Contributor

michel2323 commented Apr 5, 2024

It doesn't get triggered because I resolved it by specifying types in #52 . However, that leads to @PhilipFackler only being able to pass numbers or arrays, not sructs #56 . The two issues are intertwined. When using structs, we cannot dispatch for the backend via the struct type, as it's not clear whether it is a CUDA or whatever based type.

Edit: https://github.com/JuliaORNL/JACC.jl/actions/runs/8238659570/job/22530105977#step:4:298 Old error.

@williamfgc
Copy link
Collaborator

williamfgc commented Apr 17, 2024

I did more digging and overriding functions due to variadic arguments of any type x... is an error since Julia 1.10 (it was a warning in 1.9) because of the current nature of JACC.jl it makes sense to use __precompile__(false) and let users select back ends for the appropriate module compilation in ext.

@williamfgc
Copy link
Collaborator

I'm turning pre-compilation off (for now). Changing APIs will need a much larger discussion as we discussed offline.

@michel2323
Copy link
Contributor

michel2323 commented Apr 17, 2024

Every package that uses JACC.jl will also not be able to precompile. So, if you implement a PDE solver using JACC.jl, that solver won't be precompiled, substantially slowing down a user's code startup. So even if JACC is light, it will break precompilation for every large package that uses JACC.

@williamfgc
Copy link
Collaborator

williamfgc commented Apr 17, 2024

@michel2323 point taken. We should open a new issue trying to understand the trade-offs of code start-up. Since we are targeting HPC use cases, we should understand how this cost is amortized and the JIT implications. Precompilation and optional packages as weak_dependencies are somehow opposite concepts :). We just have a few simple use cases we want to try with the simple API and make incremental progress.

@williamfgc
Copy link
Collaborator

FYI, I tested the current tag version v0.3 without precompile __(false)__ with the GrayScott.jl on this branch, that has JACC as a strong dependency, precompilation just hangs on various systems as the pid process is still running. I don't know if it's related to Julia issue #51901. Works with the current main with __precompile__(false).

For added context, the GrayScott.jl pre-compilation takes roughly 18s on my local machine and similar numbers on our AMD GPU CI system. For an hours-long runs this is an amortized cost, but we'll keep en eye if this becomes a real problem. Hope this helps.

@michel2323
Copy link
Contributor

Especially if you add MPI to the mix, which I guess is important for ORNL, having no precompilation support will be daunting. On those machines, you get filesystem access races. During that process message you mention.

@williamfgc
Copy link
Collaborator

williamfgc commented May 3, 2024

Yes, those are really good points., The JIT component does not go away completely with pre-compilation, we published some numbers on the paper running on Frontier. Also, accessing local files is not completely by-passed (e.g. LocalPreferences and local packages). I think something like PackageCompiler.jl might help for this scenario to compile an app?. This is something @PhilipFackler tried. Still, I'd try to test the overheads when we have more apps use cases :)

@michel2323
Copy link
Contributor

PackageCompiler won't work if precompilation doesn't work or is manually disabled.

@williamfgc
Copy link
Collaborator

Yes, I meant exploring the precompile + PackageCompiler.jl path. For the time being keeping __precompile__ (false) is what sanely works (not free lunch) :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants