Latency #402
Actually, I think SnoopCompile will help quite a bit. It could also be very helpful for my new design in asedftk. I'm definitely going to take a look at it at some point.
At least it'll tell us which packages slow us down, and then we can bug their maintainers.
I just tried the new 1.6 beta. A lot of things in general feel faster, but
Something is seriously wrong somewhere. Just taking out lobpcg_hyper_impl.jl with a simple test script has a second
Playing with JuliaLang/julia#41612:
Hmm, once #483 is in, we can probably make PyCall optional. That is already quite a large chunk. As for
But 9 seconds is really not great. We should definitely work on that.
The load time is not even that much of an issue; the time to first SCF feels longer in my experience. The LOBPCG stuff is apparently responsible for some of it (though I have no idea why).
The post at https://discourse.julialang.org/t/ann-new-package-snoopprecompile/84778/4 sounds like exactly what we need.
No, it does not help. I tried it yesterday. We have too many invalidations; we need to fix those first.
Pretty sure that's not our issue. They always talk about invalidations, and they are one source of latency, but I don't think they are the one we suffer from. At least I couldn't find anything problematic when I took a look last time.
List of things to try:
The Julia beta released today has the latency improvements, which look pretty spectacular. I'm away from a computer for a while, but I'm very curious to see what they look like for us.
I just tested this expecting some improvements, but it seems there is a regression. I tested the first example in the README and measured the time to first SCF:
using DFTK, Unitful, UnitfulAtomic
# 1. Define lattice and atomic positions
a = 5.431u"angstrom" # Silicon lattice constant
lattice = a / 2 * [[0 1 1.]; # Silicon lattice vectors
[1 0 1.]; # specified column by column
[1 1 0.]];
# Load HGH pseudopotential for Silicon
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/Si-q4"))
# Specify type and positions of atoms
atoms = [Si, Si]
positions = [ones(3)/8, -ones(3)/8]
# 2. Select model and basis
model = model_LDA(lattice, atoms, positions)
kgrid = [4, 4, 4] # k-point grid (Regular Monkhorst-Pack grid)
Ecut = 7 # kinetic energy cutoff
# Ecut = 190.5u"eV" # Could also use eV or other energy-compatible units
basis = PlaneWaveBasis(model; Ecut, kgrid)
# Note the implicit passing of keyword arguments here:
# this is equivalent to PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid)
# 3. Run the SCF procedure to obtain the ground state
@time scfres = self_consistent_field(basis, tol=1e-5);
Following are the results for two runs each in 1.8.4 and 1.9.0-beta2.
1.8.4:
julia> Pkg.status()
Status `~/.julia/environments/dftk/Project.toml`
[acf6eb54] DFTK v0.6.0 `~/GitProjects/DFTK`
julia> versioninfo()
Julia Version 1.8.4
Commit 00177ebc4fc (2022-12-23 21:32 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, nehalem)
Threads: 4 on 8 virtual cores
Environment:
JULIA_NUM_THREADS = 4
JULIA_PKG_DEVDIR = /home/rashid/GitProjects
JULIA_NUM_PRECOMPILE_TASKS = 6
JULIA_CONDAPKG_VERBOSITY = 0
JULIA_CONDAPKG_EXE = conda
Run 1:
1.9.0-beta2:
julia> Pkg.status()
Status `~/.julia/environments/dftk/Project.toml`
[acf6eb54] DFTK v0.6.0 `~/GitProjects/DFTK`
julia> versioninfo()
Julia Version 1.9.0-beta2
Commit 7daffeecb8c (2022-12-29 07:45 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, nehalem)
Threads: 4 on 8 virtual cores
Environment:
JULIA_NUM_THREADS = 4
JULIA_PKG_DEVDIR = /home/rashid/GitProjects
JULIA_NUM_PRECOMPILE_TASKS = 6
JULIA_CONDAPKG_VERBOSITY = 0
JULIA_CONDAPKG_EXE = conda
Run 1:
Oh :( Thanks for testing, though. The next step is to put this snippet in DFTK.jl so it gets precompiled... Otherwise something must be blocking precompilation, maybe the TimerOutputs macro...
From what I read, the new changes introduce a load-time regression, the trade-off being more efficient precompilation. So these numbers are not that crazy; they just mean that precompilation is not working for us.
The following diff adds precompilation to DFTK:
Some numbers on 1.9beta:
Without precompilation code:
With precompilation code:
So all in all, a very nice win for users. Not so much for developers, though. I asked on the Julia Slack, and people recommend turning off precompilation of dev'ed packages. Should we just merge this diff, at least once 1.9 is released? @rashidrafeek, can you post your numbers with this diff, since your timings are much slower than mine?
It's a lot better after applying this patch! Numbers from my system with 1.9.0-beta2:
Without precompilation code:
1 dependency successfully precompiled in 13 seconds. 103 already precompiled.
78.601886 seconds (80.51 M allocations: 5.735 GiB, 6.91% gc time, 169.27% compilation time)
With precompilation code:
1 dependency successfully precompiled in 167 seconds. 103 already precompiled.
10.451499 seconds (10.32 M allocations: 875.533 MiB, 3.96% gc time, 149.20% compilation time)
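When comparing timings like these across machines, it may help to separate the first call of a workload (which includes JIT compilation) from a second, already-compiled call. The following is a minimal sketch using only Base; `toy_workload` is a hypothetical stand-in for something like the first SCF run, not DFTK code, and the difference between the two timings is only a rough estimate of compilation overhead.

```julia
# Minimal sketch: separate first-call latency (compilation + run)
# from the steady-state run time of the same call.
# `toy_workload` is a hypothetical stand-in for an expensive first call.
toy_workload(n) = sum(sqrt(i) for i in 1:n)

t_first  = @elapsed toy_workload(10^6)   # first call: includes JIT compilation
t_second = @elapsed toy_workload(10^6)   # second call: already compiled

println("first call:  ", round(t_first; digits=4), " s")
println("second call: ", round(t_second; digits=4), " s")
# Rough estimate only; timer noise can make this slightly off (or negative).
println("approx. compile overhead: ", round(t_first - t_second; digits=4), " s")
```

With a precompiled workload, the hope is that the first-call time moves closer to the second-call time.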
Can you try the same patch, but with
Hmm, that's still pretty impressive overall. A nice step in the right direction!
1 dependency successfully precompiled in 170 seconds. 103 already precompiled.
1.412220 seconds (467.34 k allocations: 216.655 MiB, 4.57% gc time, 41.23% compilation time) 😲
Huh, I had more than this. Did you do both steps in the same session?
Meaning both precompilation and running the code? I tried in the same session as well as after restarting Julia, as I was not sure of the result. Both gave similar results. A result after restarting Julia:
1.186028 seconds (466.23 k allocations: 216.232 MiB, 3.85% gc time, 38.92% compilation time)
Oh wow. OK, fine, let's just do it and eat the precompilation cost. I'll add a flag so devs can disable it.
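One way such a developer opt-out could look is sketched below. This is a minimal illustration, not DFTK's actual mechanism: the module name `ToyPkg`, the function `kinetic_energy` and the environment variable `TOYPKG_PRECOMPILE` are all hypothetical. It uses only `Base.precompile`, which compiles a given method specialization eagerly; when this runs during package precompilation, the compiled code can end up in the package cache.

```julia
# Minimal sketch of an opt-out switch for a package's precompilation work.
# `ToyPkg`, `kinetic_energy` and `TOYPKG_PRECOMPILE` are hypothetical names.
module ToyPkg

# Stand-in for an expensive-to-compile function we want compiled ahead of time.
kinetic_energy(coeffs::Vector{Float64}) = sum(abs2, coeffs) / 2

# Read the switch once, when the module is loaded (or precompiled).
# Anything other than "false" keeps the workload enabled.
const RUN_PRECOMPILE_WORKLOAD = get(ENV, "TOYPKG_PRECOMPILE", "true") != "false"

if RUN_PRECOMPILE_WORKLOAD
    # Eagerly compile this specialization; returns true on success.
    precompile(kinetic_energy, (Vector{Float64},))
end

end # module
```

A developer would set `TOYPKG_PRECOMPILE=false` before the package is precompiled. One caveat worth checking: as far as we know, Julia does not track environment variables when deciding whether a package cache is stale, so a compile-time preference (for example via Preferences.jl) may be the more robust choice for a real flag.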
Ah, I see where the discrepancy between your numbers and mine comes from: you're measuring the last
It could be good to try the tools described in https://discourse.julialang.org/t/new-tools-for-reducing-compiler-latency/52882, although I would imagine that our latency cost comes from our deps, not from us.