[NDTensors] Avoid threadid in block sparse multithreading code #1650
This should be good to go once tests pass. There is a small (~10%) slowdown from this PR that I can't track down, but I think it is acceptable. For posterity, running https://github.com/ITensor/ITensorMPS.jl/blob/v0.3.17/examples/dmrg/2d_hubbard_conserve_momentum.jl before this PR gives:

```julia
julia> energy, H, psi = main(; Nx=8, Ny=4, U=4.0, t=1.0, nsweeps=5, maxdim=3000, threaded_blocksparse=true);
Threads.nthreads() = 4
ITensors.using_threaded_blocksparse() = true
nnz(H[end ÷ 2]) = 83
nnzblocks(H[end ÷ 2]) = 80
4×8 Matrix{String}:
 "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"
 "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"
 "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"
 "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"
inner(psi0', H, psi0) = 25.6931471817727
  0.004488 seconds (23.80 k allocations: 7.051 MiB)
After sweep 1 energy=-8.237374080471689  maxlinkdim=100 maxerr=1.12E-05 time=1.088
After sweep 2 energy=-19.898330380819594  maxlinkdim=200 maxerr=8.70E-04 time=11.542
After sweep 3 energy=-25.874269858341727  maxlinkdim=400 maxerr=2.29E-04 time=25.872
After sweep 4 energy=-26.53067160826987  maxlinkdim=800 maxerr=1.44E-04 time=21.346
After sweep 5 energy=-26.66437390383375  maxlinkdim=2000 maxerr=2.39E-05 time=42.468
102.321462 seconds (869.09 M allocations: 274.549 GiB, 22.87% gc time, 0.15% compilation time)
(Nx, Ny) = (8, 4)
(t, U) = (1.0, 4.0)
flux(psi) = QN(("Ky",0,4),("Nf",32,-1),("Sz",0))
maxlinkdim(psi) = 2000
energy = -26.66437390383375
```

while this PR gives:

```julia
julia> energy, H, psi = main(; Nx=8, Ny=4, U=4.0, t=1.0, nsweeps=5, maxdim=3000, threaded_blocksparse=true);
Threads.nthreads() = 4
ITensors.using_threaded_blocksparse() = true
nnz(H[end ÷ 2]) = 83
nnzblocks(H[end ÷ 2]) = 80
4×8 Matrix{String}:
 "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"
 "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"
 "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"
 "↓"  "↑"  "↓"  "↑"  "↓"  "↑"  "↓"  "↑"
inner(psi0', H, psi0) = 25.6931471817727
  0.004372 seconds (24.32 k allocations: 7.572 MiB)
After sweep 1 energy=-8.237374080540086  maxlinkdim=100 maxerr=1.12E-05 time=1.162
After sweep 2 energy=-19.89833038088842  maxlinkdim=200 maxerr=8.70E-04 time=12.989
After sweep 3 energy=-25.874269858341243  maxlinkdim=400 maxerr=2.29E-04 time=29.443
After sweep 4 energy=-26.530671608297425  maxlinkdim=800 maxerr=1.44E-04 time=22.788
After sweep 5 energy=-26.664373903839163  maxlinkdim=2000 maxerr=2.39E-05 time=44.923
111.313039 seconds (936.05 M allocations: 295.924 GiB, 22.80% gc time, 0.14% compilation time)
(Nx, Ny) = (8, 4)
(t, U) = (1.0, 4.0)
flux(psi) = QN(("Ky",0,4),("Nf",32,-1),("Sz",0))
maxlinkdim(psi) = 2000
energy = -26.664373903839163
```

EDIT: Note that the timing discrepancy only appears when block sparse multithreading is enabled, and in particular when the logic for determining the block contraction list (as opposed to performing the actual block contractions) takes a significant share of the run time, which is the case for hybrid real-space/momentum-space 2D DMRG calculations. I don't see any discrepancy for systems like the 1D Heisenberg model when conserving Sz.
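For readers unfamiliar with why `threadid` is avoided: indexing per-thread buffers with `buffers[Threads.threadid()]` is not safe in recent Julia versions because a task can migrate between threads while it runs. The sketch below illustrates the general task-local alternative for building a block contraction list. It is not the NDTensors implementation; the block labels (tuples `(i, j)` and `(j, k)` that contract when `j` matches) and the names `BlockPair`, `contraction_list_serial`, and `contraction_list_threaded` are hypothetical, chosen only for illustration.

```julia
# Hypothetical block labels (assumption for illustration): a block of A is
# (i, j) and a block of B is (j, k); they contract when the shared index j matches.
const BlockPair = Tuple{NTuple{2,Int},NTuple{2,Int}}

# Serial reference: scan all block pairs and keep the ones that contract.
function contraction_list_serial(blocksA, blocksB)
    list = BlockPair[]
    for a in blocksA, b in blocksB
        a[2] == b[1] && push!(list, (a, b))
    end
    return list
end

# Task-based version that never calls Threads.threadid(): each task builds
# its own local list for one chunk of A's blocks, and the per-task results
# are appended in chunk order, so the output matches the serial version.
function contraction_list_threaded(blocksA, blocksB; ntasks=Threads.nthreads())
    chunksize = max(1, cld(length(blocksA), ntasks))
    tasks = map(Iterators.partition(blocksA, chunksize)) do chunk
        Threads.@spawn contraction_list_serial(chunk, blocksB)
    end
    list = BlockPair[]
    for t in tasks
        append!(list, fetch(t))  # fetch waits for the task and returns its list
    end
    return list
end

blocksA = [(1, 1), (1, 2), (2, 2)]
blocksB = [(1, 3), (2, 4)]
contraction_list_threaded(blocksA, blocksB)
# returns the pairs ((1, 1), (1, 3)), ((1, 2), (2, 4)), ((2, 2), (2, 4))
```

Per-task output vectors plus a final merge trade a little extra allocation for correctness regardless of task migration; whether that accounts for any of the slowdown measured above is an open question, not something this sketch establishes.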
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #1650   +/-   ##
=======================================
  Coverage   80.91%   80.91%
=======================================
  Files          59       59
  Lines        4626     4626
=======================================
  Hits         3743     3743
  Misses        883      883
```
Followup to #1648.
To-do: