Unnecessary modulo operations in gf<cyclic_lattice> bracket accessor #725
Comments
As a follow-up, here are four isolated minimal functions that I tested.
// Version 1: loop over time, lattice, and orbital indices directly on the (t, r) GFs.
g_tr_t minimal_example_1(chi_tr_vt gamma_tr, g_tr_vt F_tr) {
  auto delta_tr = make_gf(F_tr);
  delta_tr *= 0;
  auto [tmesh, rmesh] = F_tr.mesh();
  for (const auto t : tmesh) {
    for (const auto r : rmesh) {
      for (auto [A, a, B, b] : gamma_tr.target_indices()) {
        delta_tr[t, r](a, b) += -gamma_tr[t, r](A, a, b, B) * F_tr[t, r](A, B);
      }
    }
  }
  return delta_tr;
}
// Version 2: same as version 1, but without the loop over orbital indices.
g_tr_t minimal_example_2(chi_tr_vt gamma_tr, g_tr_vt F_tr) {
  auto delta_tr = make_gf(F_tr);
  delta_tr *= 0;
  auto [tmesh, rmesh] = F_tr.mesh();
  for (const auto t : tmesh) {
    for (const auto r : rmesh) {
      delta_tr[t, r](0, 0) += -gamma_tr[t, r](0, 0, 0, 0) * F_tr[t, r](0, 0);
    }
  }
  return delta_tr;
}
// Version 3: orbital loop as in version 1, but the inner work uses
// temporary imaginary-time GFs created for each r-point.
g_tr_t minimal_example_3(chi_tr_vt gamma_tr, g_tr_vt F_tr) {
  auto delta_tr = make_gf(F_tr);
  delta_tr *= 0;
  auto [tmesh, rmesh] = F_tr.mesh();
  auto tmesh_gamma = std::get<0>(gamma_tr.mesh());
  auto _ = all_t{};
  for (const auto r : rmesh) {
    auto delta_t = make_gf<imtime>(tmesh, delta_tr.target());
    auto gamma_t = make_gf<imtime>(tmesh_gamma, gamma_tr.target());
    auto F_t = make_gf<imtime>(tmesh, F_tr.target());
    for (const auto t : tmesh) {
      for (auto [A, a, B, b] : gamma_tr.target_indices()) {
        delta_t[t](a, b) += -gamma_t[t](A, a, b, B) * F_t[t](A, B);
      }
    }
    delta_tr[_, r] = delta_t;
  }
  return delta_tr;
}
// Version 4: temporaries as in version 3, but without the loop over orbital indices.
g_tr_t minimal_example_4(chi_tr_vt gamma_tr, g_tr_vt F_tr) {
  auto delta_tr = make_gf(F_tr);
  delta_tr *= 0;
  auto [tmesh, rmesh] = F_tr.mesh();
  auto tmesh_gamma = std::get<0>(gamma_tr.mesh());
  auto _ = all_t{};
  for (const auto r : rmesh) {
    auto delta_t = make_gf<imtime>(tmesh, delta_tr.target());
    auto gamma_t = make_gf<imtime>(tmesh_gamma, gamma_tr.target());
    auto F_t = make_gf<imtime>(tmesh, F_tr.target());
    for (const auto t : tmesh) {
      delta_t[t](0, 0) += -gamma_t[t](0, 0, 0, 0) * F_t[t](0, 0);
    }
    delta_tr[_, r] = delta_t;
  }
  return delta_tr;
}
Calling those functions in a small Python script:
from pytriqs.gf import *
from triqs_tprf.lattice import minimal_example_1, minimal_example_2, minimal_example_3, minimal_example_4
BETA = 25
NR = 24
NT = 200
NORB = 3
tmesh = MeshImTime(beta=BETA, S='Fermion', n_max=NT)
tmesh_gamma = MeshImTime(beta=BETA, S='Boson', n_max=NT)
rmesh = MeshCyclicLattice(NR, NR, NR)
F_tr = Gf(mesh=MeshProduct(tmesh, rmesh), target_shape=(NORB,)*2)
gamma_tr = Gf(mesh=MeshProduct(tmesh_gamma, rmesh), target_shape=(NORB,)*4)
functions = [minimal_example_1, minimal_example_2, minimal_example_3, minimal_example_4]
for function in functions:
    function(gamma_tr, F_tr)
and profiling it using cProfile, I get the following output.
Versions 2 and 4 should obviously be faster, since they have no loop over the orbitals. But as you can see, the difference between versions 1 and 3 is immense. Using NR = 40 and NORB = 1 gives:
where the computation time of all versions is similar.
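For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of profiling a single call with cProfile. The helper `profile_call` and the function `stand_in` are hypothetical names introduced only for this illustration; `stand_in` takes the place of `minimal_example_1` to `minimal_example_4` so that the snippet runs without TRIQS installed.

```python
import cProfile
import io
import pstats


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


def stand_in(n):
    # Hypothetical placeholder for one of the minimal_example_* functions.
    return sum(i * i for i in range(n))


result, report = profile_call(stand_in, 1000)
print(report)
```

With the real functions, the call would presumably be `profile_call(minimal_example_1, gamma_tr, F_tr)` and so on for each version.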
After profiling those minimal examples together with @Wentzell, we found that a lot of time is spent in the bracket accessor of gf<cyclic_lattice>. Changing the line triqs/triqs/lattice/cluster_mesh.hpp, line 141 in 67cf435, to
linear_index_t index_to_linear(index_t const &i) const { return i[0] * s2 + i[1] * s1 + i[2]; }
i.e. getting rid of the unnecessary modulo operations,
gives run times that are at least roughly as expected.
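The index arithmetic can be checked independently of TRIQS. The sketch below is written in Python for brevity and rests on two assumptions: the strides follow a row-major layout (s2 = n1 * n2, s1 = n2), and the earlier version wrapped each index component with a modulo before combining them. The original line 141 is not reproduced in this thread, so `index_to_linear_wrapped` is a hypothetical stand-in for it, not a copy.

```python
from itertools import product


def strides(dims):
    # Assumed row-major layout: s2 multiplies i[0], s1 multiplies i[1].
    _, n1, n2 = dims
    return n1 * n2, n2  # (s2, s1)


def index_to_linear_wrapped(i, dims):
    # Hypothetical "before" variant: wrap every component, then combine.
    s2, s1 = strides(dims)
    return (i[0] % dims[0]) * s2 + (i[1] % dims[1]) * s1 + (i[2] % dims[2])


def index_to_linear_direct(i, dims):
    # Same formula as the replacement line quoted in this thread.
    s2, s1 = strides(dims)
    return i[0] * s2 + i[1] * s1 + i[2]


dims = (2, 3, 4)
in_range = list(product(*(range(n) for n in dims)))

# For indices that already lie inside the mesh, both variants agree.
agree = all(
    index_to_linear_wrapped(i, dims) == index_to_linear_direct(i, dims)
    for i in in_range
)
print(agree)
```

Under these assumptions the two variants differ only for indices outside the mesh, so the modulo is redundant whenever the caller iterates over the mesh's own points, as the loops in the minimal examples do.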
Hey Stefan,
Thank you for checking the timings without the modulo operations. It looks like the timings are now
We should keep this issue open until this has been addressed.
…ar of cluster_mesh
This is fixed by #744.
Dear all,
I noticed a speedup while refactoring that I don't understand.
Specifically, I measured this code block:
Here all objects are one- or two- particle GFs in imaginary time and real-space.
I split up the loop into two and created temporary GFs in imaginary time.
This version runs approximately 8 times as fast as the previous one, and I tested this for different numbers of r-points.
Why is there such a big difference in computation time between those two versions, simply from using temporaries?
Btw, I am using TRIQS with hash 331dd46.
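A rough way to think about the question is to count how often a lattice point r has to be turned into a linear index in each variant. The sketch below is a simplified cost model, not a measurement: it assumes that every `[t, r]` bracket access performs one such conversion, that the version with temporaries performs one conversion per r-point for the slice assignment, and that the time mesh has `n_t` points. Both function names are hypothetical.

```python
def conversions_direct(n_t, n_r, n_orb):
    # Three [t, r] accesses (delta, gamma, F) for every time point,
    # lattice point, and combination of the four orbital indices.
    return 3 * n_t * n_r * n_orb**4


def conversions_with_temporaries(n_t, n_r, n_orb):
    # Only the slice assignment per r-point touches the lattice index;
    # the inner loops run on imaginary-time temporaries.
    return n_r


# Sizes taken from the Python script in this thread: NT=200, NR=24, NORB=3.
n_t, n_r, n_orb = 200, 24**3, 3
ratio = conversions_direct(n_t, n_r, n_orb) // conversions_with_temporaries(
    n_t, n_r, n_orb
)
print(ratio)
```

In this model the direct variant performs far more index conversions, so any extra cost per conversion is multiplied accordingly. The observed speedup is much smaller than this ratio because the conversion is only part of the work done per access.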