@dzzz2001 (Collaborator)

Background

In some test cases, the cal_DMR and contributeHk functions can become hotspots, especially when many OpenMP threads are used with few MPI processes, as in the attached Mo16 example (Mo16.tar.gz).
One reason these functions are slow is that, within the DMK matrix, the small block belonging to each atom pair is not contiguous in memory. This PR changes the algorithms of cal_DMR and foldingHR: they first copy the DMK matrix into a temporary contiguous buffer and then perform the conversion between the DMK and DMR matrices. This gives a significant speedup in some test cases.
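
A minimal C++ sketch of the idea for the DMK to DMR direction. The PR text does not give the buffer layout or the phase convention, so everything here is an assumption: DM(k) is taken as a dense column-major matrix with the k-point weight already included, the DM(R) blocks are row-major, the phase is exp(i 2π k·R), and the names `PairBlock` and `accumulate_dmr` are hypothetical, not ABACUS code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical layout of one atom-pair block (I, J) at lattice vector R.
// These names are illustrative and are not the ABACUS data structures.
struct PairBlock {
    int row0, nrow;          // first orbital index and orbital count of atom I
    int col0, ncol;          // first orbital index and orbital count of atom J
    int R[3];                // lattice vector in direct coordinates
    std::vector<double> dmr; // row-major nrow x ncol block of DM(R)
};

// Adds Re[exp(i 2 pi k.R) * DM(k)] into every block's DM(R).
// Assumptions: DM(k) is a dense column-major n x n matrix whose k-point weight
// is already included, k is in direct coordinates, and the phase sign follows
// the convention written here (the real code may differ).
void accumulate_dmr(const std::vector<cplx>& dmk_colmajor, int n,
                    const double k[3], std::vector<PairBlock>& blocks,
                    std::vector<cplx>& tmp) {
    assert(dmk_colmajor.size() == static_cast<std::size_t>(n) * n);
    // Step 1: one strided pass copies DM(k) into a row-major buffer, so each
    // row of each atom-pair block becomes a contiguous run of memory.
    tmp.resize(static_cast<std::size_t>(n) * n);
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < n; ++r)
            tmp[static_cast<std::size_t>(r) * n + c] =
                dmk_colmajor[static_cast<std::size_t>(c) * n + r];

    const double two_pi = 2.0 * std::acos(-1.0);
    // Step 2: every block is read with unit-stride inner loops.
    for (PairBlock& b : blocks) {
        assert(b.row0 + b.nrow <= n && b.col0 + b.ncol <= n);
        assert(b.dmr.size() == static_cast<std::size_t>(b.nrow) * b.ncol);
        const double arg = two_pi * (k[0] * b.R[0] + k[1] * b.R[1] + k[2] * b.R[2]);
        const cplx phase(std::cos(arg), std::sin(arg));
        for (int i = 0; i < b.nrow; ++i) {
            const cplx* src = &tmp[static_cast<std::size_t>(b.row0 + i) * n + b.col0];
            double* dst = &b.dmr[static_cast<std::size_t>(i) * b.ncol];
            for (int j = 0; j < b.ncol; ++j)
                dst[j] += (phase * src[j]).real();
        }
    }
}
```

Passing `tmp` in lets one buffer be reused across k-points. The transpose costs one strided O(n²) pass per k-point, while each atom-pair block may then be read once per lattice vector R with unit stride, which is where this sketch expects the cache benefit to come from.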

Perf Comparison

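The table below compares timings for the Mo16 test on my workstation. In each cell, the left value is the time before this change and the right value is the time after it. cal_DMR becomes about 6.0x to 9.4x faster, and the two contributeHk timers about 1.2x to 2.6x faster.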

| Configuration | OverlapNew contributeHk (s) | OperatorLCAO contributeHk (s) | DensityMatrix cal_DMR (s) |
|---|---|---|---|
| OMP=1, mpi=2 | 12.77 / 5.84 | 22.28 / 8.67 | 65.19 / 9.68 |
| OMP=2, mpi=2 | 9.33 / 3.57 | 11.36 / 4.99 | 32.45 / 4.95 |
| OMP=4, mpi=2 | 6.01 / 3.94 | 7.42 / 5.33 | 23.29 / 2.69 |
| OMP=1, mpi=4 | 5.81 / 3.82 | 9.77 / 5.67 | 30.85 / 5.11 |
| OMP=2, mpi=4 | 4.67 / 2.29 | 6.75 / 3.50 | 17.64 / 2.65 |
| OMP=4, mpi=4 | 3.01 / 2.57 | 4.83 / 3.49 | 13.40 / 1.42 |
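
Each speedup in the timing table is the before time divided by the after time. A small C++ sketch that recomputes the per-column range from the table values (the names `Timing` and `speedup_range` are illustrative only):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// One before/after pair of timings, in seconds.
struct Timing { double before, after; };

// Rows in table order: (OMP, mpi) = (1,2), (2,2), (4,2), (1,4), (2,4), (4,4).
const Timing kOverlapNew[6]   = {{12.77, 5.84}, {9.33, 3.57}, {6.01, 3.94}, {5.81, 3.82}, {4.67, 2.29}, {3.01, 2.57}};
const Timing kOperatorLCAO[6] = {{22.28, 8.67}, {11.36, 4.99}, {7.42, 5.33}, {9.77, 5.67}, {6.75, 3.50}, {4.83, 3.49}};
const Timing kCalDMR[6]       = {{65.19, 9.68}, {32.45, 4.95}, {23.29, 2.69}, {30.85, 5.11}, {17.64, 2.65}, {13.40, 1.42}};

// Speedup = time before / time after (values above 1 mean the new code is faster).
double speedup(const Timing& t) {
    assert(t.after > 0.0);
    return t.before / t.after;
}

// Smallest and largest speedup over the six configurations of one column.
std::pair<double, double> speedup_range(const Timing (&rows)[6]) {
    double lo = speedup(rows[0]), hi = lo;
    for (const Timing& t : rows) {
        lo = std::min(lo, speedup(t));
        hi = std::max(hi, speedup(t));
    }
    return {lo, hi};
}
```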

@dzzz2001 removed the request for review from Chentao168 March 27, 2025 12:33
@mohanchen added the Features Needed and Refactor labels Mar 27, 2025
@mohanchen merged commit c12bd52 into deepmodeling:develop Mar 28, 2025
14 checks passed
@dzzz2001 deleted the dmr branch March 28, 2025 07:33
dyzheng pushed a commit to dyzheng/abacus-develop that referenced this pull request Mar 28, 2025
* modify variable name

* modify variable name

* change pointer to ptr

* modify variable name

* modify some variable names

* move functions from .cpp to .h

* optimize cal_DMR

* add schedule(dynamic)

* optimize func_folding
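
The "add schedule(dynamic)" item above refers to OpenMP's `schedule(dynamic)` loop clause. The loop it was added to is not named here, so this is a minimal sketch assuming a loop over atom pairs whose blocks differ in size; `PairTask` and `axpy_pairs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-atom-pair task: blocks differ in size, so iterations
// have very different costs.
struct PairTask {
    std::vector<double> in;
    std::vector<double> out;
};

// out += alpha * in for every pair. Each pair owns its own output block, so
// no two iterations write the same memory and no atomics are needed.
void axpy_pairs(std::vector<PairTask>& tasks, double alpha) {
    const long npair = static_cast<long>(tasks.size());
    // schedule(dynamic): a thread takes the next chunk (size 1 by default)
    // as soon as it finishes its previous one, which balances uneven work
    // better than schedule(static), where each thread gets a fixed range up front.
    // Without OpenMP enabled, the pragma is ignored and the loop runs serially.
#pragma omp parallel for schedule(dynamic)
    for (long p = 0; p < npair; ++p) {
        PairTask& t = tasks[static_cast<std::size_t>(p)];
        assert(t.in.size() == t.out.size());
        for (std::size_t i = 0; i < t.in.size(); ++i)
            t.out[i] += alpha * t.in[i];
    }
}
```

Dynamic scheduling adds a little per-chunk overhead, so it tends to pay off when per-iteration cost varies a lot, as it does when atom pairs carry different numbers of orbitals.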
dyzheng added a commit that referenced this pull request Mar 28, 2025
* Fix: stress error with Dojo pseudopotential and LIBXC

* Fix: nspin2/4 mismatch with nspin1 with PBE

* Fix: add test case to CI

* Fix: delete useless warning of write_dmr

* Fix: DFTU output format

* Fix: error of noncolin and autoset mag

* Fix: reference of noncolin

* Revert "Fix: nspin2/4 mismatch with nspin1 with PBE"

This reverts commit ffd91ff.

* Perf: optimize the stream strategy in module_gint (#5845)

* optimize stream strategy

* limit max threads

* Fix: modify orb info manually (#5853)

* Fix: parse_expression for scientific notation (#5882)

* Fix: parse_expression for scientific notation

* modify openmp strategy (#5898)

* Fix document description for ocp and ocp_set (#5896)

* Fix: Resolve compilation issue with Libxc 7.0.0 in ABACUS (#5905)

* Fix: Resolve compilation issue with Libxc 7.0.0 in ABACUS

* Fix: Resolve compilation issue with Libxc 7.0.0 in ABACUS: fix a minor test issue (304_NO_GO_AF_atommag)

* Fix a bug and a magic number in module_exx_symmetry (#5848)

* fix a magic number in get_euler_angle

* do not allow higher symmetry of bvk supercell than the original cell

* Docs: update docs about init_wfc (#5912)

* Fix the wrong symmetry analysis at nspin=2 (#5926)

* analyze magnetic group without time-reversal symmetry

* fix: need to calculate direct coordinates again

* fix a bug about hcontainer in exx nscf (#5927)

* fix cmake bug (#5929)

* inline function of complexarray (#5964)

* modify doc (#5965)

* Fix segmentation fault in integrate test 312_NO_GO_wfc_get_wf (#5970)

* Doc: polish Quick Start part of online doc (#6006)

* polish Quick Start in online doc

* set scf_thr 1e-6

* correct typo

* test: fix Dockerfile.intel (#5999)

Co-authored-by: root <pxlxingliang>

* fix the format (#6008)

* Fix: out_mat_dh leads to different results with MPI-1core and MPI-4core (#6018)

* Fix: Enhance the warning message when the XC name cannot be recognized. (#6025)

* Update latest Intel oneAPI default compiler for cxx (#6035)

* Update latest Intel oneAPI default compiler for cxx

* Update elpa version to newest in demo cmake script

* Fix: Angular momentum quantum number check in reading SOC pseudopot file (#6027)

* Fix the angular momentum quantum number check in reading SOC pseudopot file

* Fix related unit test problem and add an SOC pseudopot file

* Refactor SOC check logic for improved readability

* Feature: support `default` as the value of `dft_functional` when initializing vdw (#5949)

* Feature: support `default` as the value of `dft_functional` when initializing vdw

* Refactor a little bit

* Optimize: Compilation time of vdwd3_autoset_xcparam.cpp (#6042)

The compilation time of the vdwd3_autoset_xcparam.cpp file is reduced from 250 seconds to just 5 seconds on my machine.
Thanks to a suggestion from DeepSeek: replace dynamic initialization with a static array when constructing the std::map.

* directly enter exx loop when init_wfc=file (#6019)

* Perf: openmp for cal_force_stress (#5956)

* remove wrong timer

* omp for cal_force_stress

* openmp for cal_force_stress in dftu

* openmp for cal_force_stress in dspin

* little change

* fix bug

* fix a bug

* Fix: DFT+U force&stress with  of some elements are -1 (#6049)

Co-authored-by: dyzheng <zhengdy@bjaisi.com>

* Fix: add the print header for `cusolvermp` in scf info (#6038)

* fix an output for debug (#6066)

* Perf: optimize cal_DMR and folding_HR (#6068)

* add a check before calculating EXX force (#6067)

* fixing issue #5961 (#6071)

* modify warning output (#6074)

* Version: 3.10.0

---------

Co-authored-by: dzzz2001 <153698752+dzzz2001@users.noreply.github.com>
Co-authored-by: Yu Liu <77716030+YuLiu98@users.noreply.github.com>
Co-authored-by: jiyuyang <1041176461@qq.com>
Co-authored-by: Taoni Bao <baotaoni@pku.edu.cn>
Co-authored-by: Qianrui Liu <76200646+Qianruipku@users.noreply.github.com>
Co-authored-by: LUNASEA <33978601+maki49@users.noreply.github.com>
Co-authored-by: wqzhou <33364058+WHUweiqingzhou@users.noreply.github.com>
Co-authored-by: Peng Xingliang <91927439+pxlxingliang@users.noreply.github.com>
Co-authored-by: Xinyuan Liang <64718735+xuan112358@users.noreply.github.com>
Co-authored-by: Liang Sun <50293369+sunliang98@users.noreply.github.com>
Co-authored-by: Chen Nuo <49788094+Cstandardlib@users.noreply.github.com>
Co-authored-by: kirk0830 <67682086+kirk0830@users.noreply.github.com>
Co-authored-by: dyzheng <zhengdy@bjaisi.com>
Co-authored-by: Jie Bao <46254902+BariumOxide13716@users.noreply.github.com>
Fisherd99 pushed a commit to Fisherd99/abacus-BSE that referenced this pull request Mar 31, 2025
Labels

Features Needed (The features are indeed needed, and developers should have sophisticated knowledge), Refactor (Refactor ABACUS codes)

2 participants