-
Notifications
You must be signed in to change notification settings - Fork 15
/
Copy pathCHANGELOG
executable file
·150 lines (121 loc) · 6.3 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
Version 3.1.0: Released May 09, 2024
===================================
- Added more examples for Radon transforms
- Added drivers for IFIO with Split Bregman iterations with L1 regularizer
- Improved H matrix parallel performance by adding option%hextralevel
- Added 2D and 3D VIE drivers (matrix and tensor) with voxel discretization and a 3D slowness generator
- Added HSS-BF-MD format with iterative solvers only
- Improved error checking for entry-based construction of hierarchical matirces by using sparse matvec
- Added/Improved BPACK_(MD)_Ztfqmr_usermatvec_noprecon
- Improved Bplus_(MD_)block_MVP_dat communication performance
Version 3.0.0: Released Mar 22, 2024
===================================
- Added butterfly tensor algorithms with additional ZFP compression
- Added rank benchmark examples for matrix and tensor interfaces including 2D/3D Green's function, 2D/3D Radon transforms, and high-dimensional FFT
Version 2.4.0: Released Oct 31, 2023
===================================
- Changed openmpi and scalapack as optional dependency
- Added Fortran interface for compressing a non-square matrix as a HSS-BF or H matrix.
Version 2.3.1: Released Aug 31, 2022
===================================
- Added rank benchmark for low-rank and butterfly compression of free space green's functions
- Added conversion to dense matrix and calling dense eigensolvers
- Improved compiling and calling of the arpack interface
- Added drivers for compressing fractional Laplacian operators
- Added ZFP for dense blocks in the hierarchical matrix formats
- Added more macros guarding OMP semantics
Version 2.2.0: Released Sep 15, 2022
===================================
- Changed 1D layout to 2D layout in the H matrix code
- Added doxygen support
- Updated the EMSURF_Port modules with mode bases
- Added strong admissiblity peeling algorithms
- Added support for periodic kernels
- Added 2D inverse FIO drivers
- Added HSS-BF with strong admissiblity (no inversion yet)
- Added cmake option to switch off OpenMP
Version 2.1.1: Released Mar 04, 2022
===================================
- Added forwardN15flag=2 as a hybrid algorithm
- Added diagonal regularizers for leaflevel blocks
Version 2.1.0: Released Dec 16, 2021
===================================
- Switch from Travis to github/gitlab CI
- Added support for single-precision real and complex
- Added a single-precision driver ie3d_sp
- Fixed the scaling factor in the T operator in EMSURF*.f90
- Added support for arbitary-shaped port in EMSURF_Port_Eigen*.f90
- Rename a few constants to be XSDK-compatible
- Added more memory and CPU printings in the C++ interface
- Improved block extraction performance with reduced numbers of buffers
- Added Perlmutter test scripts
- Added support and scripts for the Cray Fortran compiler
- Added e4s.yaml
Version 2.0.0: Released Sep 01, 2021
===================================
- Improved EM Eigen driver, added interface with GPTune
- Added inverse FIO example
- Added support for high order Newton Schultz iteration for butterfly woodbury
- Added absolute tolerance in RRQR in the peeling algorithm
- Added option%nogeo=4 to specify a list of geometry points together with predefined NNs
- Added option%elem_extract=2 to improve computation of a list of blocks of matrix entries, assuming no MPI communication is performed during the block computation
- Adaptive ACA block size when elem_extract=0
- Added support for advancing multiple ACA together to reduce the number of calls to entry evaluation
- Parallelized the N15 compression algorithm
- Added back deterministic LR arithmetics in the H solver
- Removed option%rmax in most of the compression functions
- Additional Fix for GNU SED issue on MAC OS
- Added an option to switch between "use MPI" and "INCLUDE 'mpif.h'"
- Fixed a NAN bug in newton schultz
- Fixed an accuracy bug in LR_BACA*
- Fixed an crash bug in BPACK_Randomized
- Added gesdd when gesvd fails.
- Added knn_near_para as an option to ensure better knn search quality
Version 1.2.1: Released Oct 12, 2020
===================================
- Minor fix for building with gnu10
Version 1.2.0: Released Aug 14, 2020
===================================
- Added the hss-bf format
- Added parallel SVD-based inverse
- Improved interface of nogeo=3
- Added an option less_adapt for randomized schemes involving non-uniform rank distributions
- Fixed flop counts in a few places
- Added small pivot replacement in pgetrff90
- Added more example drivers
- Added a parameter sample_para_outer as the sample_para for the outermost butterfly levels
- Fix the intel compiling issue by changing C_LOC to LOC
- Improved butterfly matvec performance
- Added mkl batched gemm in butterfly matvec
- Improved update time in HOD-BF
- Added parallel singular value sweeps for the inner BF factors
- Use smaller tolerence for hss update operation; getting ready for parallel single value sweep
- Improved BF_all2all* and BPACK_all2all*
- Improved threading performance in BF construction; removed sample_heuristic; added element_Zmn_blocklist_user
- Improved parallelism after BF_Split; fixed a few memory leaks
Version 1.1.0: Released Nov 07, 2019
===================================
- Added new interface (option%nogeo=3) for passing NNs
- Disabled intel VSL due to issues of linking both dbutterflypack and zbutterflypack
- Disabled pgeqrff90 for gcc>=9, as it causes segfault
- Improved creation of communicators and blacs grids
- Added deterministic A-BD^-1C in LR format
- Fixed a deadblock bug in BF_compress_NlogN with non power of 2 processs
- Added an option sample_heuristics to improve entry-evaluation-based compression accuracy
Version 1.0.3: Released Sep 24, 2019
===================================
- Improved C++ interface
- Improved knn search performance
- Fixed the different CMAKE openmp link flag issue for CXX and Fortran linkers
Version 1.0.2: Released Sep 04, 2019
===================================
- Tested functionality on osx
Version 1.0.1: Released Aug 16, 2019
===================================
- Added cmake test for detecting OMP task loop
- Added cmake test for detecting fortran finalizer
- Added the reference parallel ACA algorithm without merge
- Fixed GNU sed issues in cmake
Version 1.0.0: Released Aug 09, 2019
===================================
- The first XSDK-compatible release