version 3.0:
- Fixed several bugs.
- The CUDA PRNG has been dropped. trueke now uses a custom GPU version of Melissa O'Neill's PCG, which is very fast and has good statistical quality.
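For readers unfamiliar with PCG, the following is a minimal host-side C++ sketch of the reference PCG32 (XSH-RR) step. It is not trueke's GPU code: the names Pcg32, pcg32_seed, pcg32_next, pcg32_uniform and make_replica_rngs are hypothetical, and using the replica index as the stream id is only one assumed way of giving every replica its own sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// State of one PCG32 generator: 64-bit LCG state plus an odd stream increment.
struct Pcg32 {
    std::uint64_t state;
    std::uint64_t inc;
};

// One PCG32 XSH-RR step: advance the LCG, then permute the old state
// (xorshift of the high bits followed by a data-dependent rotation).
inline std::uint32_t pcg32_next(Pcg32 &rng) {
    std::uint64_t old = rng.state;
    rng.state = old * 6364136223846793005ULL + rng.inc;
    std::uint32_t xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
    std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
}

// Seed a generator; different stream values select different sequences.
inline Pcg32 pcg32_seed(std::uint64_t seed, std::uint64_t stream) {
    Pcg32 rng{0u, (stream << 1u) | 1u};  // the increment must be odd
    pcg32_next(rng);
    rng.state += seed;
    pcg32_next(rng);
    return rng;
}

// Uniform float in [0, 1) built from the top 24 random bits.
inline float pcg32_uniform(Pcg32 &rng) {
    return static_cast<float>(pcg32_next(rng) >> 8) * (1.0f / 16777216.0f);
}

// Assumption for this sketch: one generator per replica, with the replica
// index used as the stream id so that no two replicas share a sequence.
inline std::vector<Pcg32> make_replica_rngs(std::uint64_t seed, std::size_t replicas) {
    assert(replicas > 0);
    std::vector<Pcg32> rngs;
    rngs.reserve(replicas);
    for (std::size_t r = 0; r < replicas; ++r) {
        rngs.push_back(pcg32_seed(seed, r));
    }
    return rngs;
}
```

The generator state is only two 64-bit words, so it is cheap to keep one per thread or per replica.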
version 2.9:
- Fixed several bugs.
- The code redesign for multi-GPU has been completed.
version 2.8:
- new feature: adaptive temperatures.
- the code is being redesigned to work better in multi-GPU environments, both workstation and cluster.
- fixed a bug where the random number generators were not unique among replicas.
- minor improvements to the code
version 2.7:
- specific heat and susceptibility are now computed once, at the end of the whole simulation; their cost is now O(1) instead of O(rea).
- The corresponding error bars for C, X and Binder are computed properly for the new scheme.
version 2.6:
- the measure zone can now skip a period of steps between measurements.
- fixed a bug where the reported exchange rate was lower than the actual one.
version 2.5:
- the exchange process now alternates between odd and even replicas.
- the measure zone is no longer limited to sqrt(ptsteps) steps; it now covers all ptsteps.
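The alternating exchange of version 2.5 can be sketched on the host side as follows. This is an illustration, not trueke's implementation: the function name exchange_sweep, the uniform() callback and the replica_at index map are hypothetical, replicas are assumed to be stored in temperature order, and the acceptance rule is the standard parallel tempering criterion min(1, exp((beta_i - beta_j)(E_i - E_j))).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// One exchange phase over neighbouring temperature slots.
// Even phases (phase % 2 == 0) try pairs (0,1), (2,3), ...;
// odd phases try pairs (1,2), (3,4), ...
// beta[k] is the inverse temperature of slot k, energy[k] the energy of the
// replica currently in slot k, and replica_at[k] the id of that replica.
// uniform() must return a number in [0, 1).
// Returns the number of accepted swaps.
inline std::size_t exchange_sweep(const std::vector<double> &beta,
                                  std::vector<double> &energy,
                                  std::vector<int> &replica_at,
                                  unsigned phase,
                                  const std::function<double()> &uniform) {
    assert(beta.size() == energy.size() && beta.size() == replica_at.size());
    std::size_t accepted = 0;
    for (std::size_t k = phase % 2u; k + 1 < beta.size(); k += 2) {
        // Standard parallel tempering acceptance for neighbours k and k+1.
        double delta = (beta[k] - beta[k + 1]) * (energy[k] - energy[k + 1]);
        if (delta >= 0.0 || uniform() < std::exp(delta)) {
            std::swap(energy[k], energy[k + 1]);          // replicas trade slots
            std::swap(replica_at[k], replica_at[k + 1]);
            ++accepted;
        }
    }
    return accepted;
}
```

Calling the function with phase = 0, 1, 2, ... on successive exchange steps alternates between the two sets of pairs, so every neighbouring pair is attempted once every two steps.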
version 2.4:
- went back to adaptation using T, which worked better than Beta.
- fixed a bug where error bars for observables were not saved correctly in the output file.
version 2.3:
- adaptation now uses the Beta array instead of the T array.
version 2.2:
- fixed a critical problem, caused by the changes needed for dynamic dt, that made swapping incorrect.
version 2.1:
- fixed a bug that caused T1 = T2 for dynamic temperature adaptation.
- added independent parameters for adaptation and introduced the "-a" option for adaptation.
version 2.0:
- new feature: dynamic dt adaptation as a pre-processing method. This strategy improves the exchange ratio between replicas.
- added dynamic temperature adaptation.
version 1.10:
- Fixed a problem where correlation length was 'NaN' for temperatures above Tc, especially when using a high number of realizations.
version 1.9:
- Fixed a possible race condition when accumulating MCMC statistics by reading in replica order and writing in temperature order using index maps.
- Fixed a critical bug in the replica swap function.
- Added a missing cudaDeviceSynchronize() for safety reasons.
version 1.8:
- Correlation length is now computed using the realization SQM and realization F.
version 1.7:
- Improved the quality of the results for specific heat (C), susceptibility (X) and binderQF curves.
- Fixed a bug where realization statistical error bars were incorrect for some observables.
- Removed block error bars for specific heat (C), susceptibility (X) and binderQF curves.
version 1.6:
- Multi-GPU working properly.
- Fixed a large number of bugs.
- Temperature swap
version 1.5:
- The code of the program has been re-structured and commented so it is more efficient and easier to read.
- Multi-GPU support prepared, but still not working properly.
- Measurement reductions streamed on GPU for better concurrent performance.
version 1.4:
- We now compute two versions of the Binder factor: (1) the average of its formula and (2) the formula evaluated using the mean <sqM> and <quadM> values.
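The difference between the two versions can be sketched as below. The changelog does not state the formula, so this example assumes the common Binder cumulant form U = 1 - <M^4> / (3 <M^2>^2), with sqM standing for M^2 and quadM for M^4; the function names are hypothetical and are not taken from trueke.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed Binder cumulant formula: U = 1 - quadM / (3 * sqM^2).
inline double binder(double sqM, double quadM) {
    return 1.0 - quadM / (3.0 * sqM * sqM);
}

// (1) Evaluate the formula on each (sqM, quadM) pair, then average the results.
inline double binder_averaged(const std::vector<double> &sqM,
                              const std::vector<double> &quadM) {
    assert(!sqM.empty() && sqM.size() == quadM.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < sqM.size(); ++i) {
        sum += binder(sqM[i], quadM[i]);
    }
    return sum / static_cast<double>(sqM.size());
}

// (2) Average sqM and quadM first, then evaluate the formula once on the means.
inline double binder_of_means(const std::vector<double> &sqM,
                              const std::vector<double> &quadM) {
    assert(!sqM.empty() && sqM.size() == quadM.size());
    double sumSq = 0.0;
    double sumQuad = 0.0;
    for (std::size_t i = 0; i < sqM.size(); ++i) {
        sumSq += sqM[i];
        sumQuad += quadM[i];
    }
    double n = static_cast<double>(sqM.size());
    return binder(sumSq / n, sumQuad / n);
}
```

Because the formula is nonlinear in sqM, the two estimates generally differ for a finite set of samples, which is why it can be useful to report both.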
version 1.3:
- fixed a bug where simulations with h != 0 always gave bad results.
- cleaned the code a little; there is still much to do in this regard.
version 1.2:
- this new version provides up to 4X, 14X and 90X speedup for L=32, 64 and 128, respectively, when compared to version 1.0.
- all observables are computed on GPU by separate reduction kernels.
- metropolis simulation no longer includes partial reductions; they are done separately.
- the measurement steps are now done on sqrt(ptsteps) steps.
- random number generation is much faster and is initialized only once per realization.
version 1.1:
- metropolis process now uses the double checkerboard technique in GPU shared memory.
- observables still computed on CPU.
- in the exchange phase, only pointers are exchanged instead of transferring the whole array spin by spin.
- the overall performance has increased by at least 30% on the smallest lattices, L=32.
- overall performance should be higher for L > 32.
- we can confirm that one of the main bottlenecks is the measurement of observables. Next version will focus on this performance bottleneck.
version 1.0:
- first release.