We are pleased to announce the release of the SPLASH-2 suite of
multiprocessor applications. SPLASH-2 is the successor to the SPLASH
suite that we previously released, and the programs in it are also
written assuming a coherent shared address space communication model.
SPLASH-2 contains several new applications, as well as improved versions
of applications from SPLASH. The suite is currently available via
anonymous ftp to
www-flash.stanford.edu (in the pub/splash2 subdirectory)
and via the World-Wide-Web at
http://www-flash.stanford.edu/apps/SPLASH/
Several programs are currently available, and a few others will be added
shortly. The programs fall into two categories: full applications and
kernels. Additionally, we designate some of these as "core programs"
(see below). The applications and kernels currently available in the
SPLASH-2 suite include:
Applications:
    Ocean Simulation
    Ray Tracer
    Hierarchical Radiosity
    Volume Renderer
    Water Simulation with Spatial Data Structure
    Water Simulation without Spatial Data Structure
    Barnes-Hut (gravitational N-body simulation)
    Adaptive Fast Multipole (gravitational N-body simulation)
Kernels:
    FFT
    Blocked LU Decomposition
    Blocked Sparse Cholesky Factorization
    Radix Sort
Programs that will appear soon include:
    PSIM4 - Particle Dynamics Simulation (full application)
    Conjugate Gradient (kernel)
    LocusRoute (standard cell router from SPLASH)
    Protein Structure Prediction
    Protein Sequencing
    Parallel Probabilistic Inference
In some cases, we provide both well-optimized and less-optimized versions
of the programs. For both the Ocean simulation and the Blocked LU
Decomposition kernel, less-optimized versions of the codes are currently
available.
There are important differences between applications in the SPLASH-2 suite
and applications in the SPLASH suite. These differences are noted in the
README.SPLASH2 file in the pub/splash2 directory. It is *VERY IMPORTANT*
that you read the README.SPLASH2 file, as well as the individual README
files in the program directories, before using the SPLASH-2 programs.
These files describe how to run the programs, provide annotations on how
to distribute data on a machine with physically distributed main memory,
and provide guidelines on the baseline problem sizes to use when
studying architectural interactions through simulation.
Complete documentation of SPLASH-2, including a detailed characterization
of performance as well as memory system interactions and synchronization
behavior, will appear in the SPLASH-2 report that is currently being
written.
OPTIMIZATION STRATEGY:
----------------------
For each application and kernel, we note potential features or
enhancements that are typically machine-specific. These potential
enhancements are encapsulated within comments in the code starting with
the string "POSSIBLE ENHANCEMENT." The potential enhancements which we
identify are:
(1) Data Distribution
We note where data migration routines should be called in order to
enhance locality of data access. We do not distribute data by
default, because different machines implement migration routines in
different ways, and on some machines data distribution is not relevant.
(2) Process-to-Processor Assignment
We note where calls can be made to "pin" processes to specific
processors so that process migration can be avoided. We do not
do this by default, since different machines implement this
feature in different ways.
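As a rough illustration of how such markers might be acted on, the sketch
below wraps both enhancements in macros that default to no-ops, mirroring
the suite's default of doing neither. The macro names PIN_TO_PROCESSOR and
PLACE_DATA, the function partial_sum, and the exact comment wording are all
hypothetical and are not taken from the SPLASH-2 sources. A port to a given
machine would define the two macros using whatever calls that machine
provides.

```c
#include <stddef.h>

/* Hypothetical hooks, not part of SPLASH-2. Both default to no-ops,
 * matching the suite's choice of doing nothing by default. A port to a
 * specific machine would define them before this point. */
#ifndef PIN_TO_PROCESSOR
#define PIN_TO_PROCESSOR(proc_id) ((void)(proc_id))
#endif
#ifndef PLACE_DATA
#define PLACE_DATA(addr, nbytes, proc_id) \
    ((void)(addr), (void)(nbytes), (void)(proc_id))
#endif

/* Sum the contiguous slice of `grid` owned by process `pid` out of
 * `nprocs` (nprocs > 0, 0 <= pid < nprocs). The last process also takes
 * any leftover elements. */
double partial_sum(double *grid, size_t n, int pid, int nprocs)
{
    size_t chunk = n / (size_t)nprocs;
    size_t first = (size_t)pid * chunk;
    size_t last  = (pid == nprocs - 1) ? n : first + chunk;
    double sum   = 0.0;

    /* POSSIBLE ENHANCEMENT: pin this process to a processor here. */
    PIN_TO_PROCESSOR(pid);

    /* POSSIBLE ENHANCEMENT: migrate this slice to memory local to pid. */
    PLACE_DATA(&grid[first], (last - first) * sizeof(double), pid);

    for (size_t i = first; i < last; i++)
        sum += grid[i];
    return sum;
}
```

With the default no-op macros the function behaves identically on every
machine. Only the two marked lines change when a port supplies real
pinning or placement calls.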
In addition, to facilitate simulation studies, we note points in the
codes where statistics gathering routines should be turned on so that
cold-start and initialization effects can be avoided.
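A minimal sketch of the idea follows, assuming a simple software counter in
place of a real simulator interface. The names stats_enabled,
counted_writes, store, and run_kernel are hypothetical and do not appear in
SPLASH-2. Statistics stay off while the data is initialized and are
switched on only at the marked point, so the first-touch pass is excluded
from the count.

```c
#include <stddef.h>

/* Hypothetical instrumentation state standing in for a simulator's
 * statistics interface; none of these names come from SPLASH-2. */
static int stats_enabled = 0;
static unsigned long counted_writes = 0;

/* Every write goes through here; it is counted only while stats are on. */
static void store(double *p, double v)
{
    if (stats_enabled)
        counted_writes++;
    *p = v;
}

/* Initialize `data`, then run `steps` sweeps over it. Returns the number
 * of writes recorded while statistics were enabled. */
unsigned long run_kernel(double *data, size_t n, int steps)
{
    counted_writes = 0;
    stats_enabled = 0;

    /* Initialization: touches every element once and is not counted. */
    for (size_t i = 0; i < n; i++)
        store(&data[i], 0.0);

    /* POSSIBLE ENHANCEMENT: turn statistics gathering on here. */
    stats_enabled = 1;

    for (int s = 0; s < steps; s++)
        for (size_t i = 0; i < n; i++)
            store(&data[i], data[i] + 1.0);

    stats_enabled = 0;
    return counted_writes;
}
```

For 8 elements and 3 sweeps, the function performs 32 writes in total but
reports only the 24 made after initialization.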
For two programs (Ocean and LU), we provide less-optimized versions of
the codes. The less-optimized versions use data structures that
lead to simpler implementations but do not allow for optimal data
distribution (and can cause false sharing).
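To show what false sharing means in general, the sketch below contrasts a
simple layout, in which per-process counters sit next to each other, with a
padded layout in which each counter occupies its own cache line. This is a
generic illustration and not the data structures actually used in Ocean or
LU. The 64-byte line size, the process count, and all type and function
names are assumptions made for the example.

```c
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; machine-specific */
#define NPROCS 4        /* assumed number of processes */

/* Simple layout: counters owned by different processes are adjacent, so
 * several of them can fall in one cache line (possible false sharing). */
struct simple_counters {
    long count[NPROCS];
};

/* Padded layout: alignment forces each counter onto its own line. */
struct padded_counter {
    _Alignas(CACHE_LINE) long count;
};
struct padded_counters {
    struct padded_counter per_proc[NPROCS];
};

_Static_assert(sizeof(struct padded_counter) == CACHE_LINE,
               "each padded counter should fill exactly one line");

/* Returns 1 if the two addresses fall in the same cache line, else 0. */
int same_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

The padded layout costs more memory per counter in exchange for keeping
writes by different processes on separate lines, which reflects the
trade-off between simplicity and locality described above.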
CORE PROGRAMS:
--------------
Since the number of programs has increased over SPLASH, and since not
everyone may be able to use all the programs in a given study, we
identify some of the programs as "core" programs that should be used
in most studies for comparability. In the currently available set,
these core programs include:
(1) Ocean Simulation
(2) Hierarchical Radiosity
(3) Water Simulation with Spatial Data Structure
(4) Barnes-Hut
(5) FFT
(6) Blocked Sparse Cholesky Factorization
(7) Radix Sort
The less-optimized versions of the programs, when available, should be
used only in addition to these.
The base problem sizes that we recommend are provided in the README files
for individual applications. Please use at least these sizes for experiments
with up to 64 processors. If changes are made to these base parameters
for further experimentation, these changes should be explicitly stated
in any results that are presented.