forked from hpc/rma-mt
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
99 lines (70 loc) · 3.16 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
#
# Copyright (c) 2018 Los Alamos National Security, LLC. All rights
# reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
The RMA-MT benchmarks are designed to evaluate the performance of an
implementation of the Message Passing Interface (MPI) Remote Memory
Access (RMA) interface. There are three tests in this benchmarking
suite:
rmamt_bw:
This benchmark tests the bandwidth of the MPI implementations when
subjected to many concurrent put or get operations. It prints out
the effective bandwidth and the message rate for each message size.
rmamt_bibw:
This benchmark tests the bi-directional bandwidth of the MPI
implementations when subjected to many concurrent put or get
operations from both MPI processes. It prints out the effective
bi-directional bandwidth and the message rates for each message
size.
rmamt_lat:
This benchmarks tests the multi-threaded latency of the MPI
implementation. This is evaluated by summing the total time for each
thread to do a single put or get operation plus the time for
synchronization. It prints out the total latency for each message
size.
Common options for all benchmarks:
-o,--operation=value Operation to benchmark: put, get
-s,--sync=value Synchronization function to use: lock_all,
fence, lock, flush, pscw
-m, --max-size=value Maximum aggregate transfer size
-l, --min-size=value Minimum aggregate transfer size
-w,--win-per-thread Use a different MPI window in each thread
-i,--iterations=value Number of iterations to run for each per
message size (default: 1000)
-t,--threads=value Number of threads to use
-n,--busy-loop Run busy loop on receiver
-x,--bind-threads Bind threads to unique cores (requires
hwloc)
-z,--sleep-interval=value Sleep interval in ns to use on receiver if
using busy loop. Not relevant to rmamt_bibw
loop (default: 10000)
-r,--result=value Write parsable output to the specified file
in CSV format.
-h,--help Print this help message
Configuring and building:
If the desired MPI compiler wrapper is in your path (mpicc, cray cc)
you can simply run:
./configure
make
You can also specify the desired wrapper by setting the CC environment
variable.
Running:
The RMA-MT benchmark suite comes with some basic run scripts in the
scripts directory. These can be modified to run the entire
benchmarking suite in the current allocation using the mpirun
launcher. They can be modified to use any launcher. Note that by
default the scripts bind the bechmark to the socket (unless -x is
specified to bind the worker threads).
The benchmarks can also be run by hand. The minimum arguments required
are -o and -s to specify the operation (put, get) and the
synchronization method (flush, lock, lock_all, pscw). There must be
exactly two processes.
Ex:
mpirun -n 2 --npernode 1 rmamt_bw -o put -s flush -t 32
This will run the bandwidth benchmark with MPI_Put() and
MPI_Win_flush() with 32 threads.