
add mpitrace #74

Merged
merged 2 commits into from
Oct 23, 2023
5 changes: 5 additions & 0 deletions docs/_static/data/addons.json
@@ -19,6 +19,11 @@
"description": "performance tools for measurement and analysis",
"family": "performance"
},
{
"name": "perf-mpitrace",
"description": "library for measuring communication in distributed-memory parallel applications that use MPI",
"family": "performance"
},
{
"name": "volume-cm",
"description": "config map volume type",
18 changes: 17 additions & 1 deletion docs/getting_started/addons.md
@@ -218,7 +218,7 @@ environments at this point, which is why I didn't add it.

### perf-hpctoolkit

- - *[perf-hpctoolkit](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/perf-lammps-hpctoolkit)*
+ - *[perf-hpctoolkit](https://github.com/converged-computing/metrics-operator/tree/main/examples/addons/hpctoolkit-lammps)*

This metric provides [HPCToolkit](https://gitlab.com/hpctoolkit/hpctoolkit) for your application to use. This is the first metric of its type
to use a shared volume approach. Specifically, we:
@@ -266,3 +266,19 @@ There is a brief listing on [this page](https://hpc.llnl.gov/software/developmen
We recommend that you do not pair hpctoolkit with another metric, primarily because it is customizing the application
entrypoint. If you add a process-namespace based metric, you likely need to account for the hpcrun command being the
wrapper to the actual executable.
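As a rough sketch (the flags here are illustrative assumptions, not the exact command the operator generates), the entrypoint change looks like this:

```bash
# Without the addon: the application runs directly.
lmp -in in.reaxc.hns

# With the addon: hpcrun wraps the executable, so any other metric that
# inspects or wraps the process will see hpcrun as the command.
hpcrun -o /opt/share/measurements lmp -in in.reaxc.hns
```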


### perf-mpitrace

- *[perf-mpitrace](https://github.com/converged-computing/metrics-operator/tree/main/examples/addons/perf-mpitrace)*

This metric provides [mpitrace](https://github.com/IBM/mpitrace) to wrap an MPI application. The setup is the same as for hpctoolkit, and we
currently only provide a rocky base (please let us know if you need another). It works by wrapping the mpirun command with `LD_PRELOAD`.
See the link above for an example that uses LAMMPS.
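
Conceptually, the wrapping looks like the sketch below; the library path is an assumption here, since the addon resolves it from the mounted view:

```bash
# Preload the mpitrace shared library so every MPI call in the application
# is intercepted and counted; the .so path shown is illustrative.
LD_PRELOAD=/opt/share/mpitrace/lib/libmpitrace.so \
  mpirun -np 4 lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns
```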

Here are the acceptable parameters.

| Name | Description | Type | Default |
|-----|-------------|------------|------|
| mount | Path to mount the mpitrace view in the application container | string | /opt/share |
| image | Customize the container image | string | `ghcr.io/converged-computing/metric-mpitrace:rocky` |
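
For reference, a minimal addon block using these parameters (mirroring the example later in this pull request) looks like:

```yaml
addons:
  - name: perf-mpitrace
    options:
      # Both options are optional; the defaults are shown in the table above.
      mount: /opt/share
      image: ghcr.io/converged-computing/metric-mpitrace:rocky
```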
49 changes: 49 additions & 0 deletions examples/addons/mpitrace-lammps/README.md
@@ -0,0 +1,49 @@
# LAMMPS Example

This is an example of a metric app, lammps, which is part of the [coral 2 benchmarks](https://asc.llnl.gov/coral-2-benchmarks). Technically
lammps isn't a metric, but we often use it to assess wall time, and thus MPI latency. A Python example (parsing the output data)
is provided in [python/app-lammps](../../python/app-lammps).

## Usage

Create a cluster and install JobSet on it.

```bash
kind create cluster
VERSION=v0.2.0
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/$VERSION/manifests.yaml
```

Install the operator (from the development manifest here):

```bash
kubectl apply -f ../../dist/metrics-operator-dev.yaml
```

To see the metrics operator logs:

```bash
kubectl logs -n metrics-system metrics-controller-manager-859c66464c-7rpbw
```

Then create the metric set. This is going to run a single run of LAMMPS over MPI! The example enables interactive logging,
so you can inspect the output as lammps runs.

```bash
kubectl apply -f metrics-rocky.yaml
```

Wait until you see the pods created by the job, and wait for them to be running.

```bash
kubectl get pods
```

You can then shell in and look at the output, which should be named with the pattern `mpi_profile.<proc>.<rank>`.
I use `kubectl cp` to copy the example profiles into the present working directory here, as shown below.
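
The pod name below is from this particular run; substitute the launcher pod shown by `kubectl get pods`:

```bash
# Copy one profile out of the launcher pod's working directory.
kubectl cp metricset-sample-l-0-0:/opt/lammps/examples/reaxff/HNS/mpi_profile.114.0 ./mpi_profile.114.0
```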

When you are done, clean up.

```bash
kubectl delete -f metrics-rocky.yaml
```
32 changes: 32 additions & 0 deletions examples/addons/mpitrace-lammps/metrics-rocky.yaml
@@ -0,0 +1,32 @@
apiVersion: flux-framework.org/v1alpha2
kind: MetricSet
metadata:
  labels:
    app.kubernetes.io/name: metricset
    app.kubernetes.io/instance: metricset-sample
  name: metricset-sample
spec:
  # Number of pods for lammps (one launcher, the rest workers)
  pods: 4
  logging:
    interactive: true

  metrics:

    # Running more scaled lammps is our main goal
    - name: app-lammps

      # This image is for rocky, which is not the default
      image: ghcr.io/converged-computing/metric-lammps-intel-mpi:rocky
      options:
        command: /opt/intel/mpi/2021.8.0/bin/mpirun --hostfile ./hostlist.txt -np 4 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
        workdir: /opt/lammps/examples/reaxff/HNS

      # Add on mpitrace, which will mount a volume and wrap lammps
      addons:
        - name: perf-mpitrace
          options:
            mount: /opt/mnt
            image: ghcr.io/converged-computing/metric-mpitrace:rocky
            workdir: /opt/lammps/examples/reaxff/HNS
            containerTarget: launcher
104 changes: 104 additions & 0 deletions examples/addons/mpitrace-lammps/mpi_profile.114.0
@@ -0,0 +1,104 @@
Data for MPI rank 0 of 4:
Times from MPI_Init() to MPI_Finalize().
-----------------------------------------------------------------------
MPI Routine #calls avg. bytes time(sec)
-----------------------------------------------------------------------
MPI_Comm_rank 10 0.0 0.000
MPI_Comm_size 4 0.0 0.000
MPI_Send 20560 9780.7 0.008
MPI_Irecv 20560 9781.5 0.003
MPI_Sendrecv 36 8.0 0.001
MPI_Wait 20560 0.0 7.251
MPI_Bcast 129 1044.8 0.001
MPI_Barrier 7 0.0 0.003
MPI_Reduce 4 7.0 0.000
MPI_Allreduce 5167 8.1 0.012
MPI_Allgather 1 4.0 0.000
MPI_Allgatherv 1 7392.0 0.000
-----------------------------------------------------------------------
MPI task 0 of 4 had the maximum communication time.
total communication time = 7.279 seconds.
total elapsed time = 14.510 seconds.
user cpu time = 12.155 seconds.
system time = 2.337 seconds.
max resident set size = 137.273 MiB.

-----------------------------------------------------------------
Message size distributions:

MPI_Send #calls avg. bytes time(sec)
7 0.0 0.000
3 176.0 0.000
1 352.0 0.000
1 528.0 0.000
9870 2529.2 0.002
392 7584.8 0.000
9870 15538.2 0.004
12 17714.7 0.000
392 46623.9 0.001
12 108532.7 0.000

MPI_Irecv #calls avg. bytes time(sec)
8 0.0 0.000
1 176.0 0.000
2 352.0 0.000
1 528.0 0.000
9870 2529.1 0.001
392 7585.4 0.000
9870 15539.6 0.001
12 17668.0 0.000
392 46617.9 0.000
12 108990.0 0.000

MPI_Sendrecv #calls avg. bytes time(sec)
36 8.0 0.001

MPI_Bcast #calls avg. bytes time(sec)
80 3.4 0.000
3 7.7 0.000
4 13.2 0.000
20 24.5 0.001
12 49.0 0.000
2 96.0 0.000
1 312.0 0.000
1 992.0 0.000
1 2048.0 0.000
1 3840.0 0.000
3 24239.7 0.000
1 53248.0 0.000

MPI_Reduce #calls avg. bytes time(sec)
1 4.0 0.000
3 8.0 0.000

MPI_Allreduce #calls avg. bytes time(sec)
14 4.0 0.000
5125 8.0 0.012
13 15.4 0.000
12 24.0 0.000
3 40.0 0.000

MPI_Allgather #calls avg. bytes time(sec)
1 4.0 0.000

MPI_Allgatherv #calls avg. bytes time(sec)
1 7392.0 0.000

-----------------------------------------------------------------

Summary for all tasks:

Rank 0 reported the largest memory utilization : 137.27 MiB
Rank 2 reported the largest elapsed time : 14.51 sec

minimum communication time = 0.085 sec for task 2
median communication time = 1.633 sec for task 1
maximum communication time = 7.279 sec for task 0


MPI timing summary for all ranks:
taskid host cpu comm(s) elapsed(s) user(s) system(s) size(MiB) switches
0 metricset-sample-l-0-0 1 7.28 14.51 12.16 2.34 137.27 842
1 metricset-sample-l-0-0 5 1.63 14.51 13.99 0.51 129.07 63
2 metricset-sample-l-0-0 7 0.08 14.51 14.45 0.05 129.86 43
3 metricset-sample-l-0-0 10 2.70 14.51 13.73 0.78 131.18 39
86 changes: 86 additions & 0 deletions examples/addons/mpitrace-lammps/mpi_profile.114.1
@@ -0,0 +1,86 @@
Data for MPI rank 1 of 4:
Times from MPI_Init() to MPI_Finalize().
-----------------------------------------------------------------------
MPI Routine #calls avg. bytes time(sec)
-----------------------------------------------------------------------
MPI_Comm_rank 10 0.0 0.000
MPI_Comm_size 4 0.0 0.000
MPI_Send 20560 9788.2 0.017
MPI_Irecv 20560 9787.4 0.004
MPI_Sendrecv 36 8.0 0.000
MPI_Wait 20560 0.0 1.601
MPI_Bcast 129 1044.8 0.001
MPI_Barrier 7 0.0 0.000
MPI_Reduce 4 7.0 0.000
MPI_Allreduce 5167 8.1 0.009
MPI_Allgather 1 4.0 0.000
MPI_Allgatherv 1 7296.0 0.000
-----------------------------------------------------------------------
MPI task 1 of 4 had the median communication time.
total communication time = 1.633 seconds.
total elapsed time = 14.510 seconds.
user cpu time = 13.993 seconds.
system time = 0.508 seconds.
max resident set size = 129.074 MiB.

-----------------------------------------------------------------
Message size distributions:

MPI_Send #calls avg. bytes time(sec)
8 0.0 0.000
1 176.0 0.000
2 352.0 0.000
1 528.0 0.000
9870 2541.4 0.003
392 7626.1 0.000
9870 15539.6 0.011
12 17794.0 0.000
392 46617.9 0.002
12 108990.0 0.000

MPI_Irecv #calls avg. bytes time(sec)
7 0.0 0.000
3 176.0 0.000
1 352.0 0.000
1 528.0 0.000
9870 2541.4 0.002
392 7626.0 0.000
9870 15538.2 0.002
12 17803.3 0.000
392 46623.9 0.000
12 108532.7 0.000

MPI_Sendrecv #calls avg. bytes time(sec)
36 8.0 0.000

MPI_Bcast #calls avg. bytes time(sec)
80 3.4 0.000
3 7.7 0.000
4 13.2 0.000
20 24.5 0.000
12 49.0 0.000
2 96.0 0.000
1 312.0 0.000
1 992.0 0.000
1 2048.0 0.000
1 3840.0 0.000
3 24239.7 0.000
1 53248.0 0.000

MPI_Reduce #calls avg. bytes time(sec)
1 4.0 0.000
3 8.0 0.000

MPI_Allreduce #calls avg. bytes time(sec)
14 4.0 0.000
5125 8.0 0.009
13 15.4 0.000
12 24.0 0.000
3 40.0 0.000

MPI_Allgather #calls avg. bytes time(sec)
1 4.0 0.000

MPI_Allgatherv #calls avg. bytes time(sec)
1 7296.0 0.000
