
add mpitrace #74

Merged
merged 2 commits into from
Oct 23, 2023
5 changes: 5 additions & 0 deletions docs/_static/data/addons.json
@@ -19,6 +19,11 @@
"description": "performance tools for measurement and analysis",
"family": "performance"
},
{
"name": "perf-mpitrace",
"description": "library for measuring communication in distributed-memory parallel applications that use MPI",
"family": "performance"
},
{
"name": "volume-cm",
"description": "config map volume type",
18 changes: 17 additions & 1 deletion docs/getting_started/addons.md
@@ -218,7 +218,7 @@ environments at this point, which is why I didn't add it.

### perf-hpctoolkit

- - *[perf-hpctoolkit](https://github.com/converged-computing/metrics-operator/tree/main/examples/tests/perf-lammps-hpctoolkit)*
+ - *[perf-hpctoolkit](https://github.com/converged-computing/metrics-operator/tree/main/examples/addons/hpctoolkit-lammps)*

This metric provides [HPCToolkit](https://gitlab.com/hpctoolkit/hpctoolkit) for your application to use. This is the first metric of its type
to use a shared volume approach. Specifically, we:
@@ -266,3 +266,19 @@ There is a brief listing on [this page](https://hpc.llnl.gov/software/developmen
We recommend that you do not pair hpctoolkit with another metric, primarily because it is customizing the application
entrypoint. If you add a process-namespace based metric, you likely need to account for the hpcrun command being the
wrapper to the actual executable.
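As a rough sketch (the flags here are illustrative assumptions, not the exact command the operator generates), the entrypoint change looks like this:

```bash
# Without the addon: the application runs directly.
lmp -in in.reaxc.hns

# With the addon: hpcrun wraps the executable, so any other metric that
# inspects or wraps the process will see hpcrun as the command.
hpcrun -o /opt/share/measurements lmp -in in.reaxc.hns
```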


### perf-mpitrace

- *[perf-mpitrace](https://github.com/converged-computing/metrics-operator/tree/main/examples/addons/perf-mpitrace)*

This metric provides [mpitrace](https://github.com/IBM/mpitrace) to wrap an MPI application. The setup is the same as for hpctoolkit, and we
currently only provide a rocky base (please let us know if you need another). It works by wrapping the mpirun command with `LD_PRELOAD`.
See the link above for an example that uses LAMMPS.
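
Conceptually, the wrapping looks like the sketch below; the library path is an assumption here, since the addon resolves it from the mounted view:

```bash
# Preload the mpitrace shared library so every MPI call in the application
# is intercepted and counted; the .so path shown is illustrative.
LD_PRELOAD=/opt/share/mpitrace/lib/libmpitrace.so \
  mpirun -np 4 lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns
```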

Here are the acceptable parameters.

| Name | Description | Type | Default |
|-----|-------------|------------|------|
| mount | Path to mount the mpitrace view in the application container | string | /opt/share |
| image | Customize the container image | string | `ghcr.io/converged-computing/metric-mpitrace:rocky` |
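
For reference, a minimal addon block using these parameters (mirroring the example later in this pull request) looks like:

```yaml
addons:
  - name: perf-mpitrace
    options:
      # Both options are optional; the defaults are shown in the table above.
      mount: /opt/share
      image: ghcr.io/converged-computing/metric-mpitrace:rocky
```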
49 changes: 49 additions & 0 deletions examples/addons/mpitrace-lammps/README.md
@@ -0,0 +1,49 @@
# LAMMPS Example

This is an example of a metric app, lammps, which is part of the [coral 2 benchmarks](https://asc.llnl.gov/coral-2-benchmarks). Technically
lammps isn't a metric, but we often use it to assess wall time, and thus MPI latency. A Python example (parsing the output data)
is provided in [python/app-lammps](../../python/app-lammps).

## Usage

Create a cluster and install JobSet on it.

```bash
kind create cluster
VERSION=v0.2.0
kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/$VERSION/manifests.yaml
```

Install the operator (from the development manifest here):

```bash
kubectl apply -f ../../dist/metrics-operator-dev.yaml
```

To see the metrics operator logs:

```bash
kubectl logs -n metrics-system metrics-controller-manager-859c66464c-7rpbw
```

Then create the metric set. This is going to run a single run of LAMMPS over MPI! The example enables interactive logging,
so you can inspect the output as lammps runs.

```bash
kubectl apply -f metrics-rocky.yaml
```

Wait until you see the pods created by the job, and wait for them to be running.

```bash
kubectl get pods
```

You can then shell in and look at the output, which should be named with the pattern `mpi_profile.<proc>.<rank>`.
I use `kubectl cp` to copy the example profiles into the present working directory here, as shown below.
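
The pod name below is from this particular run; substitute the launcher pod shown by `kubectl get pods`:

```bash
# Copy one profile out of the launcher pod's working directory.
kubectl cp metricset-sample-l-0-0:/opt/lammps/examples/reaxff/HNS/mpi_profile.114.0 ./mpi_profile.114.0
```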

When you are done, clean up.

```bash
kubectl delete -f metrics-rocky.yaml
```
32 changes: 32 additions & 0 deletions examples/addons/mpitrace-lammps/metrics-rocky.yaml
@@ -0,0 +1,32 @@
apiVersion: flux-framework.org/v1alpha2
kind: MetricSet
metadata:
  labels:
    app.kubernetes.io/name: metricset
    app.kubernetes.io/instance: metricset-sample
  name: metricset-sample
spec:
  # Number of pods for lammps (one launcher, the rest workers)
  pods: 4
  logging:
    interactive: true

  metrics:

    # Running more scaled lammps is our main goal
    - name: app-lammps

      # This image is for rocky, which is not the default
      image: ghcr.io/converged-computing/metric-lammps-intel-mpi:rocky
      options:
        command: /opt/intel/mpi/2021.8.0/bin/mpirun --hostfile ./hostlist.txt -np 4 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
        workdir: /opt/lammps/examples/reaxff/HNS

      # Add on mpitrace, which will mount a volume and wrap lammps
      addons:
        - name: perf-mpitrace
          options:
            mount: /opt/mnt
            image: ghcr.io/converged-computing/metric-mpitrace:rocky
            workdir: /opt/lammps/examples/reaxff/HNS
            containerTarget: launcher
104 changes: 104 additions & 0 deletions examples/addons/mpitrace-lammps/mpi_profile.114.0
@@ -0,0 +1,104 @@
Data for MPI rank 0 of 4:
Times from MPI_Init() to MPI_Finalize().
-----------------------------------------------------------------------
MPI Routine #calls avg. bytes time(sec)
-----------------------------------------------------------------------
MPI_Comm_rank 10 0.0 0.000
MPI_Comm_size 4 0.0 0.000
MPI_Send 20560 9780.7 0.008
MPI_Irecv 20560 9781.5 0.003
MPI_Sendrecv 36 8.0 0.001
MPI_Wait 20560 0.0 7.251
MPI_Bcast 129 1044.8 0.001
MPI_Barrier 7 0.0 0.003
MPI_Reduce 4 7.0 0.000
MPI_Allreduce 5167 8.1 0.012
MPI_Allgather 1 4.0 0.000
MPI_Allgatherv 1 7392.0 0.000
-----------------------------------------------------------------------
MPI task 0 of 4 had the maximum communication time.
total communication time = 7.279 seconds.
total elapsed time = 14.510 seconds.
user cpu time = 12.155 seconds.
system time = 2.337 seconds.
max resident set size = 137.273 MiB.

-----------------------------------------------------------------
Message size distributions:

MPI_Send #calls avg. bytes time(sec)
7 0.0 0.000
3 176.0 0.000
1 352.0 0.000
1 528.0 0.000
9870 2529.2 0.002
392 7584.8 0.000
9870 15538.2 0.004
12 17714.7 0.000
392 46623.9 0.001
12 108532.7 0.000

MPI_Irecv #calls avg. bytes time(sec)
8 0.0 0.000
1 176.0 0.000
2 352.0 0.000
1 528.0 0.000
9870 2529.1 0.001
392 7585.4 0.000
9870 15539.6 0.001
12 17668.0 0.000
392 46617.9 0.000
12 108990.0 0.000

MPI_Sendrecv #calls avg. bytes time(sec)
36 8.0 0.001

MPI_Bcast #calls avg. bytes time(sec)
80 3.4 0.000
3 7.7 0.000
4 13.2 0.000
20 24.5 0.001
12 49.0 0.000
2 96.0 0.000
1 312.0 0.000
1 992.0 0.000
1 2048.0 0.000
1 3840.0 0.000
3 24239.7 0.000
1 53248.0 0.000

MPI_Reduce #calls avg. bytes time(sec)
1 4.0 0.000
3 8.0 0.000

MPI_Allreduce #calls avg. bytes time(sec)
14 4.0 0.000
5125 8.0 0.012
13 15.4 0.000
12 24.0 0.000
3 40.0 0.000

MPI_Allgather #calls avg. bytes time(sec)
1 4.0 0.000

MPI_Allgatherv #calls avg. bytes time(sec)
1 7392.0 0.000

-----------------------------------------------------------------

Summary for all tasks:

Rank 0 reported the largest memory utilization : 137.27 MiB
Rank 2 reported the largest elapsed time : 14.51 sec

minimum communication time = 0.085 sec for task 2
median communication time = 1.633 sec for task 1
maximum communication time = 7.279 sec for task 0


MPI timing summary for all ranks:
taskid host cpu comm(s) elapsed(s) user(s) system(s) size(MiB) switches
0 metricset-sample-l-0-0 1 7.28 14.51 12.16 2.34 137.27 842
1 metricset-sample-l-0-0 5 1.63 14.51 13.99 0.51 129.07 63
2 metricset-sample-l-0-0 7 0.08 14.51 14.45 0.05 129.86 43
3 metricset-sample-l-0-0 10 2.70 14.51 13.73 0.78 131.18 39
86 changes: 86 additions & 0 deletions examples/addons/mpitrace-lammps/mpi_profile.114.1
@@ -0,0 +1,86 @@
Data for MPI rank 1 of 4:
Times from MPI_Init() to MPI_Finalize().
-----------------------------------------------------------------------
MPI Routine #calls avg. bytes time(sec)
-----------------------------------------------------------------------
MPI_Comm_rank 10 0.0 0.000
MPI_Comm_size 4 0.0 0.000
MPI_Send 20560 9788.2 0.017
MPI_Irecv 20560 9787.4 0.004
MPI_Sendrecv 36 8.0 0.000
MPI_Wait 20560 0.0 1.601
MPI_Bcast 129 1044.8 0.001
MPI_Barrier 7 0.0 0.000
MPI_Reduce 4 7.0 0.000
MPI_Allreduce 5167 8.1 0.009
MPI_Allgather 1 4.0 0.000
MPI_Allgatherv 1 7296.0 0.000
-----------------------------------------------------------------------
MPI task 1 of 4 had the median communication time.
total communication time = 1.633 seconds.
total elapsed time = 14.510 seconds.
user cpu time = 13.993 seconds.
system time = 0.508 seconds.
max resident set size = 129.074 MiB.

-----------------------------------------------------------------
Message size distributions:

MPI_Send #calls avg. bytes time(sec)
8 0.0 0.000
1 176.0 0.000
2 352.0 0.000
1 528.0 0.000
9870 2541.4 0.003
392 7626.1 0.000
9870 15539.6 0.011
12 17794.0 0.000
392 46617.9 0.002
12 108990.0 0.000

MPI_Irecv #calls avg. bytes time(sec)
7 0.0 0.000
3 176.0 0.000
1 352.0 0.000
1 528.0 0.000
9870 2541.4 0.002
392 7626.0 0.000
9870 15538.2 0.002
12 17803.3 0.000
392 46623.9 0.000
12 108532.7 0.000

MPI_Sendrecv #calls avg. bytes time(sec)
36 8.0 0.000

MPI_Bcast #calls avg. bytes time(sec)
80 3.4 0.000
3 7.7 0.000
4 13.2 0.000
20 24.5 0.000
12 49.0 0.000
2 96.0 0.000
1 312.0 0.000
1 992.0 0.000
1 2048.0 0.000
1 3840.0 0.000
3 24239.7 0.000
1 53248.0 0.000

MPI_Reduce #calls avg. bytes time(sec)
1 4.0 0.000
3 8.0 0.000

MPI_Allreduce #calls avg. bytes time(sec)
14 4.0 0.000
5125 8.0 0.009
13 15.4 0.000
12 24.0 0.000
3 40.0 0.000

MPI_Allgather #calls avg. bytes time(sec)
1 4.0 0.000

MPI_Allgatherv #calls avg. bytes time(sec)
1 7296.0 0.000
