test: add p2p benchmark code #6907

Merged
merged 6 commits into from
Oct 2, 2024

@hzhou hzhou commented Feb 12, 2024

Pull Request Description

While the OSU micro-benchmarks are commonly used, they require a separate build, and their many options make using and tuning them non-trivial. This PR adds simple benchmark code that can be used in CI testing. Each test is a single standalone C source file with minimal options, allowing quick testing and easy adaptation.

The PR also introduces the MyDef template system. MyDef allows code to be constructed in layers. However, until they are familiar with it, developers will likely find MyDef source code mysterious. To start, follow the instructions below to generate the C source and read the C code.

  • autogen.sh (in topsrc_dir) converts the .def code into .c, thus removing the dependency on MyDef.
  • If MyDef is on PATH, one can convert the code manually or use the Makefile:
[test/mpi/bench]$ mydef_page p2p_latency.def
PAGE: p2p_latency
  --> [./p2p_latency.c]

Or

$ mydef_run p2p_latency.def
PAGE: p2p_latency
  --> [./p2p_latency.c]
mpicc -o ./p2p_latency ./p2p_latency.c -lm && mpirun -n 2 ./p2p_latency
TEST p2p_latency:
     msgsize    latency(us)  sigma(us)    bandwidth(MB/s)
           0      0.465      0.002            0.000
           1      0.475      0.001            2.105
           2      0.476      0.002            4.205
           4      0.478      0.019            8.360
           8      0.476      0.002           16.800
          16      0.475      0.002           33.694
          32      0.474      0.002           67.451
          64      0.476      0.001          134.424
         128      0.505      0.003          253.422
         256      0.523      0.004          489.442
         512      0.608      0.005          842.436
        1024      0.776      0.003         1318.895
        2048      0.932      0.003         2196.811
        4096      1.273      0.004         3216.882
        8192      1.949      0.006         4202.567
       16384      4.143      0.017         3954.333
       32768      5.674      0.016         5775.437
       65536      8.632      0.018         7592.587
      131072     14.502      0.050         9038.108
      262144     26.104      0.052        10042.183
      524288     50.209      0.104        10442.031
     1048576    106.563      0.234         9839.961
     2097152    219.804      0.385         9541.020
     4194304    476.780      1.431         8797.154

  • The usual make works as well
$ make testing
../runtests -srcdir=. -tests=testlist,testlist.gpu,testlist.dtp,testlist.collalgo -testdirs= \
        -mpiexec="/home/hzhou/MPI/bin/mpiexec"  -xmlfile=summary.xml \
        -tapfile=summary.tap -junitfile=summary.junit.xml
Load tests in .
Running tests in . [00:00:00]
TEST p2p_latency:
     msgsize    latency(us)  sigma(us)    bandwidth(MB/s)
           0      0.472      0.007            0.000
           1      0.481      0.001            2.081
           2      0.480      0.001            4.162
           4      0.481      0.002            8.320
           8      0.480      0.001           16.663
          16      0.480      0.001           33.321
          32      0.479      0.001           66.751
          64      0.480      0.001          133.334
         128      0.505      0.001          253.306
         256      0.521      0.002          491.461
         512      0.601      0.003          851.506
        1024      0.775      0.003         1321.005
        2048      0.935      0.004         2190.365
        4096      1.271      0.004         3222.500
        8192      1.942      0.005         4217.288
       16384      4.141      0.018         3956.726
       32768      5.629      0.022         5821.091
       65536      8.561      0.020         7654.838
      131072     14.381      0.030         9114.115
      262144     25.907      0.054        10118.604
      524288     49.848      0.098        10517.631
     1048576    105.883      0.262         9903.189
     2097152    217.566      0.529         9639.174
     4194304    475.046      1.003         8829.267

TEST p2p_bw:
     msgsize    latency(us)  sigma(us)    bandwidth(MB/s)
           1      0.224      0.000            4.458
           2      0.224      0.000            8.918
           4      0.224      0.001           17.854
           8      0.225      0.001           35.634
          16      0.224      0.000           71.417
          32      0.224      0.000          142.606
          64      0.225      0.000          284.557
         128      0.258      0.000          496.326
         256      0.267      0.000          958.630
         512      0.284      0.001         1803.500
        1024      0.346      0.001         2957.940
        2048      0.366      0.001         5602.925
        4096      0.503      0.002         8151.093
        8192      0.783      0.007        10463.363
       16384      1.841      0.031         8897.694
       32768      3.291      0.006         9956.639
       65536      6.063      0.012        10809.210
      131072     11.672      0.325        11229.356
      262144     22.565      0.017        11617.333
      524288     44.575      0.059        11762.012
     1048576     87.345      0.173        12005.058
     2097152    171.658      0.299        12217.021
     4194304    342.196      0.284        12257.009

 All 2 tests passed! (total runtime: 0 min 6 sec)
TAP formatted results in /home/hzhou/work/pull_requests/2311_bench/test/mpi/bench/summary.tap
JUNIT formatted results in /home/hzhou/work/pull_requests/2311_bench/test/mpi/bench/summary.junit.xml

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

MyDef provides general templating facilities.
Add point-to-point benchmark code in MyDef. The tests have automatic
warm-ups and adjust the number of iterations for measurement accuracy.
They produce latency measurements with standard deviations and equivalent
bandwidths.

MYDEF_BOOT=[topsrc_dir]/modules/mydef_boot
export PATH=$MYDEF_BOOT/bin:$PATH
export PERL5LIB=$MYDEF_BOOT/lib/perl5
export MYDEFLIB=$MYDEF_BOOT/lib/MyDef

To run:
    mydef_page p2p_latency.def  # -> p2p_latency.c
    mpicc p2p_latency.c && mpirun -n 2 ./a.out

Alternatively use mydef_run (uses settings from config):
    mydef_run p2p_latency.def

Next commit will add "make testing".
We could add rules to work directly with mydef code in the Makefile, but
converting the code in autogen removes the mydef dependency.

Also fix a spelling error.
This check does not capture output (thus test results will show in
console log) and only checks for exit code - zero means success and
nonzero means failure.

We'll use this check for benchmark tests.
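
The pass/fail rule described above can be illustrated with a short C sketch. It is an illustration under assumptions, not the test harness's implementation, which this page does not show. The helper name exit_code_check is hypothetical, and it relies on the POSIX system() call, so the child's output goes straight to the console rather than being captured.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run a command through the shell and judge it by exit code only.
 * Returns 1 (success) if the command exited normally with status 0,
 * and 0 (failure) otherwise. Output is not captured: the child
 * inherits this process's stdout and stderr. */
int exit_code_check(const char *command)
{
    assert(command != NULL);

    int status = system(command);
    if (status == -1)
        return 0;   /* the child process could not be created */

    /* A command killed by a signal also counts as a failure. */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

For example, exit_code_check("true") reports success, while exit_code_check("false") reports failure.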
"make testing" in test/mpi/bench should work.
Add device memory support using mtest_common utilities. This adds
a dependency on the utility libraries, which the makefile already
imports.

However, this removes the simplicity of building a single
source file with mpicc or mydef_run. If one doesn't need to test device
memory, one can simply comment out "$include macros/mtest.def" to restore
the simplicity.

hzhou commented Oct 2, 2024

test:mpich/ch4/ofi

@hzhou hzhou merged commit 1f359fe into pmodels:main Oct 2, 2024
4 of 5 checks passed
@hzhou hzhou deleted the 2311_bench branch October 2, 2024 21:59
@hzhou hzhou mentioned this pull request Oct 2, 2024
4 tasks