
WeeklyTelcon_20220823

Geoffrey Paulsen edited this page Aug 23, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Josh Fisher (Cornelis Networks)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tommy Janjusic (nVidia)

Not there today (I keep this list for easy cut-and-paste into future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • David Bernhold (ORNL)
  • Erik Zeiske
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Jan (Sandia -ULT support in Open MPI)
  • Jeff Squyres (Cisco)
  • Jingyin Tang
  • Joseph Schuchart
  • Josh Hursey (IBM)
  • Joshua Ladd (nVidia)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • William Zhang (AWS)
  • Xin Zhao (nVidia)

Reminders

  • Thursday: HAN/Adapt wrap-up decision.
    • Contact Geoff Paulsen if you need webex info

v4.1.x

  • v4.1.5
    • Schedule: targeting ~6 months (November?)
    • No driver on schedule yet.
  • Potential CVE from a 4-year-old issue in libevent, but we might not need to do anything.
    • Update: one company reported that its scanner didn't flag anything.
    • Waiting on confirmation that the patches removing the dead code were enough.

v5.0.x

  • Finally swapped the PRRTE submodule pointer to point to the v3.0 branch.
  • Did this without the SLURM fix, but there has been some traction on that.
  • Posted Open MPI issue #10698, listing about 13 issues that will need to be fixed.
  • NEED an mpirun manpage
  • NEED mpirun --help
  • Need all these fixes before PRRTE ships v3.0.0.
  • Are any of these issues complex?
  • Testing mpirun command line options.
  • mpirun is supposed to automatically translate old command-line options to the new options.
    • Are we planning to get rid of the old options at some point?
    • Deprecation warnings are not printed by default.
    • We've added new options (the new way of doing things), but if we're not encouraging people to move to them, why have them?
      • Can we even map old options to new options one-to-one?
    • We "own" the schizo component, so we could drop the new options and use only the old ones if we want.
    • Before we force any change, we should get users' input.
    • The old options had auto-completion.
    • If old options are being mapped to new ones, it's odd that we don't print the deprecation messages.
    • v5.0 was supposed to be fairly disruptive. Going back and making it less disruptive is fine, but then we are effectively saying that the old options are the preferred way.
  • Do we want HW_GUIDED in v5?
    • No discussion.
  • It would be nice to have a test suite that assumes 2-4 nodes with 4 ppr or so.
  • Schedule:
  • Docs
    • mpirun --help is OUT OF DATE.
      • Have to do this relatively quickly, before PRRTE releases.
      • Austen, Geoff and Tomi will be
      • The reason is that the mpirun command line lives in PRRTE.
  • The mpirun man page needs to be rewritten.
    • Docs are online and can be updated asynchronously.
    • Jeff posted a PR to document runpath vs. rpath.
      • Our configure checks some linker flags, but defaults in the linker or the system may be what actually governs the behavior.
  • Symbol Pollution - Need an issue posted.
    • Is this a blocker? Either way, we need to clean up as much as possible.
    • From the Open MPI community's perspective, our ABI is just the MPI_* symbols.
    • Still unfortunate. We need to clean up as much as possible.

Main branch

  • HAN / Adapt runs.
    • Part 2 of discussion this Thursday.
    • Discuss making HAN/Adapt the default, with different tunings for tuned.
  • Incompatibilities in User Level threading that Jan
    • What's the schedule for getting the fixes into v5.0.x?
    • Will try to get PRs in by end of August and then iterate.

Accelerator framework

  • No discussion. Still some changes needed before we can retest/rereview.

Atomics PRs

  • Switching to compiler builtin atomics.
    • #10613 is the preferred PR. GCC and Clang should both support these builtins.
    • Next step would be to refactor the atomics for post v5.0.
    • Waiting on Brian's review and CI fixes.
  • Joseph will post some additional information in the ticket.

MTT

Administrative tasks

Face-to-face
