WeeklyTelcon_20211026

Geoffrey Paulsen edited this page Nov 2, 2021 · 1 revision

Open MPI Weekly Telecon

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS) - Welcome Back!
  • Geoffrey Paulsen (IBM)
  • Hessam Mirsadeghi (NVIDIA)
  • Jeff Squyres (Cisco)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Sam Gutierrez (LANL)
  • Sriraj Paul (Intel)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic (NVIDIA)
  • William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • David Bernholdt (ORNL)
  • Edgar Gabriel (UH)
  • Erik Zeiske (HPE)
  • Geoffroy Vallee (ARM)
  • George Bosilca (UTK)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Joseph Schuchart (HLRS)
  • Josh Hursey (IBM)
  • Joshua Ladd (NVIDIA)
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja
  • Ralph Castain (Intel)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Thomas Naughton (ORNL)
  • Xin Zhao (NVIDIA)

New Topics For Today

v4.0.x

  • Schedule: Pushed to October for 4.0.7
  • --cpu-set - Geoff working on PR for nice warning/docs
  • Fortran PRs 9259 and 9367 probably affect the v4.0.x branch as well.
    • Geoff will follow up.
  • Michael Heinz installed it on clusters, and it is looking good so far.
    • Tested on Omni-Path fabrics.
      • Issue (by design): BTL and MTL both open own endpoints, so.
    • Brian approved; it could go into the next v4.0.x RC. It just needs to land in v4.0.x.
  • Geoff and Howard

v4.1.x

v5.0.x

  • Schedule: rc2 went out yesterday.
  • 4 PRs open.
    • PR 9594 - Fixes some BTL issues (against master); will take a few days to review.
  • Issue #9554 - Jeff asked whether Partitions support is going into v5.0 or not.
    • Matthew is interested
  • PR #9495 TCP Onesided for master.
  • Tommy's still pushing on UCX Onesided.
  • PR 9576 - Ralph filed a ticket about building packages externally.
    • Working with Fedora packagers. Will be a v5.0.x
    • Might need some back and forth with PMIx. The way he updated PMIx might require massive changes to OMPI.
  • MPI Info stuff that Joseph and Howard are working on.
    • Marking a few MPI_ calls as deprecated.
    • Never mind: since we're not MPI 4.0 compliant, do NOT mark them as deprecated yet.
  • Documentation
    • Got a needed change into the Sphinx tools. Not sure if there's a release yet.
      • This fixes output issues in man pages.
    • Process to update FAQ is to talk to Jeff or Harumi.
    • For any changes to the README or FAQ, let them know so the changes also get made in the NEW docs.
      • For now, make changes in ompi-www and README as usual and let them know.
  • Issue 9501 regression, needs to be fixed or reverted.
  • v5.0.x requires pandoc. If a user builds from a downloaded tarball, they do NOT need pandoc installed.
    • If a user runs make dist or make distcheck, they WILL need pandoc.
      • This is a strange quirk, but seems fine.
  • Github Project of [critical v5.0.x issues|https://github.com/open-mpi/ompi/projects/3]
    • Issue #8983 - If we partially disable OSC/TCP BTL, we are not breaking MPI compliance, just hurting one-sided performance badly.
    • Described the approach for rc1 on Sept 23: disable any functionality that is a blocker to allow for the rc.
      • Worried that blockers might not be fixed in time, so will put in code to issue an error at runtime to prevent getting into those paths, and document it heavily.
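
The runtime guard described in the last bullet could look roughly like the sketch below. It is written in Python only to illustrate the idea; it is not Open MPI code. The feature name `osc_over_tcp_btl`, the `DISABLED_FEATURES` set, and both functions are hypothetical names invented for this example.

```python
# Hypothetical sketch of a fail-fast runtime guard; none of these names
# come from the Open MPI code base.


class UnsupportedPathError(RuntimeError):
    """Raised when a caller requests a code path disabled for this release."""


# Assumption: the set of features switched off for the release candidate.
DISABLED_FEATURES = {"osc_over_tcp_btl"}


def require_enabled(feature: str) -> None:
    """Raise a clear error instead of entering a known-broken path."""
    if feature in DISABLED_FEATURES:
        raise UnsupportedPathError(
            f"{feature} is disabled in this release; see the release notes"
        )


def start_one_sided(transport: str) -> str:
    """Check the guard first, then proceed with the (pretend) operation."""
    require_enabled(f"osc_over_{transport}_btl")
    return f"one-sided communication started over {transport}"
```

The point of the design is that a disabled path produces an immediate, documented error rather than a hang or a silent performance problem.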

Supercomputing (SC) BoF

  • Time and date of BoF: Nov 16 @ 12:15pm US Eastern Time.
  • Was accepted for Open MPI
    • Our hybrid BoF will be mostly VIRTUAL.
      • George may be there in person for tutorial (though other tutorials will be fully virtual)
    • Birds of a Feather will be virtual.
    • George sent out an email to Amazon, Cisco, IBM, NVIDIA
  • Where do we drop slides? Jeff will send again. Deadline T-minus 1-week.
    • Google Slides.

Master

Documentation

  • No update
  • Don't use the old system; use the new system for v5.0.0.

MPI 4.0 API

  • No discussion [Open MPI 4.0 API Compliance Github Project|https://github.com/open-mpi/ompi/projects/2]
  • Joseph says we're not dropping info keys as we SHOULD under MPI 4.0.
    • Can't make it work easily for Comms because it would need to go down into the PMLs.
    • Issue #9555
    • Do we want this in OMPI v5.0.0?
      • It'd be nice, because it's going to change behavior.
      • But it might also be bad because it's a change in behavior (if users depend on MPI 3.1 behavior)
        • But since it wasn't specified in MPI 3.1, maybe whatever we do is okay.
  • Jeff's going to review PR 9246
  • Howard will review 7985
  • Need to decide what to do with 8057
  • Sessions branch, don't want to merge into master until possibly v5.0.1 gets out.
    • It will complicate things in finalize/initialize code.
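
To make the info-key question from Issue #9555 above concrete, here is a small Python model of the two behaviors being compared. It is an illustration under assumptions, not Open MPI code: the `SUPPORTED_HINTS` set and both function names are hypothetical, and the key `my_app_key` is made up. Only `mpi_assert_no_any_tag` and `mpi_assert_no_any_source` are hint names taken from MPI 4.0.

```python
# Hypothetical model of info-key handling. Assumption: the implementation
# recognizes only the hints listed in SUPPORTED_HINTS.
SUPPORTED_HINTS = {"mpi_assert_no_any_tag", "mpi_assert_no_any_source"}


def get_info_keep_all(user_info: dict) -> dict:
    """Behavior described in the notes: every user-set key is handed back."""
    return dict(user_info)


def get_info_drop_unknown(user_info: dict) -> dict:
    """MPI 4.0-style behavior: keys the implementation ignores are dropped."""
    return {key: value for key, value in user_info.items()
            if key in SUPPORTED_HINTS}
```

An application that stores its own keys on a communicator and reads them back would see them disappear under the second function, which is the behavior change the bullets above are weighing.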

MTT

  • Looking okay.
  • Looks like something was wrong with MTT.
    • That machine just got upgraded.
    • Install fail is kinda weird.

Longer Term discussions

  • No discussion.