WeeklyTelcon_20221101

Geoffrey Paulsen edited this page Nov 1, 2022 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (AWS)
  • David Bernhold (ORNL)
  • Edgar Gabriel (UoH)
  • Geoffrey Paulsen (IBM)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Joseph Schuchart
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

not there today (I keep this for easy cut-n-paste for future notes)

  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (nVidia)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • Erik Zeiske
  • George Bosilca (UTK)
  • Hessam Mirsadeghi (UCX/nVidia)
  • Jan (Sandia)
  • Jeff Squyres (Cisco)
  • Jingyin Tang
  • Marisa Roman (Cornelis Networks)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Cornelis Networks)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Raghu Raja (AWS)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LLNL)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Tommy Janjusic (nVidia)
  • Xin Zhao (nVidia)

Ralph joined to ask some binding/mapping opinions.

Default placement: if np <= 2, map by core; otherwise map by NUMA (if defined), otherwise map by Package. The issue is that a customer has a Package inside of NUMA.

  • OMPI recently had a user that DID hit this. They were mapping by NUMA inside the package, and it was not what they were expecting.
  • Specifying map-by package solves it.
  • Hard to debug by looking at lstopo. If someone gets something weird when trying to map-by NUMA, try map-by package.

If the user takes the default mapping policy (or specifies ANY mapping policy) but doesn't say what the ranking policy is, what should the ranking policy be?
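The default placement rule described above can be written as a small decision function. This is only a sketch of the rule as stated in these notes, not code from Open MPI or PRRTE; the function name `default_map_by` and the `has_numa` flag are hypothetical.

```python
def default_map_by(np: int, has_numa: bool) -> str:
    """Return the default mapping target for a job of `np` processes.

    Hypothetical helper: it mirrors the rule in the notes, not the
    actual Open MPI / PRRTE implementation.
    """
    if np <= 2:
        return "core"
    # Larger jobs prefer NUMA domains when the topology defines them,
    # and fall back to packages otherwise.
    return "numa" if has_numa else "package"
```

Under this rule `default_map_by(8, has_numa=True)` returns `"numa"`, which is the case where a package-inside-NUMA topology surprised the user.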

  • Historically, ranking mirrored the mapping policy.
    • but it's pointed out this isn't the optimal placement (since most apps communicate with neighbors).
    • So then it was proposed to rank by SLOT.
  • But then the user looks, and gets confused because that's not what they thought they were getting.
  • Please think about this, and decide and lock this down.
    • Brian thinks the default has to be rank by SLOT in the absence of any other information (less strong thoughts on NUMA or Package).
    • Initial thought was that if user specifies non-default mapping, they then NEED to specify a ranking and vice versa.
      • Can print a useful error message.
      • We can't make everyone happy in this case, so this might be best option.
      • If users don't want to specify this every time, they can set an env var or make an entry in a conf file.
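To make the ranking discussion concrete, here is a toy model of a single node whose processes are mapped round-robin across packages, followed by the two ranking choices and the proposed "specify both or neither" check. Everything here is an assumption for illustration: the function names are hypothetical, and the placement model is deliberately simplified and does not reproduce the real mapper.

```python
from typing import Optional


def map_by_package(np: int, packages: int, cores_per_package: int):
    """Place np processes round-robin across packages (toy model).

    Returns (package, core) locations in the order processes were mapped.
    """
    if np > packages * cores_per_package:
        raise ValueError("more processes than cores in this toy model")
    return [(i % packages, i // packages) for i in range(np)]


def rank_mirroring_mapping(placements):
    """Ranks follow mapping order: rank i sits at the i-th mapped location."""
    return {rank: loc for rank, loc in enumerate(placements)}


def rank_by_slot(placements):
    """Ranks follow slot order: walk the node package by package, core by core."""
    return {rank: loc for rank, loc in enumerate(sorted(placements))}


def check_policies(map_by: Optional[str], rank_by: Optional[str]) -> None:
    """Proposed rule: give both a mapping and a ranking policy, or neither."""
    if (map_by is None) != (rank_by is None):
        if map_by is not None:
            given, missing = "--map-by", "--rank-by"
        else:
            given, missing = "--rank-by", "--map-by"
        raise ValueError(
            f"{given} was specified without {missing}; please specify both"
        )
```

With `map_by_package(4, 2, 2)`, mirroring the mapping puts ranks 0 and 1 on different packages, while ranking by slot keeps ranks 0 and 1 on package 0 and ranks 2 and 3 on package 1, so neighboring ranks stay close together.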

v4.1.x

  • v4.1.5
    • Posted an RC1 last week. Brian forgot to send email to devel.
    • Schedule is still end-of-month.
    • May be the last v4.1.5 unless lots of bugs.
    • There is a patch that needs some work; it didn't compile. We'd take it if it passes.

v5.0.x

  • RC went out a couple of weeks ago.

  • We'll need at least one more RC before we release.

  • HAN/Adapt is remaining blocker.

    • Finally figured out why timings were so variable.
      • Because we select Bruck for Barrier for no reason...
      • since OSC times barrier as well, that was the cause for the variations he was seeing.
    • There's a patch that proposes to only use HAN if the rank-distribution if we
    • Don't think we should block v5.0 longer
    • Don't think we'll figure out how to make HAN faster than tuned if
  • Don't have a good reason yet why HAN's Barrier is slower.

  • We promised better collective performance for v5, but we have not delivered.

    • What do we do?
      • Two choices:
        • Ship now and say that we're sorry our collective performance
          • We'd need some messaging about how we're handling this.
        • How do we talk to the community about this.
      • Are there any cases where this work actually improves things?
        • Something a bit positive where this work
        • Goes back to where ranks aren't ordered by SLOT.
          • Don't understand why only those are better.
    • Do we make it better in the common case? - No.
  • Super Computing 2021

    • ULFM, Threading MCA framework, MTL OFI, UCC
    • Pretty sure we DID messaging around this.
  • Have had a number of new PRs.

    • Did make changes to Tuned and had a PR where priorities were adjusted.
    • Seeing better performance for OMPI than Intel MPI.
    • Whatever the "out-of-box" performance is, that is what they are getting.
    • If you only have a few ranks per node, then HAN doesn't help that much.
  • Preparing for release.

    • Nov 14th release date.
    • Remaining known blocking issues:
      • OSHMEM blocker issue #10978
      • OPAL LIFO tests fail on s390x; a bad gcc is suspected. Reportedly works with v4.1, but fails with v5.0.
        • Doesn't seem to have support for 128bit architectures. Can't use C11
      • Jenkins Pipeline fix (No issue)
  • Jenkins - make tarball issue.

    • RPM builds don't work in Jenkins on v5.0.x
      • Doesn't block RC, but DOES block release.
  • HAN/Adapt - #10963

    • Still some concerns that need to be addressed.
  • Docs - Remaining blocking issue (besides above) for v5.0.0

Main branch

Accelerator framework

  • Merged to main, and to v5.0.x
    • Try it in v5.0.0rc9

MTT

Administrative tasks

Face-to-face

Super Computing?

  • Open MPI missed submitting request for BoF this year.
  • MPI Forum will be presenting.