
WeeklyTelcon_20230321

Geoffrey Paulsen edited this page Apr 4, 2023 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Didn't capture today.

Not here today, but keep here for easy cut-n-paste for future.

  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (Amazon)
  • Edgar Gabriel (AMD)
  • Geoffrey Paulsen (IBM)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart (UTK)
  • Josh Fisher (Cornelis Networks)
  • Luke Robison (Amazon)
  • Quincey
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic (nVidia)
  • William Zhang (AWS)
  • Austen Lauria (IBM)
  • Christoph Nietham
  • David Bernholdt
  • Josh Hursey (IBM)
  • Matthew Dosanjh (Sandia)
  • Thomas Naughton (ORNL)

New Items

  • OAC submodule

  • FAQ changes - Anything else that needs progress.

    • Not a blocker for v5.0.0
  • MCA attribute registration code

    • Color string issue - When you call mca_register_param, it creates a string and overwrites the pointer you gave it.
      • When you go through cleanup (MPIT_Finalize), this pointer is set to NULL.
      • Any time we have multiple MPIT_INIT/MPIT_Finalize cycles, we can get behavior that we're not accounting for. Some places in the code do account for this, so it's not unknown.
        • Do we want to fix every single invocation, or change the way the function works?
      • Is it a segv or a lost param? - The param is lost, which then causes an issue.
      • Does valgrind help find these places? How easy is it to find other places?
        • Grepped for the string type, and found a few places that do it correctly and some that do it wrong.
    • MPIT_pvar Issue 11492
      • On ompi 4.1, so it won't be a regression. The program didn't crash; the pvars are just strange.
  • MPI Sessions Issue

    • Not filed as an issue yet; will file one. Some info in #general (3/20).
    • Simple example: querying a session from a pset, creating a group, then calling MPI_Comm_create_from_group. UCX didn't support it, so ob1 was tried.
      • UCX will have to wait, need a new thing in PMIx.
    • Don't consider anything with sessions as a show stopper.
    • May not get to it until next week.
  • MPIR Shim (https://github.com/openpmix/mpir-to-pmix-guide) went away.

    • Howard pushed the repo somewhere.
    • Howard will hook it up to CI for some testing.
    • MPIR shim has some docs (12.7); they just need a new URL and info.
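
To illustrate the string-pointer behavior described under the MCA attribute registration item above, here is a minimal C sketch. It does not use Open MPI's real API: `demo_register_string`, `demo_finalize`, `buggy_component_register`, and `safe_component_register` are hypothetical stand-ins, and the "reset the default before every registration" pattern is only an assumption about what the call sites that handle this correctly might do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string-parameter registry.
 * This is NOT Open MPI's real registration code. */
#define DEMO_MAX_PARAMS 8

char **demo_registered[DEMO_MAX_PARAMS]; /* addresses of caller storage */
int demo_num_registered = 0;

/* Portable duplicate of a C string (avoids relying on strdup). */
static char *demo_dup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Register a string parameter: the registry duplicates the current default
 * and overwrites the caller's pointer with the copy it now owns. */
int demo_register_string(char **storage)
{
    if (demo_num_registered >= DEMO_MAX_PARAMS) {
        return -1;
    }
    *storage = (*storage != NULL) ? demo_dup(*storage) : NULL;
    demo_registered[demo_num_registered++] = storage;
    return 0;
}

/* Cleanup: free every owned copy and set the caller's pointer to NULL. */
void demo_finalize(void)
{
    for (int i = 0; i < demo_num_registered; i++) {
        free(*demo_registered[i]);
        *demo_registered[i] = NULL;
    }
    demo_num_registered = 0;
}

/* Buggy pattern: the default is set only once, at static initialization.
 * After the first finalize the pointer is NULL and the default is lost. */
char *buggy_color = "auto";
void buggy_component_register(void)
{
    demo_register_string(&buggy_color);
}

/* Assumed safe pattern: reassign the default before each registration,
 * so every init/finalize cycle starts from the same value. */
char *safe_color = NULL;
void safe_component_register(void)
{
    safe_color = "auto";
    demo_register_string(&safe_color);
}
```

Calling both register functions, then `demo_finalize()`, then both register functions again leaves `buggy_color` as NULL while `safe_color` points at "auto" again. Nothing crashes; the parameter's default is simply lost in the second cycle, which matches the "lose it" symptom in the notes. Changing how the registration function itself works, the other option raised, would instead mean it no longer overwrites caller storage.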

v5.0.x

  • 11510 - New Blocker because of Wrong answers - Will investigate soon.

  • Amazon is investigating a significant number of hangs on v5.0.0rc

  • What to do about 11415 - Discussed and said this is correct behavior.

    • Customer accepted new behavior for OMPI v5.0
  • Runtime docs stuff should be doable by end of the month.

  • We'd talked about supplying some docs about how HAN is great, and why we're enabling it for v5.0.0 by default.

    • Would like to include instructions for users on how to reproduce as well.
  • New MTT regression failure introduced

    • A legit issue in main
    • fixed by PR #11499
      • PR CI for Mellanox passes, but IBM CI failed (Possibly due to lab moves)
      • Give it another day or so to see if it's just
  • CI topic

    • Testing a new way (not the ancient GitHub Builder Plugin for Jenkins) using Jenkinsfile Pipelines.
    • Want to switch over ASAP. It has been more or less ready for the last month, but now we need to change.

Main branch

Administration Topics
