
WeeklyTelcon_20230411

Geoffrey Paulsen edited this page Apr 11, 2023 · 1 revision

Open MPI Weekly Telecon

  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Webex)

  • A. Bouteiller (UTK)
  • Edgar Gabriel (AMD)
  • Geoffrey Paulsen (IBM)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joseph Schuchart (UTK)
  • Luke Robison (Amazon)
  • Matthew Dosanjh (Sandia)
  • Quincey Koziol
  • Thomas Huber
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • Tomislav Janjusic (NVIDIA)
  • William Zhang (AWS)

Not here today, but kept here for easy cut-and-paste in the future.

  • Austen Lauria (IBM)
  • Brendan Cunningham (Cornelis Networks)
  • Brian Barrett (Amazon)
  • Christoph Nietham
  • David Bernholdt
  • Josh Fisher (Cornelis Networks)
  • Josh Hursey (IBM)

New Items

  • Tuned and MCA parameter issue

    • Issue 11532 appears to be related to Issue 11459.
    • Summary: Something that worked in v4.x (CLI and MCA params) is now broken.
    • We probably need either a code fix or a doc fix.
    • We need an answer by v5.0.0, as this breaks shell-script compatibility.
  • A new issue came out of this:

    • Summary: There used to be two different file formats:
      • Tuned files and MCA param files.
      • They did similar things, but with two different formats.
      • PRTE eliminated one of the formats entirely.
      • On the OMPI side, this leaves a minor incompatibility.
        • Right now users get no warning that the file is not being read.
        • MINIMUM: We should emit an error (a human is asking us to do something).
        • This might need to be fixed in schizo.
      • Discussion: do we want to bring back the second flavor of MCA param file?
        • It is a little weird that we have silent translation for everything except this.
    • Luke volunteered to open the issue.
  • MPIR Shim (https://github.com/openpmix/mpir-to-pmix-guide) went away.

    • Howard pushed the repo somewhere.
    • Howard will hook it up to CI for some testing.
    • The MPIR shim has some docs (12.7); they just need a new URL and updated info.

v5.0.x

  • Released RC11 last week. Please test.

  • An issue was reported that map-by is not working.

  • OFI NIC selection is broken on v5.0.x.

    • The fix itself updated the PMIx and PRTE pointers.
    • Thomas is testing this patch with RC11 right now.
  • Python needs a fix in PMIx, and then bringing this into

    • Need to come to some conclusion as to what is needed and what can be backported.
  • PMIx in main has a startup hang, with a 3-20% failure rate.

    • Luke sees the hangs in OMPI main, but not as much in OMPI v5.
    • This is with OMPI main pointing to PMIx main.
    • Tommy: this is one of the reasons why we're reluctant to push PMIx pointer updates.
    • Whatever fixes go into PMIx
  • We had talked about supplying some docs on how HAN is great, and why we're enabling it by default in v5.0.0.

    • We'd also like to include instructions for users on how to reproduce the results.
    • Section 5, "Open MPI-specific features", would be a good place to highlight this.
      • Geoff emailed George to ask if they can.
      • George just posted a link in the #general Slack channel: a 6x increase in speed.
      • George is asking for volunteers.
    • Ask about citable paper(s).

Main branch

  • PR 11579: Howard is adding some debugging.
    • HCOLL uses the OMPI devel headers when compiling in debug mode.
    • But installing the devel headers also installs the communicator headers.
    • People are complaining that a sessions-only test segfaults down in HCOLL, because HCOLL decides that MPI_COMM_WORLD is valid. HCOLL is MPI-3 compatible, but not MPI-4 compatible.
    • This is a bug in libhcoll (external to OMPI); it is a known issue in HCOLL version (X).
    • Howard is closing this PR and adding this as a known issue.

Administration Topics

  • No travel planned.