
WeeklyTelcon_20171024

Geoffrey Paulsen edited this page Jan 9, 2018

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen (IBM)
  • Jeff Squyres
  • Brian
  • Geoffroy Vallee
  • George
  • Howard
  • Josh Hursey
  • Mathew / SNL
  • Mohan
  • Nathan Hjelm
  • Ralph
  • Todd Kordenbrock
  • Edgar Gabriel

Agenda

Review v2.0.x Milestones v2.0.4

  • NEWS: It is labor-intensive to update NEWS for every release. Can't we automate this?
  • Could we just use the short PR titles?
  • Not this week.
  • Don't include the macOS High Sierra fix.
  • Schedule: Get it out this week.

Review v2.x Milestones v2.1.2

  • v2.1.3 (unscheduled, but probably Jan 19, 2018)
    • PR4172: a mix of feature and bug fix.
  • Are we going to do anything on v2.x for hwloc 2?
    • At minimum, add a configure error if hwloc v2.x is detected.

Review v3.0.x Milestones v3.0

  • v3.0.1
    • Still targeting end of October for the v3.0.1 release.
    • A few PRs need review.
    • Schedule: Still shooting for end of October.
  • v3.1.x:

    • Roll hwloc back to 1.11.7 on the v3.1.x branch (Ralph put it together; Brian reviews).
    • Will support an external hwloc v2.0.x, but the default will be hwloc 1.11.7.
    • PMIx: v3.1.0 was supposed to ship with PMIx 2.1.0, with cross-version support.
      • Cross-version support in PMIx works fine, as long as PMIx shared memory is not used.
        • Fixing the shared memory piece in v2.1 (with cross-version support) requires a complete rewrite.
      • There is a ticket out there; it needs review.
      • Do we want to ship with PMIx v2.0 and no cross-version support? Or with PMIx v2.1, but without shared memory support? (The latter would require a number of build-time flags to get it to build.)
    • We could delay...
    • Could we ship BOTH, with the default being PMIx v2.1 without shared memory?
      • Provide a configure-time flag to build with PMIx v2.0, enabling shared memory on high core-count platforms.
      • BUT, the backward-compatible PMIx v2.1 still doesn't work with older PMIx versions if those were built with dstore (which is/was the default), so users would have to go back and rebuild their PMIx installations.
    • All of our options are BAD, so let's delay a week and discuss next week what we can do.
    • Send an email to devel-core saying we're going to delay v3.1 to fix this.
      • Amazon will scope the amount of changes needed for dstore this week.
  • Schedule: Unsure; depends on the above. Discuss next week.

  • Add v3.1 to MTT tests

    • The database is now active and accepting v3.1 test results.
  • MTT disks were getting full: PHP was using /tmp, and the local /tmp was full all weekend, so submissions failed. Josh moved what he could, but suspects PHP is still writing something to /tmp.

  • Administration

    • Restored the Partner designation.
    • Voted in Mexico Consortium

Review Master Pull Requests

  • The master version is currently v4.0, but that is an artifact of the datatype changes that ended up being pulled into v3.0, so it can be changed to v3.2.


MTT / Jenkins Testing Dev

  • Looking reasonably good, but the history is all mucked up.
  • Something is going on with Jenkins (it appears to be completely turned off right now).
  • Treematch segfault issue: only on master? We think so.
    • IBM has a patch that we'll PR upstream. We are not sure it fixes the same root issue others were seeing, but it fixes the problem in IBM's environment.
  • George accidentally pushed a branch named 'v3.x' to upstream.
    • Just delete it.
  • Jenkins: the Botany Bay and Berkeley machines both had an issue where Jenkins couldn't ssh into them, and it logged every failed attempt.
    • This filled the disk and ran us out of web server credits.
    • Brian will send Nathan the config for setting up a daemon for connections, so Jenkins won't sit in a loop trying to ssh to nodes it can't reach. He already has the Mac OS X config.
    • There is also a wiki page with instructions.
    • Brian will also put Jenkins on its own partition to help isolate us.
    • When Jenkins goes bonkers, it consumes all CPU cycles on the machine.
  • Discussed Issue 4349
    • We seem to remember disabling it due to a real bug.
    • IBM will dig through notes and reply on Issue.

This Week's Discussion Points

  • Website - openmpi.org
    • Brian is trying to automate more of the process, so we can check out the repo, etc. The repo is TOO large.
    • The majority of the problem is the tarballs, and we are already storing those in S3.

Oldest PR

Oldest Issue

Next face-to-face meeting

  • Jan / Feb
  • Possible locations: San Jose, Portland, Albuquerque, Dallas

Status Updates:

Status Update Rotation

  1. Mellanox, Sandia, Intel
  2. LANL, Houston, IBM, Fujitsu
  3. Amazon
  4. Cisco, ORNL, UTK, NVIDIA

Back to 2017 WeeklyTelcon-2017
