Notes from EasyBuild maintainer summit 2025

Kenneth Hoste edited this page Feb 13, 2025 · 1 revision

EasyBuild Maintainers Summit 2025

Mon 10 Feb + Thu 13 Feb 2025 (14:00-16:30 CET), virtual (via Zoom)


Topics

Extra topics are welcome!

  • Governance
  • Technical Steering Committee for EasyBuild?
  • EasyBuild vs EESSI: positive/negative aspects
  • Should EasyBuild join HPSF?
  • Maintainer-of-the-week: should it come back?
  • EasyBuild 5.0: lessons learned
  • Goals for improved testing setup
  • Clang/BLIS in foss toolchain
  • Retiring non-active maintainers

Mon 10 Feb 2025, 14:00 CET

attendees: Kenneth, Bart, Sam, Mikael, Adam, Bob, Sebastian, Jasper, Simon, Lara, Alan, Caspar, Alex, Åke

topics for today: governance-related

HPSF

  • https://fosdem.org/2025/schedule/event/fosdem-2025-6656-the-high-performance-software-foundation-hpsf-
  • benefit?
    • a bit unclear in practical terms
    • see also https://hpsf.io/join
    • sign of maturity
    • access to HPSF funds, members
    • support from Linux Foundation w.r.t. legal aspects, etc.
    • learn from other projects w.r.t. governance, technical aspects, ...
    • would push us to take formal steps w.r.t. governance
  • should EasyBuild join?
    • risk of HPSF being perceived as US-focused
      • could impact us w.r.t. EuroHPC funding
    • can EB LF be used as legal entity to apply for funding?
  • ask Xavier for input on why Environment Modules is being onboarded
  • European HPSF members: CEA, JSC
  • mostly in favor: Alan, Kenneth
  • less positive: ?
  • onboarding process can be a bit time-consuming (cfr. Kokkos)
  • next steps
    • explore in UGent whether they want to become HPSF Associate Member + onboard EasyBuild as HPSF project (via UGent Tech Transfer?)

Technical Steering Committee for EasyBuild

  • members for initial Steering Committee: Kenneth, Simon, Caspar, Sam, Adam, Sebastian
    • shouldn't overlap significantly with EESSI Steering Committee
  • initially for 1 year, meeting once every quarter
  • short-term tasks:
    • write out goals (diversity, geography, steering vs advisory board) for governance
    • how to form actual steering committee: timeframe, candidates, voting, spread of interests (sites, EESSI, ...), etc.
    • should EasyBuild join HPSF? => write up advice for actual Steering Committee who can decide in 2026
    • guard impact of EESSI on EasyBuild, criteria w.r.t. contributions in that context?
  • Caspar could give short talk on this at EUM'25

EasyBuild vs EESSI

  • EasyBuild maintainers active in EESSI: Kenneth, Caspar, Lara, Alan, Bob, (Sam)
  • additional EESSI-related PRs
    • get mostly handled through people involved with EESSI, so OK
    • how many PRs are related to EESSI?
  • negative impact of EESSI on EasyBuild?
    • effort spent on EESSI takes time away from EasyBuild maintenance work
    • has delayed work on EasyBuild 5.0 to some extent
  • positive impact
    • brings additional use case for EasyBuild, increases relevance of EasyBuild
    • effort on EESSI has "trickled down" to EasyBuild (cfr. efforts by Caspar, Lara, Bob)
    • brings more attention to EasyBuild
      • easyconfig is first step to get software into EESSI
    • attention of developers, cfr. recent GROMACS PRs
    • broader testing across other architectures (different Intel/AMD generations, Arm, RISC-V)
    • funding for EUM'25 group dinners + social event
  • risks
    • development & maintenance effort becoming too much focused on EESSI
  • funding going into EESSI, while EasyBuild was never directly funded
    • lots of "in-kind" contributions to EasyBuild
  • legal entity via HPSF/LF could be a way of letting EasyBuild benefit from funding for EESSI as an additional "partner"

Maintainer-related topics

  • retiring non-active maintainers
    • active: Kenneth, Lara, Sebastian, Miguel, Simon, Alex, Bob, Jasper, Adam, Sam, Alan, Mikael, Bart, Åke, Caspar, (Balázs)
    • no longer active: Damian, Pablo, Fotis, Ward, Davide, Lars
      • Damian is still working with EasyBuild and is an EasyBuild expert @ JSC; Sebastian took over as lead of the software team
      • will be contacted by Kenneth to retire them as EasyBuild maintainer
      • EasyBuild maintainer rights should be removed + membership to maintainers mailing list + Slack channel, docs updated accordingly (cfr. https://docs.easybuild.io/maintainers)
  • MotW
    • maintainers often focused on stuff directly useful for them (which is fine)
    • this has been dead since Sept'24, Kenneth couldn't find time to set up the MotW schedule...
    • getting EasyBuild v5.0 out the door + setting up better testing should get priority
    • some maintainers prefer only installing stuff from merged PRs
      • combined impact of ditching MotW + effort on EasyBuild 5.0 + EESSI leads to easyconfig PRs staying open longer
      • sort of self-imposed policy (Adam)
    • MotW did also help significantly with handling framework + easyblock PRs
    • (Alex) did MotW really work that well before EasyBuild 5.0 sucked up a lot of time?
      • lots of overhead to plan it, unclear whether people were actually actively acting as MotW during their week
      • do we really need someone "at the door" every week?
      • (Caspar) does help for some maintainers
    • (Alex) regular "stand-up" meetings could be an alternative
      • set priorities + assign issues/PRs to someone
        • works well for framework/easyblocks in EasyBuild 5.0 sync meetings
      • also try to find a volunteer to look after easyconfig PRs?
      • on Mondays, flip between 10am CET & 3pm CET to cater to multiple time zones
    • (Alan) bring back merge sprints for easyconfigs?
  • potential candidates for new maintainers
    • cfr. https://docs.easybuild.io/maintainers/#maintainers_criteria
    • Alexander Grund (TU Dresden, @Flamefire): probably better as expert contributor, has been asked before as maintainer and then declined
    • Jan Reuter (JSC, @Thyre): recent frequent contributor, compiler expert, ...
    • Davide Grassano (CECAM, @Crivella): recent expert contributor to EasyBuild & EESSI
    • Maxim Masterov (SURF, @maxim-masterov)
    • Cintia Willemyns (VUB, @WilleBell)
    • more suggestions?
      • KH will look at stats for recent contributors

EasyBuild 5.0

  • delay in getting EasyBuild 5.0 out the door complicates testing

Testing of easyconfig PRs

  • generoso became unavailable
  • test requests to the bot take too long to get picked up, which is annoying
  • multiple test configurations
    • multiple OS configs: Rocky 8/9, other OS
    • multiple CPU families: AMD, Intel, Arm
    • EESSI vs bare metal
    • some configurations should be non-blocking
      • in case of failing tests, open issue to keep track of the problem + merge
      • also depends on what fails
      • keep track of known problems in easyconfig + report when EasyBuild tries to install it
        • based on OS, CPU arch.
  • revive bot that comments when CI fails
    • got broken because of breaking change in GitHub API, Kenneth hasn't been able to find a workaround
  • fully automatic build testing of PRs if CI goes green after a PR is merged
    • only for trusted contributors?
    • only after approved review, to avoid malicious activity + high-impact mistakes
    • risk of prioritizing PRs by frequent/trusted contributors
    • sandboxed environment to avoid impact: no write access to existing installations, staged installation (only deploy when PR is merged)
    • rule-based checks to check whether a PR can be considered for build testing?
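
The rule-based check raised in the last bullet could look roughly like the sketch below. It is only an illustration: the `PullRequest` class, the `TRUSTED_CONTRIBUTORS` allow-list and both function names are hypothetical, and requiring all three conditions at once (CI green, approved review, trusted author) is an assumption, since the notes list these as open questions.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical, simplified view of a PR (not the GitHub API data model)."""
    author: str
    ci_green: bool
    approved_review: bool


# Hypothetical allow-list; the account names are placeholders.
TRUSTED_CONTRIBUTORS = {"trusted-contributor-1", "trusted-contributor-2"}


def build_test_blockers(pr, trusted=TRUSTED_CONTRIBUTORS):
    """Return the reasons why a PR should not be build-tested automatically.

    An empty list means the PR is eligible.
    """
    blockers = []
    if not pr.ci_green:
        blockers.append("CI is not green")
    if not pr.approved_review:
        blockers.append("no approved review yet")
    if pr.author not in trusted:
        blockers.append("author is not a trusted contributor")
    return blockers


def eligible_for_build_test(pr, trusted=TRUSTED_CONTRIBUTORS):
    """True if none of the rules object to automatic build testing."""
    return not build_test_blockers(pr, trusted)
```

Returning the list of blockers rather than a bare boolean would let a bot report why a PR was skipped, which may help with the concern about prioritizing frequent contributors.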

Thu 13 Feb 2025, 14:00 CET

Maintainers

EasyBuild 5.0

  • lessons learned: what could we have done better?
  • there will be an EasyBuild 6.0 at some point
    • with consistent naming for things (effort that was started for EasyBuild 5.0, but was postponed since it would have caused more delays)
  • we're not releasing major versions often enough
    • "chance of a lifetime" kind of feeling to get breaking changes in
    • guaranteeing that deprecated functionality actually keeps working is quite difficult
    • 2.0 -> 3.0 was ~1.5y
    • 3.0 -> 4.0 was ~3y
    • 4.0 was 5+ years ago...
    • desired timeline for 6.0?
      • ~2 years seems OK => change freeze on 1 Nov 2026, first release of 2027
      • consistent naming of things is a clear goal for 6.0
  • definitely some scope creep happened while working on EasyBuild 5.0
    • run_shell_cmd (was planned in April 2023)
    • rework of module generation: turned out to be more work than anticipated
    • effort to make naming of things more consistent was started but then postponed
  • organising the work better would have helped
    • has sort of happened organically in last couple of months
  • more helping hands might have helped (but maybe not -> "mythical man-month")
  • be less afraid of making breaking changes?
    • deprecating stuff implies more work
    • breaking stuff would annoy a lot more people
  • 5.0.x branch was a necessary evil, but has had a negative impact
    • extra effort to sync develop back into 5.0.x
    • has also resulted in less time being spent on easyconfig PRs
    • collapsing 5.0.x into develop should have happened way earlier
  • a lot of time was spent on EESSI too...
  • a lot of work has been done since we started working on EasyBuild 5.0...
    • ~180 PRs in framework repo
    • ~240 PRs in easyblocks repo
    • ~120 PRs in easyconfigs repo
    • => >1 PR per day for > 2 years

Testing setup

  • blocked by EasyBuild 5.0
  • event-based bot(s), like in EESSI
    • or look into using buildbot?
  • use containers to control build environment
  • test matrix (OS/CPU arch/configuration)
    • automated test report for these configurations
    • tier-1: blocks PRs if testing fails
      • (at jsc-zen3, zen3 & zen3+a100) most common env: AMD CPUs, Rocky 9, default EasyBuild config (RPATH), Lmod, somewhat minimal container (patch, make, gzip, bzip2, ... no devel pkgs)
      • (at AWS) Intel CPU, very minimal container (removing perl binary, etc.), non-RPATH EasyBuild configuration
    • tier-2: not necessarily blocking PR if test build fails, but open issue
      • (at AWS -> Graviton3) EESSI-based: Arm CPU, RPATH, sysroot, filtered deps, alternate module naming scheme (HMNS)
      • (at JSC, zen2) fat Ubuntu container (Doxygen, Boost, ...), Environment Modules (Tcl)
    • tier-3: no bot, manually triggered test builds by contributors/maintainers
      • PRs for commercial software
      • AMD GPUs
  • trigger additional tests based on labels
    • like riscv
  • also do --module-only (run + diff) + --sanity-check-only
    • should block PRs
    • known to be broken for some easyblocks (like WRF/WPS)
    • would be nice to do this in the wake of pre-release EasyBuild 5.0 regression test
  • enforce in CI that modtclfooter is used if modluafooter is used
    • with whitelist as escape hatch for complicated cases
    • (ask for help from Xavier?)
  • inspiration from LLVM talk: https://fosdem.org/2025/schedule/event/fosdem-2025-6644-programming-is-fun-testing-is-needed-infra-is-/
  • jsc-zen3 Magic Castle Cluster
    • 3 partitions:
      • zen3
      • zen3+a100
      • zen2+v100
    • could be expanded with Zen4 very soon, maybe also Intel Cascade Lake
  • (not discussed in the call) only deploy installations when PR is merged (build in staging directory via bwrap)
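
The tier-1/tier-2 distinction in the test matrix above can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the configuration names in `TIERS` are made up for this example (only the tier assignments follow the notes), `Outcome` and `evaluate` are hypothetical names, and tier-3 is left out because it has no bot.

```python
from enum import Enum


class Outcome(Enum):
    OK = "all test builds passed"
    OPEN_ISSUE = "merge allowed, open an issue to track the failure"
    BLOCKED = "PR is blocked"


# Tier per test configuration. The names are invented for illustration;
# the tiers follow the notes (tier-1 blocks, tier-2 only opens an issue).
TIERS = {
    "jsc-zen3-rocky9-rpath": 1,
    "aws-intel-minimal-no-rpath": 1,
    "aws-graviton3-eessi": 2,
    "jsc-zen2-ubuntu-tcl": 2,
}


def evaluate(results, tiers=TIERS):
    """Map test results to an outcome for the PR.

    results: dict of configuration name -> True (build passed) / False (failed).
    Unknown configuration names raise KeyError, so typos do not pass silently.
    """
    failed_tiers = {tiers[name] for name, passed in results.items() if not passed}
    if 1 in failed_tiers:
        return Outcome.BLOCKED
    if 2 in failed_tiers:
        return Outcome.OPEN_ISSUE
    return Outcome.OK
```

For example, a failure that only affects `aws-graviton3-eessi` would yield `Outcome.OPEN_ISSUE`, while any tier-1 failure yields `Outcome.BLOCKED` regardless of the tier-2 results.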

Clang/BLIS in foss toolchain

  • Clang is starting to "take over" from GCC?
  • TensorFlow officially doesn't support GCC anymore, must be Clang
  • should we add Clang to the foss toolchain, and support opting in to using it for builds (via toolchainopts)?
  • could even make it possible to switch from GCC to Clang as the default compiler in foss
  • @Crivella has been experimenting with building everything with Clang
    • there's definitely some pain still in there
  • do we keep GCCcore still around?
    • we probably should, but then we need to be more strict
      • already being done for Fortran modules in EasyBuild 5.0 => trouble when GCCcore toolchain is used
      • the banned library mechanism we already use to catch direct linking to OpenBLAS can help us avoid installing software that requires an OpenMP runtime with the wrong toolchain
    • can't mix OpenMP runtime of GCC and Clang (like for Fortran modules)
      • libgomp in GCC, libomp in Clang, libiomp in Intel
      • only really a problem for libraries linking to OpenMP runtime
        • incl. "compiled" Python packages (.so)
  • first make a fully-fledged Clang-based toolchain as alternative to GCC-based foss
    • to avoid making things too complicated
  • building everything with Clang is sort of swimming upstream, may be trouble for everything
  • ...
  • also: consider switching to BLIS as default BLAS/LAPACK library backend for FlexiBLAS?
    • OpenBLAS is a one-man show, not sure we'll be able to keep relying on it
    • no LAPACK library provided by BLIS? (cfr. libFLAME)
      • netlib LAPACK on top of BLIS is OK?
      • libFLAME isn't a full replacement for LAPACK, and we've observed more failing tests
    • for single-core stuff, doesn't matter much if you use OpenBLAS/BLIS/MKL
    • for multi-threaded stuff, BLIS has an edge, AOCL-BLAS even more
      • for Zen4, OpenBLAS still has an edge over BLIS due to AVX-512 kernels (Skylake), and AOCL-BLAS is even better
  • which MPI do we go with in Clang-based toolchain?
    • MPICH to be closer to vendor toolchains
    • would make jump to Clang-based toolchain from foss bigger
    • we don't treat libfabric like we do UCX (for CUDA support, multi-GPU)
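
The OpenMP runtime concern above (libgomp in GCC, libomp in Clang, libiomp in Intel) could be checked along the lines of the sketch below. It is a standalone illustration, not EasyBuild's banned library mechanism: the function names are hypothetical, and it assumes the list of linked shared libraries has already been collected (for instance from `ldd` output).

```python
# OpenMP runtime library names per compiler family, as listed in the notes.
OPENMP_RUNTIMES = {
    "libgomp": "GCC",
    "libomp": "Clang",
    "libiomp": "Intel",
}


def openmp_runtimes(linked_libs):
    """Return the set of compiler families whose OpenMP runtime is linked.

    linked_libs: iterable of library file names or paths,
    e.g. ["libgomp.so.1", "/usr/lib/libomp.so"].
    """
    found = set()
    for lib in linked_libs:
        # Drop the directory part and everything from the first '.' onwards.
        name = lib.rsplit("/", 1)[-1].split(".", 1)[0]
        # Drop a trailing version number, so 'libiomp5' matches 'libiomp'.
        name = name.rstrip("0123456789")
        if name in OPENMP_RUNTIMES:
            found.add(OPENMP_RUNTIMES[name])
    return found


def mixes_openmp_runtimes(linked_libs):
    """True if OpenMP runtimes from more than one compiler family are linked."""
    return len(openmp_runtimes(linked_libs)) > 1
```

Matching on the exact base name (rather than a prefix) keeps `libomp` and `libomptarget`-style names apart; whether that is the right trade-off for real installations is an open question.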

Other

  • get more organised w.r.t. cleanup of dead code
    • dedicated label that allows us to easily find PRs where candidate dead code was touched
    • or dedicated issue to keep list of things to revise before next major release of EasyBuild (similar to this)
    • highlight this in best practices
  • support for generating module files with multiple module naming schemes
  • auto-tagging of PRs
    • cfr. Flamefire's idea on tagging PRs with their size
      • most easyconfig PRs
      • maybe more useful for framework/easyblocks
      • single-easyblock PR vs multi-easyblock
      • generic vs custom easyblocks
    • auto-tagging per easyconfig generation
      • very useful for cleaning up PRs for old toolchains
    • CUDA
    • "core" packages (toolchain components, Rust/Python/Perl, CMake/Boost/X11, SciPy-bundle, etc.)
      • getting those PRs merged quickly would be helpful
    • PRs that only add extensions to bundles
    • stale PRs
      • no activity in N weeks
    • CI failing
      • as replacement for boegelbot no longer being able to comment when CI fails (or can it?)
    • "waiting-for-reviewer", "waiting-for-contributor" style labels
      • "waiting-for-contributor" after review with requested changes
      • "waiting-for-reviewer" if extra commits have been pushed
      • "fresh-PR" or "review-required" if there hasn't been activity after opening of PR
      • "tested-by-bot" if bot has tested it
      • "tests-OK" if there are only successful test builds
      • "trivial-version-bump"
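
The review and test labels listed above could be derived from PR state roughly as follows. This is a sketch under assumptions: `PRState` and `workflow_labels` are hypothetical names, the state fields are a simplification of what a bot would get from GitHub, and "review-required" is picked over "fresh-PR" arbitrarily.

```python
from dataclasses import dataclass


@dataclass
class PRState:
    """Hypothetical, simplified PR state that a labeling bot would track."""
    reviewed: bool = False              # has any review been submitted?
    changes_requested: bool = False     # did the last review request changes?
    commits_since_review: bool = False  # were commits pushed after the last review?
    bot_test_results: tuple = ()        # one True/False per test build done by the bot


def workflow_labels(state):
    """Return the list of labels that apply to a PR in the given state."""
    labels = []
    if not state.reviewed:
        labels.append("review-required")
    elif state.commits_since_review:
        # New commits take precedence: the ball is back with the reviewer.
        labels.append("waiting-for-reviewer")
    elif state.changes_requested:
        labels.append("waiting-for-contributor")
    if state.bot_test_results:
        labels.append("tested-by-bot")
        # "tests-OK" only if there are only successful test builds.
        if all(state.bot_test_results):
            labels.append("tests-OK")
    return labels
```

Keeping the rules in a pure function like this makes them easy to unit test separately from whatever mechanism applies the labels.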

Next EasyBuild Maintainers Summit

  • maybe physical/hybrid, coincide with FOSDEM'26 in Brussels?
  • yearly, 2nd week of February, so:
    • Mon 9 Feb 2026 14:00-16:30 CET
    • Thu 12 Feb 2026 14:00-16:30 CET