Notes from EasyBuild maintainer summit 2025
Kenneth Hoste edited this page Feb 13, 2025 · 1 revision
Mon 10 Feb + Thu 13 Feb 2025 (14:00-16:30 CET), virtual (via Zoom)
Extra topics are welcome!
- Governance
- Technical Steering Committee for EasyBuild?
- EasyBuild vs EESSI: positive/negative aspects
- Should EasyBuild join HPSF?
- Maintainer-of-the-week: should it come back?
- EasyBuild 5.0: lessons learned
- Goals for improved testing setup
- Clang/BLIS in `foss` toolchain
- Retiring non-active maintainers
attendees: Kenneth, Bart, Sam, Mikael, Adam, Bob, Sebastian, Jasper, Simon, Lara, Alan, Caspar, Alex, Åke
topics for today: governance-related
- https://fosdem.org/2025/schedule/event/fosdem-2025-6656-the-high-performance-software-foundation-hpsf-
- benefit?
- a bit unclear in practical terms
- see also https://hpsf.io/join
- sign of maturity
- access to HPSF funds, members
- support from Linux Foundation w.r.t. legal aspects, etc.
- learn from other projects w.r.t. governance, technical aspects, ...
- would push us to take formal steps w.r.t. governance
- should EasyBuild join?
- risk of HPSF being perceived as US-focused
- could impact us w.r.t. EuroHPC funding
- can EB LF be used as legal entity to apply for funding?
- ask Xavier for input on why Environment Modules is being onboarded
- European HPSF members: CEA, JSC
- mostly in favor: Alan, Kenneth
- less positive: ?
- onboarding process can be a bit time-consuming (cfr. Kokkos)
- next steps
- explore in UGent whether they want to become HPSF Associate Member + onboard EasyBuild as HPSF project (via UGent Tech Transfer?)
- members for initial Steering Committee: Kenneth, Simon, Caspar, Sam, Adam, Sebastian
- shouldn't overlap significantly with EESSI Steering Committee
- initially for 1 year, meeting once every quarter
- short-term tasks:
- write out goals (diversity, geography, steering vs advisory board) for governance
- how to form actual steering committee: timeframe, candidates, voting, spread of interests (sites, EESSI, ...), etc.
- should EasyBuild join HPSF? => write up advice for actual Steering Committee who can decide in 2026
- guard impact of EESSI on EasyBuild, criteria w.r.t. contributions in that context?
- Caspar could give short talk on this at EUM'25
- EasyBuild maintainers active in EESSI: Kenneth, Caspar, Lara, Alan, Bob, (Sam)
- additional EESSI-related PRs
- get mostly handled through people involved with EESSI, so OK
- how many PRs are related to EESSI?
- negative impact of EESSI on EasyBuild?
- effort spent on EESSI takes time away from EasyBuild maintenance work
- has delayed work on EasyBuild 5.0 to some extent
- positive impact
- brings additional use case for EasyBuild, increases relevance of EasyBuild
- effort on EESSI has "trickled down" to EasyBuild (cfr. efforts by Caspar, Lara, Bob)
- brings more attention to EasyBuild
- easyconfig is first step to get software into EESSI
- attention of developers, cfr. recent GROMACS PRs
- broader testing across other architectures (different Intel/AMD generations, Arm, RISC-V)
- funding for EUM'25 group dinners + social event
- risks
- development & maintenance effort becoming too much focused on EESSI
- funding going into EESSI, while EasyBuild was never directly funded
- lots of "in-kind" contributions to EasyBuild
- legal entity via HPSF/LF could be a way of letting EasyBuild benefit from funding for EESSI as an additional "partner"
- retiring non-active maintainers
- active: Kenneth, Lara, Sebastian, Miguel, Simon, Alex, Bob, Jasper, Adam, Sam, Alan, Mikael, Bart, Åke, Caspar, (Balázs)
- no longer active: Damian, Pablo, Fotis, Ward, Davide, Lars
- Damian is still working with EasyBuild and is an EasyBuild expert @ JSC, Sebastian took over as lead of software team
- will be contacted by Kenneth to retire them as EasyBuild maintainer
- EasyBuild maintainer rights should be removed + membership to maintainers mailing list + Slack channel, docs updated accordingly (cfr. https://docs.easybuild.io/maintainers)
- MotW
- maintainers often focused on stuff directly useful for them (which is fine)
- this died since Sept'24, Kenneth couldn't find time to set up the MotW schedule...
- getting EasyBuild v5.0 out the door + setting up better testing should get priority
- some maintainers prefer only installing stuff from merged PRs
- combined impact of ditching MotW, effort on EasyBuild 5.0 + EESSI leads to easyconfig PRs staying open longer
- sort of self-imposed policy (Adam)
- MotW did also help significantly with handling framework + easyblock PRs
- (Alex) did MotW really work that well before EasyBuild 5.0 sucked up a lot of time?
- lots of overhead to plan it, unclear whether people were actually actively acting as MotW during their week
- do we really need someone "at the door" every week?
- (Caspar) does help for some maintainers
- (Alex) regular "stand-up" meetings could be an alternative
- set priorities + assign issues/PRs to someone
- works well for framework/easyblocks in EasyBuild 5.0 sync meetings
- also try to find a volunteer to cater to easyconfig PRs?
- on Mondays, flip between 10am CET & 3pm CET to cater to multiple time zones
- (Alan) bring back merge sprints for easyconfigs?
- potential candidates for new maintainers
- cfr. https://docs.easybuild.io/maintainers/#maintainers_criteria
- Alexander Grund (TU Dresden, @Flamefire): probably better as expert contributor, has been asked before as maintainer and then declined
- Jan Reuter (JSC, @Thyre): recent frequent contributor, compiler expert, ...
- Davide Grassano (CECAM, @Crivella): recent expert contributor to EasyBuild & EESSI
- cfr. easyblocks PR #3373
- Maxim Masterov (SURF, @maxim-masterov)
- Cintia Willemyns (VUB, @WilleBell)
- more suggestions?
- KH will look at stats for recent contributors
- delay in getting EasyBuild 5.0 out the door complicates testing
- `generoso` became unavailable
- requests to bot to test are taking too long to pick up, which is annoying
- multiple test configurations
- multiple OS configs: Rocky 8/9, other OS
- multiple CPU families: AMD, Intel, Arm
- EESSI vs bare metal
- some configurations should be non-blocking
- in case of failing tests, open issue to keep track of the problem + merge
- also depends on what fails
- keep track of known problems in easyconfig + report when EasyBuild tries to install it
- based on OS, CPU arch.
- revive bot that comments when CI fails
- got broken because of breaking change in GitHub API, Kenneth hasn't been able to find a workaround
- fully automatic build testing of PRs if CI goes green after a PR is merged
- only for trusted contributors?
- only after approved review, to avoid malicious activity + high-impact mistakes
- risk of prioritizing PRs by frequent/trusted contributors
- sandboxed environment to avoid impact: no write access to existing installations, staged installation (only deploy when PR is merged)
- rule-based checks to check whether a PR can be considered for build testing?
- Cintia @ VUB also seems good potential candidate for EasyBuild maintainer based on stats
- => 4 new potential maintainers
- info for (new) maintainers
- https://github.com/easybuilders/easybuild/wiki/Getting-started-as-EasyBuild-maintainer
- https://github.com/easybuilders/easybuild/wiki/Review-process-for-contributions
- wiki pages should be moved to the docs
- overview of stuff you can change yourself in a PR vs changes to request
- timeout on requested changes + what to do to proceed
- best practices should be documented better
- which easyconfigs to test for easyblock PRs
- take into account https://docs.easybuild.io/policies/toolchains/#current-situation
- take into account complexity of the changes
- lesson learned, what could we have done better?
- there will be an EasyBuild 6.0 at some point
- with consistent naming for things (effort that was started for EasyBuild 5.0, but was postponed since it would cause more delays)
- we're not releasing major versions often enough
- "chance of a lifetime" kind of feeling to get breaking changes in
- guaranteeing that deprecated functionality actually keeps working is quite difficult
- 2.0 -> 3.0 was ~1.5y
- 3.0 -> 4.0 was ~3y
- 4.0 was 5+ years ago...
- desired timeline for 6.0?
- ~2 years seems OK => change freeze on 1 Nov 2026, first release of 2027
- consistent naming of things is a clear goal for 6.0
- definitely some scope creep happened while working on EasyBuild 5.0
- `run_shell_cmd` (was planned in April 2023)
- rework of module generation: turned out to be more work than anticipated
- effort to make naming of things more consistent was started but then postponed
- organising the work better would have helped
- has sort of happened organically in last couple of months
- more helping hands might have helped (but maybe not -> "mythical man month")
- be less afraid of making breaking changes?
- deprecating stuff implies more work
- breaking stuff would annoy a lot more people
- `5.0.x` branch was a necessary evil, but has had a negative impact
- extra effort to sync `develop` back into `5.0.x`
- has also resulted in less time being spent on easyconfig PRs
- collapsing `5.0.x` into `develop` should have happened way earlier
- a lot of time was spent on EESSI too...
- a lot of work has been done since we started working on EasyBuild 5.0...
- ~180 PRs in framework repo
- ~240 PRs in easyblocks repo
- ~120 PRs in easyconfigs repo
- => >1 PR per day for > 2 years
- blocked by EasyBuild 5.0
- event-based bot(s), like in EESSI
- or look into using `buildbot`?
- use containers to control build environment
- test matrix (OS/CPU arch/configuration)
- automated test report for these configurations
- tier-1: blocks PRs if testing fails
- (at `jsc-zen3`, zen3 & zen3+a100) most common env: AMD CPUs, Rocky 9, default EasyBuild config (RPATH), Lmod, somewhat minimal container (patch, make, gzip, bzip2, ... no devel pkgs)
- (at AWS) Intel CPU, very minimal container (removing `perl` binary, etc.), non-RPATH EasyBuild configuration
- tier-2: not necessarily blocking PR if test build fails, but open issue
- (at AWS -> Graviton3) EESSI-based: Arm CPU, RPATH, sysroot, filtered deps, alternate module naming scheme (HMNS)
- (at JSC, zen2) fat Ubuntu container (Doxygen, Boost, ...), Environment Modules (Tcl)
- tier-3: no bot, manually triggered test builds by contributors/maintainers
- PRs for commercial software
- AMD GPUs
- trigger additional tests based on labels
- like `riscv`
- also do `--module-only` (run + diff) + `--sanity-check-only`
- should block PRs
- known to be broken for some easyblocks (like WRF/WPS)
- would be nice to do this in the wake of pre-release EasyBuild 5.0 regression test
- enforce in CI that `modtclfooter` is used if `modluafooter` is used
- with whitelist as escape hatch for complicated cases
- (ask for help from Xavier?)
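A minimal sketch of what such a CI check could look like, assuming each easyconfig is parsed as plain Python and the whitelist is a set of file names. `WHITELIST`, `assigned_names` and `footer_check` are hypothetical names for illustration, not part of EasyBuild or its test suite:

```python
import ast

# Hypothetical escape hatch: easyconfig file names that are exempt from the check.
WHITELIST = {"Example-1.0.eb"}


def assigned_names(easyconfig_text):
    """Return the names assigned at the top level of an easyconfig (Python syntax)."""
    names = set()
    for node in ast.parse(easyconfig_text).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names


def footer_check(filename, easyconfig_text, whitelist=WHITELIST):
    """Return an error message if modluafooter is set without modtclfooter, else None."""
    if filename in whitelist:
        return None
    names = assigned_names(easyconfig_text)
    if "modluafooter" in names and "modtclfooter" not in names:
        return f"{filename}: modluafooter is set but modtclfooter is missing"
    return None
```

A CI job could call `footer_check` for every easyconfig touched by a PR and fail if any call returns a message. The check only looks at which parameters are defined, not at whether the two footers are equivalent.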
- inspiration from LLVM talk: https://fosdem.org/2025/schedule/event/fosdem-2025-6644-programming-is-fun-testing-is-needed-infra-is-/
- jsc-zen3 Magic Castle Cluster
- 3 partitions:
- zen3
- zen3+a100
- zen2+v100
- could be expanded with Zen4 very soon, maybe also Intel Cascade Lake
- (not discussed in the call) only deploy installations when PR is merged (build in staging directory via bwrap)
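The tier policy above could be expressed roughly as follows. This is only a sketch: apart from `jsc-zen3`, the configuration names are made up for illustration, and `BuildConfig` and `evaluate` are hypothetical helpers rather than existing bot code. It also assumes that a missing tier-1 result should block a PR, which was not discussed in the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildConfig:
    name: str         # identifier of the test configuration
    tier: int         # 1: failure blocks the PR, 2: failure only opens an issue
    description: str


# Configurations from the notes; all names except jsc-zen3 are placeholders.
CONFIGS = [
    BuildConfig("jsc-zen3", 1, "AMD CPUs, Rocky 9, RPATH, Lmod, somewhat minimal container"),
    BuildConfig("aws-intel", 1, "Intel CPU, very minimal container, non-RPATH"),
    BuildConfig("aws-graviton3-eessi", 2, "EESSI-based: Arm CPU, RPATH, sysroot, filtered deps, HMNS"),
    BuildConfig("jsc-zen2-ubuntu", 2, "fat Ubuntu container, Environment Modules (Tcl)"),
]


def evaluate(results, configs=CONFIGS):
    """Apply the tier policy to test build results.

    results maps configuration name -> True (build OK) or False (build failed).
    Returns (mergeable, issues): a tier-1 configuration that did not succeed
    (or has no result yet) blocks the PR, a failed tier-2 configuration is
    only listed so an issue can be opened for it.
    """
    mergeable = True
    issues = []
    for cfg in configs:
        outcome = results.get(cfg.name)
        if cfg.tier == 1 and outcome is not True:
            mergeable = False
        elif cfg.tier == 2 and outcome is False:
            issues.append(cfg.name)
    return mergeable, issues
```

Tier-3 configurations (commercial software, AMD GPUs) are left out on purpose, since those test builds are triggered manually.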
- Clang is starting to "take over" from GCC?
- TensorFlow officially doesn't support GCC anymore, must be Clang
- should we add Clang to the `foss` toolchain, and support opting in to using it for builds (via `toolchainopts`)?
- would help to make it possible to perhaps even switch from GCC to Clang as default compilers in `foss`
- @Crivella has been experimenting with building everything with Clang
- there's definitely some pain still in there
- do we keep GCCcore still around?
- we probably should, but then we need to be more strict
- already being done for Fortran modules in EasyBuild 5.0 => trouble when `GCCcore` toolchain is used
- banned library mechanism we have to catch direct linking to OpenBLAS can help us to avoid installing stuff that requires OpenMP runtime with wrong toolchain
- can't mix OpenMP runtime of GCC and Clang (like for Fortran modules)
- `libgomp` in GCC, `libomp` in Clang, `libiomp` in Intel
- only really a problem for libraries linking to OpenMP runtime
- incl. "compiled" Python packages (`.so`)
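One way to spot mixed OpenMP runtimes in an installation would be to look at the `NEEDED` entries that `readelf -d` reports for each shared library. The sketch below only parses such output that has already been collected. The function names are hypothetical, and it assumes the runtimes show up under sonames like `libgomp.so.1`, `libomp.so` and `libiomp5.so`:

```python
import re

# Runtime library name (version digits stripped) -> compiler family, as listed in the notes.
OPENMP_RUNTIMES = {"libgomp": "GCC", "libomp": "Clang", "libiomp": "Intel"}

# Matches NEEDED entries in the dynamic section as printed by `readelf -d`.
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def openmp_runtimes(readelf_output):
    """Return the set of compiler families whose OpenMP runtime is linked in."""
    found = set()
    for soname in NEEDED_RE.findall(readelf_output):
        # 'libiomp5.so' -> 'libiomp', 'libgomp.so.1' -> 'libgomp'
        stem = soname.split(".so")[0].rstrip("0123456789")
        if stem in OPENMP_RUNTIMES:
            found.add(OPENMP_RUNTIMES[stem])
    return found


def mixed_openmp(outputs):
    """outputs maps file path -> `readelf -d` output; True if more than one runtime is seen."""
    seen = set()
    for text in outputs.values():
        seen |= openmp_runtimes(text)
    return len(seen) > 1
```

This would complement the banned library mechanism rather than replace it: it reports what ended up linked after the build, across all libraries and compiled Python extensions of one installation.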
- first make a fully-fledged Clang-based toolchain as alternative to GCC-based `foss`
- to avoid making things too complicated
- building everything with Clang is sort of swimming upstream, may be trouble for everything
- ...
- also: consider switching to BLIS as default BLAS/LAPACK library backend for FlexiBLAS?
- OpenBLAS is a one-man show, not sure we'll be able to keep relying on it
- no LAPACK library provided by BLIS? (cfr. libFLAME)
- netlib LAPACK on top of BLIS is OK?
- libFLAME isn't a full replacement for LAPACK, and we've observed more failing tests
- for single-core stuff, doesn't matter much if you use OpenBLAS/BLIS/MKL
- for multi-threaded stuff, BLIS has an edge, AOCL-BLAS even more
- for Zen4, OpenBLAS still has an edge over BLIS due to AVX-512 kernels (Skylake), and AOCL-BLAS is even better
- which MPI do we go with in Clang-based toolchain?
- MPICH to be closer to vendor toolchains
- would make jump to Clang-based toolchain from `foss` bigger
- we don't treat libfabric like we do UCX (for CUDA support, multi-GPU)
- get more organised w.r.t. cleanup of dead code
- dedicated label that allows us to easily find PRs where candidate dead code was touched
- or dedicated issue to keep list of things to revise before next major release of EasyBuild (similar to this)
- highlight this in best practices
- support for generating module files with multiple module naming schemes
- auto-tagging of PRs
- cfr. Flamefire's idea on tagging PRs with their size
- most easyconfig PRs
- maybe more useful for framework/easyblocks
- single-easyblock PR vs multi-easyblock
- generic vs custom easyblocks
- auto-tagging per easyconfig generation
- very useful for cleaning up PRs for old toolchains
- CUDA
- "core" packages (toolchain components, Rust/Python/Perl, CMake/Boost/X11, SciPy-bundle, etc.)
- getting those PRs merged quickly would be helpful
- PRs that only add extensions to bundles
- stale PRs
- no activity in N weeks
- CI failing
- as replacement for boegelbot not being able to do this anymore (or can it?)
- "waiting-for-reviewer", "waiting-for-contributor" style labels
- "waiting-for-contributor" after review with requested changes
- "waiting-for-reviewer" if extra commits have been pushed
- "fresh-PR" or "review-required" if there hasn't been activity after opening of PR
- "tested-by-bot" if bot has tested it
- "tests-OK" if there are only successful test builds
- "trivial-version-bump"
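The status labels listed above could be derived from a few facts about a PR, roughly as sketched here. `PRState`, `suggest_labels`, the `stale` label name and the 8-week default for N are assumptions made for illustration. Collecting the facts (e.g. via the GitHub API) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PRState:
    any_activity: bool = False          # anything happened since the PR was opened
    changes_requested: bool = False     # latest review requested changes
    commits_since_review: bool = False  # extra commits pushed after that review
    bot_tested: bool = False            # the bot has run at least one test build
    test_builds: list = field(default_factory=list)  # True/False per test build
    weeks_inactive: int = 0


STALE_AFTER_WEEKS = 8  # placeholder for N, not decided in the meeting


def suggest_labels(pr, stale_after_weeks=STALE_AFTER_WEEKS):
    """Return the set of labels that would apply to a PR in the given state."""
    labels = set()
    if not pr.any_activity:
        labels.add("review-required")
    elif pr.changes_requested and pr.commits_since_review:
        labels.add("waiting-for-reviewer")
    elif pr.changes_requested:
        labels.add("waiting-for-contributor")
    if pr.bot_tested:
        labels.add("tested-by-bot")
    if pr.test_builds and all(pr.test_builds):
        labels.add("tests-OK")
    if pr.weeks_inactive >= stale_after_weeks:
        labels.add("stale")
    return labels
```

Keeping the rules in one pure function makes them easy to unit test and to adjust once the actual label set is agreed on.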
- maybe physical/hybrid, coincide with FOSDEM'26 in Brussels?
- yearly, 2nd week of February, so:
- Mon 9 Feb 2026 14:00-16:30 CET
- Thu 12 Feb 2026 14:00-16:30 CET