Meeting Notes April 2020
Attendees - Rusty Davis (rstyd), Pat Grubel (pagrubel), Tim Randles (trandles-lanl), Jake Tronge (jtronge)
Agenda
PR Review
- NONE
Issue Review
- 146 - Pat has a fix. Will just check for expected filenames and error out for anything else.
Discussion (ToDo?)
- Quincy Wofford starting May 18
- Steven starting June 15
- Things we need to define
- (TM) Abstract interface to HPC resource managers (worker) (Slurm, LSF, Torque, PBS, etc.)
- (TM) Abstract interface to container runtimes
- (TM) Does TM call container interface to build body of job script and then pass that to the resource manager interface, or does the resource manager interface (worker) call the container interface when building a job script?
- (WfM/TM) how to run via WSGI (web server gateway interface), usually Nginx or uWSGI
- (WfM) Abstract interface to placement engine
- (WfM) Resource monitor - what is it, how does it work
- Add ability for parser to handle return codes (https://www.commonwl.org/v1.1/CommandLineTool.html#Execution)
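A possible starting point for the return-code item above. This is a minimal sketch, assuming the parser hands back the CWL CommandLineTool as a plain dict; the field names `successCodes`, `temporaryFailCodes`, and `permanentFailCodes` come from the CWL v1.1 spec linked above, while `Outcome` and `classify_exit_code` are hypothetical names and not part of BEE.

```python
from enum import Enum


class Outcome(Enum):
    """Process outcomes named in the CWL CommandLineTool execution section."""
    SUCCESS = "success"
    TEMPORARY_FAILURE = "temporaryFailure"
    PERMANENT_FAILURE = "permanentFailure"


def classify_exit_code(code, tool):
    """Map an exit code to an outcome using a parsed CommandLineTool dict.

    Assumption: `tool` is a dict as produced by loading the CWL file.
    Codes listed explicitly in the tool win; otherwise 0 means success
    and any other code is treated as a permanent failure.
    """
    if code in tool.get("successCodes", []):
        return Outcome.SUCCESS
    if code in tool.get("temporaryFailCodes", []):
        return Outcome.TEMPORARY_FAILURE
    if code in tool.get("permanentFailCodes", []):
        return Outcome.PERMANENT_FAILURE
    return Outcome.SUCCESS if code == 0 else Outcome.PERMANENT_FAILURE


# Hypothetical tool: exit code 3 counts as success, 75 means "retry later".
example_tool = {"successCodes": [0, 3], "temporaryFailCodes": [75]}
```

A temporary failure would presumably let the WfM resubmit the task, while a permanent failure would stop the workflow; that policy is still to be decided.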
Around the room
- Pat
- container filename extensions
- container runtime options
- Rusty
- pytest continuing
- drafting document on possible issues and mitigation strategies
- primarily between WfM and TM
- Jake
- BEEStart
- looking at logging
- Tim
Attendees - Pat Grubel (pagrubel), Qiang Guan, Al McPherson (mcpherson), Tim Randles (trandles-lanl), Jake Tronge (jtronge)
Agenda
PR Review
- #150 (trandles-lanl) - Beeconfig 149
- pagrubel will test and comment
Issue Review
- #139 (trandles-lanl) - BEEStart: A script to start BEE components
Discussion (ToDo?)
- NONE
Around the room
- Pat
- back on BEE this week
- Al
- cwltool investigations - waiting on VASP container from trandles
- Qiang
- Jake
- talk to trandles-lanl about BEEStart
- Tim
NO MEETING
Attendees - Rusty Davis (rstyd), Pat Grubel (pagrubel), Qiang Guan, Al McPherson (mcpherson), Tim Randles (trandles-lanl), Jake Tronge (jtronge)
Agenda
PR Review
- NONE
Issue Review
- NONE
Discussion (ToDo?)
- FY21 ECP activities off to L3 for approval, can add more later
- See what OSC is using for a resource manager (qguan)
- Time to define logging standard for BEE (running of the system, not necessarily workflow-specific stuff)
- Start list of what goes into the graph database (e.g. job script as metadata on task node)
Around the room
- Pat
- Al
- working on database refactor
- waiting on trandles-lanl to get VASP container then will write CWL for Sven's workflow
- thinking about parser strategy (maybe a hack of cwltool) - will email someone about cwltool
- Qiang
- almost finished paper describing scheduling algorithms
- continue discussion of tasks for Jake - container integration (discuss Wednesday)
- Jake
- looking at issue 124 (task status reporting)
- job-building/script-building to test individual commands in job for success/failure
- Tim
- wrapping up basic BEEStart to push to repo - planning activities with Qiang
- Rusty
- wrapping up pytest activities - refine REST APIs
Attendees - Rusty Davis (rstyd), Pat Grubel (pagrubel), Qiang Guan, Al McPherson (mcpherson), Tim Randles (trandles-lanl), Jake Tronge (jtronge)
PR Review
- #143 (pagrubel) - Fix slurm unit tests
- rstyd and trandles-lanl will run tests to confirm, if pass then approve
Issue Review
- #144 (trandles-lanl, mcpherson) - Create VASP Charliecloud container
- mcpherson to review past emails with srudin and comment on issue
Discussion
- BEE docker image - jtronge
- README.md on mattermost chat describing use
- works at Kent
- fedora image
- trandles-lanl, pagrubel, mcpherson will test it, provide feedback to jtronge
- WoWoHa - pagrubel
- WoWoHa 2020 cancelled
- will be a weekly "summer seminar series" June - August 2020
- BEE will give a talk
- pushing to master and public BEE repo - pagrubel
- getting closer to public release
- need to define criteria for first release (documentation, workflow limitations/supported CWL, etc.)
- trandles-lanl will create milestone issue for first public release - target end of FY
- trandles-lanl will create issue for supporting MPI applications using Charliecloud and BEE
Around the room
- Rusty
- pytest vs. unittest
- can run unittest with pytest
- pytest has a flask plugin, good support
- pytest has better test output, works with doctest (see https://vincent.bernat.ch/en/blog/2019-sustainable-python-script)
- Pat
- jtronge test PR #143
- Issue #124 - jtronge discuss with pagrubel
- Al
- working on database refactor
- chasing down VASP stuff
- Qiang
- tasks for jtronge
- discuss FY activities with trandles-lanl
- Jake
- Tim
- push BEEStart ASAP and let others hack on it
Attendees - Pat Grubel (pagrubel), Qiang Guan, Al McPherson (mcpherson), Tim Randles (trandles-lanl), Jake Tronge (jtronge)
Agenda
PR Review
- NONE
Issue Review
- NONE
Discussion
- Thoughts on FY21 cloud milestone
- using ORNL or Chameleon cloud for target platform
Around the room
- Pat
- working to get pyslurm tests running
- using DockerRequirement from CWL
- Al
- getting on darwin and fog
- Qiang & Jake
- got examples running that were in milestone documentation
- Jake will document his scripts and dockerfiles for setting up their test environment
- Jake will get things running on group server
- Qiang to send thoughts on FY21 cloud milestone
- Tim
- continue working on BEEStart script
Attendees - Rusty Davis (rstyd), Pat Grubel (pagrubel), Al McPherson (mcpherson), Tim Randles (trandles-lanl), Jake Tronge (jtronge)
Agenda
- discuss CWL and container support
- TaskManager design for modular support of container runtimes and resource managers
- discuss proposed FY21 ECP P6 Activities
- BEE- FY21 P6-1 Develop the ability to archive, clone, and re-run workflows (start 10/01/20, due 3/31/21)
- BEE- FY21 P6-2 Run BEE jobs on private cloud infrastructure (due 9/31/21)
PR Review
- NONE
Issue Review
- NONE
Discussion
- APPROVED April 6, 2020 meeting notes
- TaskManager discussion mostly shelved for now, revisit next week
- CWL support for containers
- defined at https://www.commonwl.org/v1.1/CommandLineTool.html#DockerRequirement
- how to handle bind mounts, inputs/outputs, etc.?
- maybe a question for the CWL mailing list
- Rusty looking for existing containerized CWL workflows for examples
- "standard" container runtime options in the bee.conf file?
- FY21 ECP Activities are documented at
- Tim starting on design document for the activities
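One way the shelved TaskManager question could be framed when it is revisited: the TM asks a container-runtime object to wrap the command, then hands the result to a resource-manager object that builds the job script. This is only a sketch under that assumption; every class and function name below is hypothetical, and the Slurm and Charliecloud details are reduced to a `#SBATCH --job-name` line and a bare `ch-run IMAGE -- COMMAND` invocation with no bind mounts or runtime options.

```python
import shlex
from abc import ABC, abstractmethod


class ContainerRuntime(ABC):
    """Hypothetical abstract interface to a container runtime."""

    @abstractmethod
    def wrap(self, image, command):
        """Return the argv list that runs `command` inside `image`."""


class ResourceManager(ABC):
    """Hypothetical abstract interface to an HPC resource manager (worker)."""

    @abstractmethod
    def build_script(self, job_name, body):
        """Return the text of a job script whose payload is `body`."""


class CharliecloudRuntime(ContainerRuntime):
    def wrap(self, image, command):
        # Bare invocation only; bind mounts and other options are omitted.
        return ["ch-run", image, "--"] + list(command)


class SlurmWorker(ResourceManager):
    def build_script(self, job_name, body):
        lines = ["#!/bin/bash", f"#SBATCH --job-name={job_name}", body, ""]
        return "\n".join(lines)


def build_job(job_name, image, command, runtime, worker):
    """TM-level glue: the worker never needs to know which runtime was used."""
    argv = runtime.wrap(image, command)
    body = " ".join(shlex.quote(arg) for arg in argv)
    return worker.build_script(job_name, body)


script = build_job("demo", "/images/vasp", ["echo", "hello"],
                   CharliecloudRuntime(), SlurmWorker())
```

The alternative (the worker calling the container interface itself) would move the `runtime.wrap` call inside `build_script`; keeping it in the TM means neither interface depends on the other.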
Around the room
- Jake
- neo4j issues (Task already exists)
- Rusty knows how to fix it
- close to being able to run test workflows
- Rusty
- starting test work
- looking at PyTest for integration testing
- maybe pexpect for client testing
- Flask has some testing framework (Jake)
- BEE should start a document of what CWL is supported by project
- Pat
- question for Rusty about passing Task object to worker from TaskManager
- will need to think about how to pass things around when there's more data (requirements and hints)
- Al
- refactoring database and building new API to it
- no way to version python APIs
- API changes only affect WorkflowManager
- next use case CWL example
- maybe BLAST workflow again
- keep scope of parsing to HPC use cases, not "generic everything CWL"
- Do srudin VASP workflow (parameter study) #66
Action Items
- Tim - get VASP containers that work with Charliecloud (Power9, x86_64)
Attendees - Rusty Davis (rstyd), Pat Grubel (pagrubel), Qiang Guan (guanxyz), Tim Randles (trandles-lanl), Jake Tronge (jtronge)
PR Review
- #138 APPROVED (trandles-lanl) - Use bee.conf to configure listen ports for BEEWorkflowManager and BEETaskManager
- Pat approves of merging this PR, but into master instead of develop. The rationale is that the functionality is simple and enables everyone to do development work at the same time on the same system.
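To illustrate why per-user listen ports let several developers share one system, here is a minimal sketch of reading ports from a config file. It assumes bee.conf is INI-style and readable with Python's `configparser`; the section names, the `listen_port` key, and the default port numbers are all hypothetical and may not match what PR #138 actually uses.

```python
import configparser

# Hypothetical defaults, used when bee.conf does not set a port.
DEFAULT_PORTS = {"workflow_manager": 5000, "task_manager": 5050}


def read_listen_ports(conf_text):
    """Return {component: port}, falling back to defaults for missing entries."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    ports = {}
    for section, default in DEFAULT_PORTS.items():
        # `fallback` covers both a missing section and a missing option.
        ports[section] = parser.getint(section, "listen_port", fallback=default)
    return ports


# Hypothetical per-user config overriding only the workflow manager port.
sample_conf = """
[workflow_manager]
listen_port = 7777
"""
ports = read_listen_ports(sample_conf)
```

Each developer would point the components at their own config file with distinct ports, so two instances on the same node do not collide.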
Issue Review
- #137 (pagrubel) - Slurm worker to properly check DockerRequirement
- slurm_worker.py should use the DockerRequirement: dockerImageId specified in the CWL file
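A rough sketch of the lookup issue #137 asks for, assuming the CWL file has already been loaded into a plain dict. `DockerRequirement`, `dockerImageId`, and `dockerPull` are field names from the CWL spec; the function names are hypothetical and this is not the code in slurm_worker.py. CWL allows `requirements` and `hints` to be written either as a list of entries with a `class` key or as a map keyed by class name, so both forms are handled.

```python
def find_docker_requirement(tool):
    """Return the DockerRequirement entry from requirements or hints, else None."""
    for key in ("requirements", "hints"):
        entries = tool.get(key) or []
        if isinstance(entries, dict):
            # Map form: {"DockerRequirement": {...}} -> list form with "class".
            entries = [{**(body or {}), "class": name}
                       for name, body in entries.items()]
        for entry in entries:
            if entry.get("class") == "DockerRequirement":
                return entry
    return None


def container_image(tool):
    """Prefer dockerImageId; fall back to dockerPull when it is absent."""
    requirement = find_docker_requirement(tool)
    if requirement is None:
        return None
    return requirement.get("dockerImageId") or requirement.get("dockerPull")


# Hypothetical tool description using the map form under hints.
example_tool = {
    "class": "CommandLineTool",
    "hints": {"DockerRequirement": {"dockerImageId": "/images/vasp.tar.gz"}},
}
```

How the returned value maps onto a Charliecloud image path is a separate question and is left open here.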
Discussion
- extending CWL for other container runtimes (rstyd)
- discuss on Wednesday
- guanxyz had some ideas
- next ECP milestones up on wiki
Around the room
- Jake
- got a test environment set up at KSU
- initial problems with PySlurm due to having a too-new Slurm installed
- Rusty
- working on unittest and CI tests for client/WorkflowManager - not a lot of time for BEE this week (very understandable, everyone prioritized BEE the past 2 weeks (trandles-lanl))
- Pat
- unittest for TaskManager - issue #137 above
- not much time for BEE this week
- Tim
- issue #139 planning to discuss on Wednesday
- ECP milestone housekeeping