Feature List: TOSS4: 2020 Q2

Stephen Herbein edited this page Jun 14, 2019 · 3 revisions

What We Will Provide

System Instance

Limit KVS Content Growth
- Garbage Collect after Restart
Tolerate Compute Nodes Down
Drain Nodes
Detect/Monitor Nodes/Resources Up/Down
User Management
- Admin role
- Dynamically add/remove users (watch /etc/passwd?)
Configuration files

Execution System

I/O to/from files with file per process
Multi-prog support (MPMD)
pty support
affinity/mapping
Jobspec + R -> local
Environment
Debugger support
- MPIR
- Distributed Sync
- Co-locating processes
Launch OpenMPI 3.1+
PMI
job completion log
- simple append interface
- offline & online query (x-post w/ porcelain)
real job shell
signal jobs (x-post w/ porcelain)

Job Submission

Job Priorities (x-post w/ bank/accounting)
Job Dependencies
Job Feasibility
- Ingest plugin to ensure job request is not larger than cluster can provide
- Job request abides by QoS limits
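As a rough illustration of the two feasibility checks above, the sketch below rejects a request that exceeds what the cluster can provide or that violates a QoS limit. It is not the Flux ingest plugin interface: `Request`, `Limits`, `check_feasibility`, and all field names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical, simplified view of a job request."""
    nodes: int
    cores_per_node: int
    walltime_s: int


@dataclass(frozen=True)
class Limits:
    """Hypothetical cluster capacity plus one QoS limit (wall time)."""
    total_nodes: int
    cores_per_node: int
    max_walltime_s: int


def check_feasibility(req: Request, limits: Limits) -> list[str]:
    """Return the reasons a request is infeasible; an empty list means accept."""
    reasons = []
    # Capacity checks: the request must fit on the cluster as configured.
    if req.nodes > limits.total_nodes:
        reasons.append(
            f"requested {req.nodes} nodes, cluster has {limits.total_nodes}"
        )
    if req.cores_per_node > limits.cores_per_node:
        reasons.append(
            f"requested {req.cores_per_node} cores per node, "
            f"nodes have {limits.cores_per_node}"
        )
    # QoS check: the request must stay within the policy limit.
    if req.walltime_s > limits.max_walltime_s:
        reasons.append(
            f"requested walltime {req.walltime_s}s exceeds "
            f"QoS limit {limits.max_walltime_s}s"
        )
    return reasons
```

Returning every reason rather than stopping at the first lets the submission tool report all problems to the user in one pass.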

Resource Management

Query available/allocated/down resources (x-post w/ porcelain)
Resource configuration language
Resource discovery vs config file
Connect to WhatsUp
- Provide kvs key with idset of "up" nodes
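To make the idset of "up" nodes concrete, the sketch below converts between a set of node ranks and a compact string, assuming a comma-separated range notation such as `0-3,7,9-10`. The helper names `encode_idset` and `decode_idset` are hypothetical. The sketch does not use the Flux idset library and does not show how the value would be written to the KVS.

```python
def encode_idset(ids):
    """Encode integer ids as a compact string, e.g. {0,1,2,3,7} -> "0-3,7"."""
    ranges = []
    for i in sorted(set(ids)):
        if ranges and i == ranges[-1][1] + 1:
            # Extend the current run of consecutive ids.
            ranges[-1][1] = i
        else:
            # Start a new run.
            ranges.append([i, i])
    return ",".join(
        str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges
    )


def decode_idset(text):
    """Decode a string such as "0-3,7" back into a set of integer ids."""
    ids = set()
    for part in text.split(","):
        if not part:
            continue  # tolerate an empty string (no nodes up)
        lo, _, hi = part.partition("-")
        ids.update(range(int(lo), int(hi or lo) + 1))
    return ids
```

With this encoding, a monitoring component could publish one short string per update instead of a list of every node rank.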

Porcelain

List jobs in queue order with filtering
Run/submit
scheduler front-end work
alter job priorities
- hold
- cancel
- expedite
query completed jobs (x-post w/ execution system)
Transition Tools
- flux srun
signal jobs (x-post w/ execution system)
Resource status summary tool (x-post w/ resource management)
User guides for transitions to Flux commands

Bank/Accounting

Specify bank on submission
Tools/storage for EOY analysis
User permissions
Fair-Share Scheduling
Job Priorities (x-post w/ job submission)
Slurm Database

Resource Matching Integration w/ Exec System

Resource matching interfaces w/ new exec system
Scheduler ? support

Sched Optimization and Resiliency

Scheduler performance optimization
Scheduler resiliency improvements
- Support unload/load via job manager
Scheduler memory optimization
Planner optimization

Support for Queues & Partitions

Queue Equivalent (e.g., job tags)
- W/ policy support (e.g., wall time limit)

ATDM L2 Milestones

Power Monitoring
- monitoring support for job-level power/perf data
- from various databases
Tools Interface
Storage ???
- Burst Buffer support w/in simulator
- Add stage-in/out support in jobspec
- Data staging flux module
GPU

Security

IMP + Contain
IMP PAM Support
IMP Prolog/Epilog support

What We Will NOT Initially Provide

Fully-baked, bulletproof resiliency
- Node loss within a job allocation will result in job failure
- Crash/loss of management node will result in loss of running jobs (i.e., they will be killed)
Scheduling
- Resources besides nodes/cores/gpus
- Standby Jobs
- Pre-emption
- Email Notification
- Job Requeue
- Modifying job properties post-submission (e.g., walltime, num nodes/cores, queue)
- Providing "reasons" for job not currently running