Develop BEE Resource Monitor #156

Closed
guanxyz opened this issue May 4, 2020 · 10 comments

@guanxyz
Contributor

guanxyz commented May 4, 2020

Creating an issue for discussing and summarizing the design spec (combining #115 and #98).

The general problem we are looking to solve:
[Image: Workflow_01]

General BEE:
[Image: Untitled (Draft)-3]

The overall system we are looking at (MARS is our scheduler):
[Image: Policy]

General functions:
[Image: System-Overview2]

TM-RM: Interface between TM and RM (SLURM, AWS, GCP, et al)

  • Mapper of SLURM commands: sinfo, squeue, srun, ... (we still need to verify which commands should be added). Currently considering pyslurm. More details need to be defined, e.g. "sinfo -> bee-sinfo".
    [Image: arch]
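
To make the "sinfo -> bee-sinfo" idea concrete, here is a minimal sketch of such a mapper in plain Python. It does not use pyslurm; the `bee-` prefixed names for squeue and srun, the `BEE_TO_SLURM` table, and the `translate` helper are all assumptions made for illustration, not an agreed design.

```python
import shlex

# Hypothetical mapping from BEE-facing command names to SLURM commands.
# Only sinfo, squeue and srun are named in the discussion; the "bee-" prefix
# follows the "sinfo -> bee-sinfo" example and is otherwise an assumption.
BEE_TO_SLURM = {
    "bee-sinfo": "sinfo",
    "bee-squeue": "squeue",
    "bee-srun": "srun",
}


def translate(bee_command: str) -> list[str]:
    """Translate a BEE command line into the argv of the SLURM equivalent."""
    parts = shlex.split(bee_command)
    if not parts:
        raise ValueError("empty command")
    name, args = parts[0], parts[1:]
    if name not in BEE_TO_SLURM:
        raise ValueError(f"unsupported BEE command: {name}")
    # Arguments are passed through unchanged in this sketch.
    return [BEE_TO_SLURM[name], *args]


print(translate("bee-squeue --user alice"))
```

The returned list could be handed to `subprocess.run` on a machine where SLURM is installed. Other resource managers (AWS, GCP) would presumably need their own tables or a different translation layer.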

Discussion will continue....

@jtronge
Collaborator

jtronge commented May 27, 2020

Here is a possible rough draft of the REST interface between the TM and the Resource Manager:

/jobs - list all current jobs
/jobs/<id> - get information about a current job
/jobs/<id>/allocation - get the recommended allocation for a specific job
/nodes - list all nodes
/nodes/<id> - list information for a specific node

When the Task Manager needs to submit a job, it would look into its current queue of jobs and send each of them to the Resource Manager through a POST /jobs request.

The Resource Manager would then run the current scheduling algorithm and choose a specific node or set of nodes to run the task/job on. The Task Manager could then retrieve this with a GET /jobs/<id>/allocation call and use the allocation information to send the job to SLURM or whichever resource manager is in use.

Note: Somehow the Task Manager, or perhaps the Workflow Manager, will need to make sure that the Resource Manager has an accurate list of nodes. This will need to be done with a POST /nodes call.
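
The request flow described above can be sketched with an in-memory stand-in for the Resource Manager, with one method per proposed endpoint. This is only an illustration: the class name, the JSON-like field names (`id`, `name`, `nodes`), the choice that POST /nodes replaces the whole node list, and the placeholder "first N nodes by id" policy are all assumptions, not part of the draft.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """In-memory stand-in for the proposed REST Resource Manager."""

    nodes: dict = field(default_factory=dict)  # node id -> node info
    jobs: dict = field(default_factory=dict)   # job id -> job info
    next_job_id: int = 1

    def post_nodes(self, node_list):
        """POST /nodes: replace the known node list (assumed semantics)."""
        self.nodes = {n["id"]: n for n in node_list}

    def get_nodes(self):
        """GET /nodes: list all nodes."""
        return list(self.nodes.values())

    def get_node(self, node_id):
        """GET /nodes/<id>: information for a specific node."""
        return self.nodes[node_id]

    def post_jobs(self, job):
        """POST /jobs: register a job and return its id."""
        job_id = self.next_job_id
        self.next_job_id += 1
        self.jobs[job_id] = {**job, "id": job_id}
        return job_id

    def get_jobs(self):
        """GET /jobs: list all current jobs."""
        return list(self.jobs.values())

    def get_job(self, job_id):
        """GET /jobs/<id>: information about a current job."""
        return self.jobs[job_id]

    def get_allocation(self, job_id):
        """GET /jobs/<id>/allocation: placeholder policy, first N node ids."""
        wanted = self.jobs[job_id].get("nodes", 1)
        available = sorted(self.nodes)
        if len(available) < wanted:
            raise RuntimeError("not enough nodes for this job")
        return {"job_id": job_id, "nodes": available[:wanted]}


# Task Manager side of the flow: register nodes, submit a job, fetch allocation.
rm = ResourceManager()
rm.post_nodes([{"id": "node0"}, {"id": "node1"}, {"id": "node2"}])
job_id = rm.post_jobs({"name": "task-a", "nodes": 2})
print(rm.get_allocation(job_id))
```

In a real service each method would sit behind the corresponding HTTP route, and the Task Manager would translate the returned allocation into a submission to SLURM or another backend.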

This is not an overly comprehensive design of how the interface could work but just a starting point. I also may be misunderstanding how some parts of BEE will need to interact so please correct me if any of this seems wrong.

@pagrubel
Collaborator

It seems like, at least for Slurm, you are trying to do what Slurm already does as far as picking which nodes to allocate.

@jtronge
Collaborator

jtronge commented May 27, 2020

You're right, Pat.
My understanding was that this would be an intermediate-level scheduler which would allow for the use of different scheduling algorithms (more than what slurm has to offer) that could be used with any of the resource managers that BEE supports.
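
One way to picture an intermediate-level scheduler with swappable algorithms is a small strategy-style sketch like the one below. The `Scheduler` class, the two example policies, and the "load per node" input are hypothetical and chosen only to show how the algorithm could be exchanged independently of the backend resource manager.

```python
from typing import Callable

# A scheduling algorithm takes the number of nodes requested and a mapping of
# node id -> current load, and returns the chosen node ids (assumed signature).
Algorithm = Callable[[int, dict[str, int]], list[str]]


def first_fit(count: int, load_by_node: dict[str, int]) -> list[str]:
    """Pick the first `count` nodes in id order, ignoring load."""
    return sorted(load_by_node)[:count]


def least_loaded(count: int, load_by_node: dict[str, int]) -> list[str]:
    """Pick the `count` nodes with the lowest load, ties broken by id."""
    ranked = sorted(load_by_node, key=lambda n: (load_by_node[n], n))
    return ranked[:count]


class Scheduler:
    """Backend-agnostic scheduler that delegates to a pluggable algorithm."""

    def __init__(self, algorithm: Algorithm):
        self.algorithm = algorithm

    def allocate(self, count: int, load_by_node: dict[str, int]) -> list[str]:
        chosen = self.algorithm(count, dict(load_by_node))
        if len(chosen) != count:
            raise RuntimeError("algorithm could not find enough nodes")
        return chosen


loads = {"n0": 3, "n1": 0, "n2": 1}
print(Scheduler(first_fit).allocate(2, loads))
print(Scheduler(least_loaded).allocate(2, loads))
```

The chosen node list would then be passed to whichever resource manager is in use, for example as a node constraint on a SLURM submission.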

@pagrubel
Collaborator

pagrubel commented May 27, 2020 via email

@guanxyz
Contributor Author

guanxyz commented May 27, 2020

@pagrubel, you are right. Jake's current simple design may provide limited benefit when managing jobs on a single resource, but it gives us long-term support for extensibility and portability. As for deciding which nodes should take the jobs, I suggest we may need that for the following reasons:

  • Theoretically, we could run multiple containers on a single node. Some workflow applications' tasks could be hosted together.

  • We may want to avoid assigning containerized tasks to any "unhealthy" nodes.
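
Both reasons can be illustrated with a small placement sketch: unhealthy nodes are filtered out, and tasks are packed onto nodes that still have a free container slot. The node fields (`id`, `healthy`, `slots`) and the `place_tasks` helper are hypothetical, and how node health would actually be determined is left open.

```python
def place_tasks(tasks, nodes):
    """Assign each task to a healthy node that still has a free slot.

    tasks: list of task names.
    nodes: list of dicts with hypothetical keys "id", "healthy", "slots".
    Returns a dict mapping task name -> node id.
    """
    # Unhealthy nodes are never considered for placement.
    free = {n["id"]: n["slots"] for n in nodes if n["healthy"]}
    placement = {}
    for task in tasks:
        # Fill the first node with room before moving on, so tasks share nodes.
        node_id = next((nid for nid, slots in free.items() if slots > 0), None)
        if node_id is None:
            raise RuntimeError(f"no healthy node has room for {task}")
        free[node_id] -= 1
        placement[task] = node_id
    return placement


nodes = [
    {"id": "n0", "healthy": True, "slots": 2},
    {"id": "n1", "healthy": False, "slots": 4},
    {"id": "n2", "healthy": True, "slots": 1},
]
print(place_tasks(["a", "b", "c"], nodes))
```

Here tasks "a" and "b" share node n0, node n1 is skipped because it is marked unhealthy, and "c" lands on n2.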

@guanxyz
Contributor Author

guanxyz commented May 27, 2020

I recommend using Swagger to edit the APIs: https://editor.swagger.io/

@jtronge
Collaborator

jtronge commented May 29, 2020

I have converted my rough API design into a Swagger document here: scheduler.yaml

@rstyd
Collaborator

rstyd commented Apr 12, 2021

I'm going to close #97 and rename this issue to Develop BEE Resource Monitor.

rstyd changed the title from "Interface Design and Spec (TM to Resource Manager)" to "Develop BEE Resource Monitor" on Apr 12, 2021
@rstyd
Collaborator

rstyd commented Sep 9, 2024

This issue no longer fits BEE's scope. Closing pending @pagrubel's evaluation.

rstyd closed this as completed on Sep 9, 2024
rstyd reopened this on Sep 9, 2024
@pagrubel
Collaborator

pagrubel commented Sep 9, 2024

I hate to take this off the list of things to do. In some ways it does fit a larger future vision. We can close it with a caveat to revisit it if BEE is able to run across platforms at multiple sites.

pagrubel closed this as completed on Sep 9, 2024
4 participants