Develop BEE Resource Monitor #156
Here is a possible rough draft of the REST interface between the TM and the Resource Manager:
When the Task Manager needs to submit a job, it would look into its current queue of jobs and send each of them to the Resource Manager through a REST call. The Resource Manager would then run the current scheduling algorithm and choose a specific node or set of nodes to run the task/job on. The Task Manager could then retrieve that allocation with a second call.
Note: Somehow the Task Manager, or perhaps the Workflow Manager, will need to make sure that the Resource Manager has an accurate list of nodes. This will need to be done by a separate call.
This is not a comprehensive design of how the interface could work, just a starting point. I may also be misunderstanding how some parts of BEE need to interact, so please correct me if any of this seems wrong.
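The flow described above (submit queued jobs, run a scheduling pass, retrieve the allocation, keep the node list current) could be sketched in memory roughly as follows. This is only an illustration: the class `ResourceManagerSketch`, its method names, and the first-come, first-served placement are all assumptions made for the example, not BEE's actual interface or scheduling algorithm, and a real version would sit behind REST endpoints rather than direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManagerSketch:
    """Hypothetical in-memory stand-in for the Resource Manager side."""
    nodes: list = field(default_factory=list)        # node names known to the RM
    pending: dict = field(default_factory=dict)      # job_id -> nodes requested
    allocations: dict = field(default_factory=dict)  # job_id -> assigned node names

    def update_nodes(self, nodes):
        """Replace the known node list (the 'accurate list of nodes' call)."""
        self.nodes = list(nodes)

    def submit_job(self, job_id, nodes_requested=1):
        """Queue a job sent over by the Task Manager."""
        self.pending[job_id] = nodes_requested

    def schedule(self):
        """Placeholder algorithm (an assumption): first-come, first-served."""
        busy = {n for alloc in self.allocations.values() for n in alloc}
        free = [n for n in self.nodes if n not in busy]
        for job_id, count in list(self.pending.items()):
            if count <= len(free):
                self.allocations[job_id] = free[:count]
                free = free[count:]
                del self.pending[job_id]

    def get_allocation(self, job_id):
        """Return the nodes chosen for a job, or None if it is still pending."""
        return self.allocations.get(job_id)


rm = ResourceManagerSketch()
rm.update_nodes(["node0", "node1", "node2"])
rm.submit_job("job-a", nodes_requested=2)
rm.submit_job("job-b", nodes_requested=2)
rm.schedule()
print(rm.get_allocation("job-a"))  # ['node0', 'node1']
print(rm.get_allocation("job-b"))  # None (only one node is left free)
```

Each method here would map onto one REST call in the draft: one to register nodes, one to submit a job, and one to read back the allocation.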
It seems like, at least for Slurm, you are trying to do what Slurm already does as far as picking which nodes to allocate.
You're right Pat.
We need to define what is needed. I thought this scheduler would eventually help decide where tasks should be submitted, for example by choosing another resource when a job would take too long to start on the first one. Since we only have one resource right now, it could instead do something like send messages showing when we might expect the job to start, or even change the number of nodes. I don't think we should be in the business of deciding which nodes to use. I'll have to think about this more.
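As a rough illustration of the "use another resource if this one will take too long to start" idea, a selection step might look like the sketch below. The function `pick_resource`, the resource names, and the wait estimates are hypothetical; how BEE would actually obtain start-time estimates from a resource is not settled in this thread.

```python
def pick_resource(estimated_wait_s):
    """Return the resource with the smallest estimated wait, or None if empty.

    estimated_wait_s maps a resource name to the expected seconds until a
    job would start there (hypothetical input for this sketch).
    """
    if not estimated_wait_s:
        return None
    return min(estimated_wait_s, key=estimated_wait_s.get)


# Made-up estimates for two made-up resources.
waits = {"cluster-a": 3600.0, "cluster-b": 120.0}
choice = pick_resource(waits)
print(choice)  # cluster-b
```

This only chooses between resources; which nodes are used on the chosen resource is still left to that resource's own scheduler.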
@pagrubel, you are right. Jake's current simple design may provide limited benefit for managing jobs on ONE resource, but it will give long-term support for extensibility and portability. As for deciding which nodes take the jobs, my suggestion is that we may need it for the following reasons:
I recommend using Swagger to edit the APIs: https://editor.swagger.io/
I have converted my rough API design into a Swagger document here: scheduler.yaml
I'm going to close #97 and rename this issue to Develop BEE Resource Monitor.
This issue no longer fits BEE's scope. Closing pending @pagrubel evaluation.
I hate to take this off the list of things to do. In some ways it does fit a larger future vision. We can close it with a caveat to revisit it if BEE is able to run across platforms at multiple sites.
Creating an issue for discussing and summarizing the design spec (combining #115 and #98).
General problem we are looking to solve:
General BEE
The overall system we are looking at (MARS is our scheduler)
General functions
TM-RM: Interface between TM and RM (SLURM, AWS, GCP, et al)
Discussion will continue...