How to handle clusters that are node-exclusive rather than core-exclusive? #3143
-
In LC, we typically have two kinds of systems: node-exclusive and core-exclusive. The difference can be summarized with the answer to "what happens when a user asks for a single task and a single core?". In a node-exclusive cluster, the user gets the whole node; in a core-exclusive cluster, the user gets a single core on a single node that they share with other users. Right now, with the current combination of jobspec V1, …

One solution that was discussed during coffee hour was to add a new configuration key/value that would turn on node-exclusive allocations. When that configuration is set: …
This solution requires no modifications to the Fluxion scheduler. One open question would be what the default …

Since there are potentially many different ways to handle this problem, we decided to start with a discussion until we coalesce around a solution; then we can either open an issue or convert this discussion to one (if that's allowed).
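To make the distinction concrete, here is a toy sketch in Python of what a one-task, one-core request yields under each kind of system. This is not Flux code: the `Node` class, the `allocate` function, and the 4-core node size in the usage note are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: a name plus the set of core ids that are still free."""
    name: str
    ncores: int
    free: set = field(default_factory=set)

    def __post_init__(self):
        # Every core starts out free.
        self.free = set(range(self.ncores))


def allocate(nodes, ncores, node_exclusive):
    """Return (node name, core ids) for a request of `ncores` cores on one node.

    node_exclusive=True: only a completely idle node matches, and the job
    receives every core on it, however few were requested.
    node_exclusive=False: the job receives exactly `ncores` cores and other
    jobs may use the rest of the node.
    """
    for node in nodes:
        if node_exclusive:
            if len(node.free) == node.ncores and ncores <= node.ncores:
                granted = sorted(node.free)
                node.free.clear()
                return node.name, granted
        elif len(node.free) >= ncores:
            granted = sorted(node.free)[:ncores]
            node.free -= set(granted)
            return node.name, granted
    return None  # nothing fits right now
```

With two hypothetical 4-core nodes, two core-exclusive requests for one core each end up sharing the first node, while a single node-exclusive request for one core takes all four cores of an idle node.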
Replies: 4 comments 7 replies
-
On the call we also discussed that it would be nice if jobspec could be modified after user submission. Since the jobspec is signed by the user, this idea was discarded out of hand. However, it occurs to me that there is nothing in the security architecture of Flux that requires the scheduler to use the unmodified resources section of user-submitted jobspec when finding a matching resource set. The scheduler could therefore internally modify the request as above when configuration dictates, and use the modified jobspec in its matching policy.

A problem with this approach is that the job shell currently simply takes R and the tasks section of jobspec to determine the number of tasks to launch. On a node-exclusive cluster, a jobspec generated with …

In general though, the job shell should be smarter than this. A scheduler only has to match, at a minimum, the requested resources, so assuming that the user wanted to run exactly as many tasks as there are slots in the final resource set assignment will be error prone.
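A rough Python sketch of the concern, assuming a simplified shell that derives the task count as slots in R times `per_slot`. The `count_slots` and `naive_task_count` helpers and the 36-core node are hypothetical, and R is reduced here to a per-node core count rather than the real R format:

```python
def count_slots(cores_per_node, cores_per_slot):
    """Number of task slots that fit in the allocated cores, node by node."""
    return sum(cores // cores_per_slot for cores in cores_per_node.values())


def naive_task_count(jobspec_task, cores_per_node, cores_per_slot):
    """Task count as a shell would compute it if it trusted R alone.

    `jobspec_task` mimics one entry of the jobspec tasks section, with a
    count of either {"per_slot": N} or {"total": N}.
    """
    count = jobspec_task["count"]
    if "total" in count:
        return count["total"]
    return count["per_slot"] * count_slots(cores_per_node, cores_per_slot)


task = {"command": ["hostname"], "slot": "task", "count": {"per_slot": 1}}

# Core-exclusive cluster: the scheduler grants exactly the one core requested.
print(naive_task_count(task, {"node0": 1}, cores_per_slot=1))   # prints 1

# Node-exclusive cluster: the scheduler grants the whole (assumed 36-core) node.
print(naive_task_count(task, {"node0": 36}, cores_per_slot=1))  # prints 36
```

The same user request yields 1 task in the first case and 36 in the second, which is the surprise a smarter shell would need to avoid, for example by counting slots against the requested resources rather than the assigned ones.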
-
Great point!
However, any scheduler may allocate resources at a minimum, so users should
expect different behavior depending on the loaded scheduler and its
configuration.
The nice thing about Flux is that you can at least have a hope of
consistent behavior by launching your own instance and using a scheduler of
your choice with known configuration. Most workflows would be submitted to
a single user instance anyway, not the system instance, right?
I'm not necessarily saying one approach is better than the other, but just
something to think about. I don't like the idea of giving an enclosing
instance an ability to modify jobspec right before it is signed by the
user. That gives me pause.
On Fri, Aug 14, 2020, 8:41 PM Stephen Herbein ***@***.***> wrote:
However, it occurs to me that there is nothing in the security
architecture of Flux that requires the scheduler to use the unmodified
resources section of user-submitted jobspec when finding a matching
resource set. The scheduler could therefore internally modify the request
as above when configuration dictates, and use the modified jobspec in its
matching policy.
I like that this would make these issues transparent to the user, but I'm
also worried that it could be surprising. In particular, I think it would
be confusing if users are feeding in their "blessed" campaign/workflow
jobspec and getting different resource allocations depending on the system.
I guess ultimately there is a trade-off between minimal user intervention
and consistent semantics.
-
The folks who are working on the CTS-2 procurement are wondering if we'll be able to have node-scheduled and core-scheduled queues / partitions on the same cluster. Is there anything about the approach that was discussed here that would get in the way of that?
-
We have a solution for Fluxion provided by flux-framework/flux-sched#900. The key parameter is to configure the [sched-fluxion-resource] table with

    [sched-fluxion-resource]
    match-policy = "lonodex"

See flux-config-sched-fluxion-resource(5) for more details.

The one remaining question was posed by @ryanday36 above: whether node-scheduled and core-scheduled queues / partitions can coexist on the same cluster.

Since the match-policy for the resource module is configured as a whole, I don't think there is a way to do this with the current solution. Fluxion would need to be extended to allow the qmanager or resource to select the match-policy based on queue or some other parameter. Perhaps properties could be used for this purpose (#4143).

cc: @dongahn
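As a purely hypothetical illustration of the kind of extension described above (Fluxion has no such mechanism today), per-queue selection could amount to a lookup that falls back to the module-wide policy. The queue names, the `QUEUE_POLICIES` table, and the `match_policy_for` helper are invented for this sketch, and `lonode` is used on the assumption that it is the non-exclusive counterpart of `lonodex`:

```python
# Module-wide setting, as in the [sched-fluxion-resource] config above.
DEFAULT_POLICY = "lonodex"

# Hypothetical per-queue overrides; nothing like this table exists in Fluxion.
QUEUE_POLICIES = {
    "pbatch": "lonodex",  # node-scheduled queue: whole nodes per job
    "pdebug": "lonode",   # core-scheduled queue: nodes may be shared
}


def match_policy_for(queue=None):
    """Pick the match policy for a job's queue, else the module-wide default."""
    return QUEUE_POLICIES.get(queue, DEFAULT_POLICY)
```

Jobs submitted without a queue, or to a queue with no override, would keep the existing node-exclusive behavior, so a site that never defines overrides sees no change.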