Internal queues: task in multiple queues behaviour #2701
Should we implement this as described, or the more usual override behaviour: if a task is a member of two queues, the second assignment overrides the first?
Should I change the code so that the task is queued according to the smaller value, or according to the later assignment?
I vote for override semantics, not smaller value. Let's see if @matthewrmshin agrees ...
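As a hedged illustration of the "override" semantics being voted for here (a hypothetical helper, not Cylc's actual code): queue assignments are processed in definition order, and a later assignment simply replaces an earlier one:

```python
# Hypothetical sketch of "last assignment wins" queue membership.
def assign_queues(queue_defs):
    """queue_defs: list of (queue_name, member_tasks) in definition order."""
    membership = {}  # task name -> queue name; later assignments override
    for queue_name, members in queue_defs:
        for task in members:
            membership[task] = queue_name
    return membership

# t2 is assigned to qA first, then qB: the later assignment wins.
membership = assign_queues([("qA", ["t1", "t2"]), ("qB", ["t2", "t3"])])
```

Under these semantics the order in which queues are defined (or families assigned) is what determines final membership.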
How do we do override if a task is in two families which are in different queues?
Family names can be used as shorthand to assign member tasks to queues, but other than that I don't think families are relevant here. So it is the order of queue assignment that matters. Below, task
I can understand the safety argument, but it seems rather unorthodox to make decisions like that for users. Normally it is down to you: if you want to assign a new value to something, the new value should stick. Does that make sense?
@paulcresswell do you want to comment on this?
Yes, thanks. The problem with the "usual" override behaviour is that a task may need to inherit other families. As an alternative suggestion, then, could we consider allowing a task to be a member of multiple queues? E.g. in Hilary's example, if …
@paulcresswell - I'm having a bit of trouble getting my head around your argument. Are you suggesting that the order of inheritance of families (A, B vs B, A) implies (or should imply) order of queue assignment, when families - rather than individual tasks - are assigned to queues? Note that queue membership is not a property of a family, in the runtime hierarchy - which is why queues are defined under "scheduling" and not "runtime". As I said above, use of a family name in queue definition is merely as convenient shorthand for "all the family member tasks". Using your example (I think):
Under
It makes no difference whether … What matters is that … Maybe we just need to document exactly what queue membership assignment by family name means, and warn to be careful about tasks that inherit from multiple families?
(I still need to think about your "alternative suggestion" ... is it better than simpler parse-time assignment to the single lower-limit queue?)
@matthewrmshin @paulcresswell - @challurip is interested in working on this, but we need to come to some resolution on the discussion above!
Perhaps we could allow tasks to be in multiple queues, although inevitably it ups the complexity. One solution would be for Cylc to create internal queues (i.e. qC) whose members (moved from qA & qB) are the intersection of the two, and which, while running, occupy 1 slot of each of the user-defined queues. The intersection queues go first. Where there are overlaps, the highest intersectionality (number of component queues) gets priority (no preference on equal number; perhaps queue size). Just a thought (^.^);
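A rough sketch of the internal intersection-queue idea above, under the assumption that shared tasks are moved into a derived queue keyed by the set of user queues they belong to (names and structure are hypothetical, not Cylc code):

```python
# Hypothetical sketch: split user queues into exclusive members plus
# internal "intersection queues" for tasks shared by several queues.
def build_internal_queues(user_queues):
    """user_queues: dict name -> set of member tasks.
    Returns (exclusive, intersections): exclusive maps each user queue to
    its unshared tasks; intersections maps a frozenset of parent queue
    names to the tasks shared by exactly those queues."""
    task_to_queues = {}
    for qname, members in user_queues.items():
        for task in members:
            task_to_queues.setdefault(task, set()).add(qname)
    exclusive = {q: set() for q in user_queues}
    intersections = {}
    for task, qnames in task_to_queues.items():
        if len(qnames) == 1:
            exclusive[next(iter(qnames))].add(task)
        else:
            intersections.setdefault(frozenset(qnames), set()).add(task)
    return exclusive, intersections

# tX is in both qA and qB, so it moves to the internal queue for {qA, qB};
# while running it would count against the limits of both parents.
exclusive, inter = build_internal_queues({"qA": {"t1", "tX"}, "qB": {"t2", "tX"}})
```

A running member of an intersection queue would then decrement the free-slot count of every parent queue, which is what gives the "1 slot of each" behaviour described above.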
I misunderstood your earlier explanation, Hilary, which is why you couldn't get your head round my reply; sorry. Thinking it over cleanly, my preferred option is the 'multiple queues' method because it avoids unnecessarily holding up tasks that could run, but I appreciate that's a much more complex thing to implement. Ultimately I think I and anyone else could cope with any of the methods described once they were done and documented, by rearranging the order the queues are defined in if necessary (smallest last!). The only thing I'd be averse to is any solution that meant the order of family inheritance controlled which queue something ended up in. (I do wonder how many users don't appreciate the difference between scheduling overrides and runtime order of inheritance though - I certainly didn't when I asked this question. It's not something most of us come across as frequently, I think.)
As discussed offline... The purpose of a queue is to limit access to a given resource. If we have queues to control access to multiple resources, we have an issue when tasks are in both queues, and I have no good answer on how to solve the potential race condition.
I think allowing one task in multiple queues only makes things overly complicated. I wonder if the root cause of this issue isn't simply due to conflating unrelated settings into one family, which then forces unwanted queue membership. Let's say you have a setup like this:

```
[scheduling]
    [[queues]]
        [[[QA]]]
            limit = X
            members = FAM_QA
        [[[QB]]]
            limit = Y
            members = FAM_QB
[runtime]
    [[FAM_QA]]
    [[FAM_QB]]
```

In the above, any task inheriting from FAM_QX will end up queued in QX. Since the 'queue families' do not carry any settings, there is no need for a task to inherit from more than one such family. While that could technically still happen, it should be treated as a suite design flaw. That is, my preferred solution would be to get an error instead of a warning when this happens, as it is a sign that the family inheritance in the suite is not quite right. This, combined with a paragraph or two of advice in the CUG on best practice with regard to queues and inheritance, would be enough IMO. Or perhaps even more general advice on correct design of inheritance trees, as there are also other pitfalls[*].

[*] One such pitfall is this:

```
[runtime]
    [[FAM]]
    [[foo]]
        inherit = FAM
    [[bar]]
        inherit = None, FAM
```

FAM will show up in the GUI as a level of task grouping, but selecting this group and, say, succeeding it will also succeed the bar task due to inheritance, even though this task was not explicitly selected. To avoid this, if a given family is first parent to some task, then it should not be used as a secondary parent to another task (I would even argue that this should fail validation).
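The "error instead of a warning" suggestion could look something like this minimal validation sketch (a hypothetical function, not Cylc's actual validator):

```python
# Hypothetical sketch: fail validation if any task is assigned to more
# than one queue (directly or via family expansion).
def check_queue_membership(queues):
    """queues: dict queue_name -> iterable of member task names.
    Raise ValueError if any task appears in more than one queue."""
    seen = {}  # task name -> queue it was first assigned to
    for qname, members in queues.items():
        for task in members:
            if task in seen and seen[task] != qname:
                raise ValueError(
                    f"task {task!r} is in queues {seen[task]!r} and {qname!r}; "
                    "each task may belong to only one queue")
            seen[task] = qname

check_queue_membership({"QA": ["a1", "a2"], "QB": ["b1"]})  # no overlap: OK
```

Running the same check on a setup where a task inherits from both queue families would raise, surfacing the design flaw at validation time rather than at run time.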
I forgot to record this latest offline conversation at the workshop. Consider 2 queues with a common task:

```
Q1: [t1, t2, tX, t3]
Q2: [t4, tX, t6, t7]
```

We'll allow other tasks to jump queue until the common task is at the head of the queue, so we may have:

```
Q1: [t1, t2, tX, t3]
Q2: [tX, t6, t7]
# ...
Q1: [t2, tX, t3]
Q2: [tX, t7]
# ...
Q1: [tX, t3]
Q2: [tX, t7]
# ... no more queue jumping until tX is submitted
```

This logic works as long as we don't have something ridiculous like this:

```
Q1: [tX, tY, tZ]
Q2: [tY, tZ, tX]
Q3: [tZ, tX, tY]
```
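The policy above can be sketched as: a task is released only once it heads every queue that contains it, and tasks queued behind a blocked shared task may jump past it (a hypothetical sketch, not Cylc code):

```python
# Hypothetical sketch of the "jump past a blocked shared task" policy.
def next_to_release(queue, all_queues):
    """Return the next task releasable from `queue`: FIFO order, except
    that a shared task still waiting in another queue may be jumped."""
    for task in queue:
        other_queues = [q for q in all_queues if q is not queue and task in q]
        if all(q[0] == task for q in other_queues):
            return task  # heads every *other* queue it belongs to: release
        # shared task blocked elsewhere: let the tasks behind it jump
    return None  # everything is blocked (e.g. the cyclic case above)

q1 = ["t1", "t2", "tX", "t3"]
q2 = ["t4", "tX", "t6", "t7"]
assert next_to_release(q2, [q1, q2]) == "t4"
```

On the "ridiculous" cyclic example, this sketch returns None for every queue, i.e. a deadlock, which is why that arrangement has to be excluded.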
Assuming each task is added to queues atomically, the relative order of tasks should be preserved under the current FIFO semantics. But if queues are ever extended to allow setting per-queue priorities and reordering, such a situation is certainly possible.
Just to throw another idea into the mix, and another use case by way of example. Say in my suite I want to limit the number of jobs sending data to external locations.

But say also that some servers are particularly slow and struggle when more than a couple of processes are sending them data at a time, so we want to limit them too.

The tasks sending data to the SLOW_SERVER should still be limited by the SEND_DATA queue, as they are still also sending data externally. But they should also be limited by the SLOW_SERVER queue. Whilst I do generally agree with @TomekTrzeciak in #2701 (comment), I do not think the use case above is covered by that discussion point. This is where multiple limits would be useful, in my opinion. Perhaps another way of looking at this would be for queues to be sub-queues within a parent queue? Child queues would accept jobs and then add them into the parent queue, clearing out when they clear out of the parent queue. This approach might be easier to track mentally.

But in this case, if a task was by chance in both queues, it would end up in the lowest child queue, and only indirectly be in the parent queue. The task gets added to SLOW_SERVER, which then adds it up the chain into the SEND_DATA queue. When SEND_DATA actually runs it, it clears out of SEND_DATA and then out of SLOW_SERVER.
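The multiple-limit behaviour wanted here can be sketched as: a task starts only if every queue it belongs to has a free slot, and it holds one slot in each while it runs (hypothetical names SEND_DATA/SLOW_SERVER from the example above; not Cylc's implementation):

```python
# Hypothetical sketch of a task holding a slot in every queue it belongs to.
class Queue:
    def __init__(self, limit):
        self.limit = limit
        self.active = set()  # tasks currently holding a slot

def try_start(task, queues):
    """Start `task` only if every one of its queues has a free slot."""
    if all(len(q.active) < q.limit for q in queues):
        for q in queues:
            q.active.add(task)
        return True
    return False

def finish(task, queues):
    for q in queues:
        q.active.discard(task)

send_data = Queue(limit=3)    # at most 3 external transfers overall
slow_server = Queue(limit=1)  # at most 1 transfer to the slow server

assert try_start("slow1", [send_data, slow_server])
assert not try_start("slow2", [send_data, slow_server])  # slow_server full
assert try_start("fast1", [send_data])                   # send_data has room
```

The parent/child framing in the comment is equivalent for this case: a SLOW_SERVER task must acquire both limits before running, and releases both when it finishes.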
(For the record, I had a working implementation of overlapping queues, with no race conditions, in #4035, but backed it out due to performance concerns plus the need to convince others that overlapping queues are a good thing 😁.) We hope to investigate other queuing implementations again soon, however.
From @paulcresswell via email:
See also #1196 #2616.