Internal queues: task in multiple queues behaviour #2701

matthewrmshin · 2018-06-22T08:42:12Z

We've noticed that if a task appears in two queues (other than the default queue), it adopts the queue limit of the first one it hits (the suite logs give warnings about ignoring subsequent queues). Is this really desirable? I would have thought that if a user defined two queues, both being picked up through family inheritance, a task should adhere to the smaller of the two values for safety.

See also #1196 #2616.

hjoliver · 2018-07-05T22:59:23Z

a task should adhere to the smaller of the two values for safety.

Should we implement this as described, or the more usual override behaviour: if a task is a member of two queues, the second assignment overrides the first?

challurip · 2018-09-06T00:34:30Z

Should I change the code for the task to be queued to smaller value or the later assignments?

hjoliver · 2018-09-06T00:42:12Z

I vote for override semantics, not smaller value. Let's see if @matthewrmshin agrees ...

matthewrmshin · 2018-09-06T06:11:12Z

How do we do override if a task is in two families which are in different queues?

hjoliver · 2018-09-06T19:50:29Z

Family names can be used as shorthand to assign member tasks to queues, but other than that I don't think families are relevant here. So it is the order of queue assignment that matters.

Below, task x is is being assigned to both queues qA and qB (and it doesn't particularly matter if it got there as written or via family names). Currently x gets assigned to qA, and the second assignment is ignored with a warning. For this issue, if order matters x should end up in qB (2nd assignment overrides 1st); or if "safety" matters, it should end up in qA (lower limit).

[scheduling]
   [[queues]]
      [[[qA]]]
          members = f1, f2, f3, x
          limit = 1
     [[[qB]]]
          members = a1, a2, x
          limit = 2

I can understand the safety argument, but it seems rather unorthodox to make decisions like that for users. Normally it is down to you, if you want to assign a new value to something, the new value should stick.

Does that make sense?

matthewrmshin · 2018-09-10T08:48:25Z

@paulcresswell do you want to comment on this?

paulcresswell · 2018-09-11T11:49:35Z

Yes, thanks. The problem with the "usual" override behaviour is that a task may need to inherit other families A and B in a particular order, which themselves are members of qA and qB respectively, which may then force the task to be in one queue when the user intended it to be in another. This is a very hard thing to notice when you get it wrong, as the task will still run but may cause system problems unbeknown to the user. Consequently my vote would be to take the smaller value, but I can understand the desire not to have special rules (so it'd need to be clearly documented).

As an alternative suggestion then, could we consider allowing a task to be a member of multiple queues? E.g. in Hilary's example, if qA and qB both currently contain one running task, then task x would not be allowed to start (as this would exceed qA's limit). However, if task x was the only running task in either queue, then task b in queue qB would be allowed to start, but task a in queue qA would not. That would allow the maximum possible throughput without potentially dangerous behaviour.

hjoliver · 2018-09-12T14:00:30Z

@paulcresswell - I"m having a bit of trouble getting my head around your argument. Are you suggesting that the order of inheritance of families (A, B vs B, A) implies (or should imply) order of queue assignment, when families - rather than individual tasks - are assigned to queues? Note that queue membership is not a property of a family, in the runtime hierarchy - which is why queues are defined under "scheduling" and not "runtime". As I said above, use of a family name in queue definition is merely as convenient shorthand for "all the family member tasks". Using your example (I think):

[scheduling]
  [[queues]]
      [[[qA]]]
          members = A
          limit = 1
      [[[qB]]]
          members = B
          limit = 2
  [[dependencies]]
        graph = foo
[runtime]
  [[A]]
  [[B]]
  [[f1, f2, f3]]
    inherit = A
  [[a1, a2]]
    inherit = B
  [[x]]
    inherit = B, A

Under queues the family names expand to list of members, so the above is equivalent to this:

   [[[qA]]]
       members = f1, f2, f3, x
       limit = 1
    [[[qB]]]
       members = a1, a2, x
       limit = 2

It makes no difference whether x inherits from A, B or from B, A.

What matters is that x is assigned to qA first, and qB second (simply because qA appears first in the suite definition) so I would argue that x should end up as a member of qB (by "usual" override semantics).

Maybe we just need to document exactly what queue membership assignment by family name means, and warn to be careful about tasks that inherit from multiple families??

hjoliver · 2018-09-12T14:17:37Z

(I still need to think about your "alternative suggestion" ... is it better than simpler parse-time assignment to the single lower-limit queue?)

hjoliver · 2018-11-21T22:53:59Z

@matthewrmshin @paulcresswell - @challurip is interested in working on this, but we need to come to some resolution on the discussion above!

dwsutherland · 2018-11-23T03:06:20Z

Perhaps we could allow for tasks to be in multiple queues, although inevitably it ups the complexity..

One solution would be for Cylc to create internal queues (i.e. qC) whose members (moved from qA & qB) are the intersections of and, while running, occupies 1 slot of each of the user defined queues..

The intersection queues go first. Where there are overlaps, the highest intersectionality (number of component queues) gets priority (no preference on equal number (perhaps queue size))..

Just a thought (^.^);

paulcresswell · 2018-11-23T17:37:25Z

I misunderstood your earlier explanation Hilary, which is why you couldn't get your head round my reply; sorry. Thinking it over cleanly, my preferred option is the 'multiple queues' method because it avoids unnecessarily holding up tasks that could run, but I appreciate that's a much more complex thing to implement. Ultimately I think I and anyone else could cope with any of the methods described once they were done and documented; by rearranging the order the queues are defined in if necessary (smallest last!). The only thing I'd be averse to is any solution that meant the order of family inheritance controlled which queue something ended up in.

(I do wonder how many users don't appreciate the difference between scheduling overrides and runtime order of inheritance though - I certainly didn't when I asked this question. It's not something most of us come across as frequently, I think.)

matthewrmshin · 2018-12-06T02:17:52Z

As discussed off line... The purpose of a queue is to limit access to a given resource. If we have queues to control access to multiple resources, we have an issue when tasks are in both queues, and I have no good answer on how to solve the potential race condition.

TomekTrzeciak · 2018-12-10T22:09:52Z

I think allowing one task in multiple queues only make things overly complicated. I wonder if the root cause of this issue isn't simply because of conflating unrelated settings into one family, which then forces unwanted queue membership. Let's say you have a setup like this:

[scheduling]
  [[queues]]
    [[[QA]]]
      limit = X
      members = FAM_QA
    [[[QB]]]
      limit = Y
      members = FAM_QB
[runtime]
  [[FAM_QA]]
  [[FAM_QB]]

In the above, any task inheriting from FAM_QX will end up queued in QX. Since the 'queue families' do not carry any settings, there is no need for a task to inherit from more than one such family. While that could technically still happen, it should be treated as a suite design flaw. That is, my preferred solution would be to get an error instead of a warning when this happens, as it is a sign that the family inheritance in the suite is not quite right. This, combined with a paragraph or two of advice in CUG on best practice with regard to queues and inheritance would be enough IMO. Or perhaps even more general advice on correct design of inheritance trees, as there are also other pitfalls[*].

[*] One such pitfall is this:

[runtime]
  [[FAM]]
  [[foo]]
    inherit = FAM
  [[bar]]
    inherit = None, FAM

FAM will show up in GUI as a level of task grouping, but selecting this group and, say, succeeding it will also succeed the bar task due to inheritance, even though this task was not explicitly selected. To avoid this, if a given family is first parent to some task, then it should not be used as a secondary parent to another task (I would even argue that this should fail validation).

matthewrmshin · 2018-12-18T10:00:32Z

I forgot to record this latest off line conversation at the work shop. Consider 2 queues with a common task tX:

Q1: [t1, t2, tX, t3]
Q2: [t4, tX, t6, t7]

We'll allow other tasks to jump queue until the common task is at the head of the queue, so we may have:

Q1: [t1, t2, tX, t3]
Q2: [tX, t6, t7]
# ...
Q1: [t2, tX, t3]
Q2: [tX, t7]
# ...
Q1: [tX, t3]
Q2: [tX, t7]
# ... no more queue jumping until tX is submitted

This logic works as long as we don't have something ridiculous like this:

Q1: [tX, tY, tZ]
Q2: [tY, tZ, tX]
Q3: [tZ, tX, tY]

TomekTrzeciak · 2018-12-18T11:03:44Z

This logic works as long as we don't have something ridiculous like this:
Q1: [tX, tY, tZ]
Q2: [tY, tZ, tX]
Q3: [tZ, tX, tY]

Assuming each task is added to queues atomically, the relative order of tasks should be preserved under the current FIFO semantics. By if queues are ever extended to allow setting per queue priorities and reordering, such a situation is certainly possible.

ColemanTom · 2019-02-26T00:01:57Z

Just to throw another idea into the mix, and another use case as way of examples.

Say in my suite I want to limit the number of jobs sending data to external locations

  [[queues]]
    [[[SEND_DATA]]]
      limit = 20
      members = SEND_DATA

But, say also that some servers are particularly slow and struggle when more than a couple of processes are sending them data at a time, so we want to limit them.

    [[[SLOW_SERVER]]]
      limit = 2
      members = SLOW_SERVER_TASKS

The tasks sending data to the SLOW_SERVER should still be limited by the SEND_DATA queue, as they are still also sending data externally. But, they should also be limited by the SLOW_SERVER queue.

Whilst I do generally agree with @TomekTrzeciak in #2701 (comment) I do not think the use case above is covered by that discussion point. This is where a multiple limits would be useful, in my opinion.

Perhaps another way of looking at this would be for queues to be sub-queues within a parent queue? Child queues would accept jobs, and then submit them add them into the parent queue, clearing out when they clear out of the parent queue. This approach might be easier to track mentally.

    [[[SLOW_SERVER]]]
      inherit = SEND_DATA
      limit = 2
      members = SLOW_SERVER_TASKS

But in this case, if a task was by chance in both queues, it would end up in the lowest child queue, and only indirectly be in the parent queue. Task gets added to SLOW_SERVER. That then adds it up the chain into the SEND_DATA queue. When SEND_DATA actually runs it, it clears it out of SEND_DATA and then out of SLOW_SERVER.

hjoliver · 2021-02-19T04:33:15Z

(For the record, I had a working implementation of overlapping queues, with no race conditions, in #4035 - but backed out due to performance concerns plus the need to convince others that overlapping queues are a good thing 😁 ) We hope to investigate other queuing implementations again soon, however).

matthewrmshin added this to the some-day milestone Jun 22, 2018

hjoliver assigned challurip Jul 5, 2018

hjoliver modified the milestones: some-day, later Sep 6, 2018

ColemanTom unassigned challurip Feb 26, 2019

matthewrmshin modified the milestones: later, some-day Aug 28, 2019

TomekTrzeciak mentioned this issue Aug 29, 2019

Families, inheritance and GUI interaction #3335

Closed

hjoliver mentioned this issue Mar 30, 2020

Queue warning #3539

Merged

6 tasks

hjoliver mentioned this issue Jan 18, 2021

Centralize queue logic, remove queued state #4035

Closed

10 tasks

dpmatthews mentioned this issue Feb 9, 2021

Possible internal queue enhancements #4068

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Internal queues: task in multiple queues behaviour #2701

Internal queues: task in multiple queues behaviour #2701

matthewrmshin commented Jun 22, 2018

hjoliver commented Jul 5, 2018

challurip commented Sep 6, 2018

hjoliver commented Sep 6, 2018

matthewrmshin commented Sep 6, 2018

hjoliver commented Sep 6, 2018 •

edited

Loading

matthewrmshin commented Sep 10, 2018

paulcresswell commented Sep 11, 2018

hjoliver commented Sep 12, 2018 •

edited

Loading

hjoliver commented Sep 12, 2018 •

edited

Loading

hjoliver commented Nov 21, 2018

dwsutherland commented Nov 23, 2018 •

edited

Loading

paulcresswell commented Nov 23, 2018

matthewrmshin commented Dec 6, 2018

TomekTrzeciak commented Dec 10, 2018

matthewrmshin commented Dec 18, 2018

TomekTrzeciak commented Dec 18, 2018

ColemanTom commented Feb 26, 2019

hjoliver commented Feb 19, 2021

Internal queues: task in multiple queues behaviour #2701

Internal queues: task in multiple queues behaviour #2701

Comments

matthewrmshin commented Jun 22, 2018

hjoliver commented Jul 5, 2018

challurip commented Sep 6, 2018

hjoliver commented Sep 6, 2018

matthewrmshin commented Sep 6, 2018

hjoliver commented Sep 6, 2018 • edited Loading

matthewrmshin commented Sep 10, 2018

paulcresswell commented Sep 11, 2018

hjoliver commented Sep 12, 2018 • edited Loading

hjoliver commented Sep 12, 2018 • edited Loading

hjoliver commented Nov 21, 2018

dwsutherland commented Nov 23, 2018 • edited Loading

paulcresswell commented Nov 23, 2018

matthewrmshin commented Dec 6, 2018

TomekTrzeciak commented Dec 10, 2018

matthewrmshin commented Dec 18, 2018

TomekTrzeciak commented Dec 18, 2018

ColemanTom commented Feb 26, 2019

hjoliver commented Feb 19, 2021

hjoliver commented Sep 6, 2018 •

edited

Loading

hjoliver commented Sep 12, 2018 •

edited

Loading

hjoliver commented Sep 12, 2018 •

edited

Loading

dwsutherland commented Nov 23, 2018 •

edited

Loading