Possible bugs in task reaper #2672
Comments
thanks for the analysis @talex5. i'll dig into this in a bit.

FYI I'm already working on the fixes for some of these @dperny

Solid, I hadn't started.
Comments/questions for the following:
Any suggestions? @talex5

Can you please point to the code for this? @talex5

This is correct behavior. Also note that this is not the only condition. It's `desired-remove && (t.Status.State < api.TaskStateAssigned || t.Status.State >= api.TaskStateCompleted)`.

Can you please point to the code for this? @talex5
I couldn't make sense of this either. Any ideas? @dperny
Should be fixed by #2675 @talex5
I think this is the total number of tasks to be kept for each slot. @talex5 cc @dperny
(note: each bullet list item at the end of the issue refers to an XXX point in the main body)
I suggest "Clean up when we hit maxDirty ..."
The code is:
e.g. if a task is
I think you're looking at a different bit of code. It's the same as the bit I quoted above. The logic is:
For example, if a task is
Well, it's really the whole file. The code seems very confused about what is supposed to be in the dirty list. It needs to be documented.
Agreed, I think it should only increase the task history, not decrease it.
It seems like we should. But we don't do that when starting up the task reaper from scratch - if there's a change in leadership, it might be possible we lose track of tasks to reap until another task is created for that particular slot. If that's OK, then maybe it's fine that we don't bother adding things to dirty until then.
Are there any plans to continue with these?
We've been seeing some suspicious behaviour of the task reaper (it uses a lot of CPU), and I've been looking at the code to see what might cause that. This issue documents what I've found so far.
Current design
The design documentation for the reaper (https://github.com/docker/swarmkit/blob/master/design/orchestrators.md#task-reaper) says:
A comment in the code adds that:
What the code does
At startup, we remove:
("reapable" means terminated or not yet assigned to a node)
After the startup code has run, we start a 250ms timer running and enter the main loop. In the loop, we process events and wait for the timer to fire. When the timer fires, we stop the timer (although presumably it's already stopped) and call `tick`.

XXX: A comment says `Clean up when we hit TaskHistoryRetentionLimit or when the timer expires`. I believe it means `Clean up when we hit maxDirty or ...`.

When we get an event in the loop:
- `CreateTask`: we add the task's vslot to `dirty`.
- `UpdateTask`: if the task is reapable, we add it to `cleanup`.
- `UpdateCluster`: we update our cached `taskHistory` setting.

For any event, we arrange for `tick` to run, either by resetting the timer to 250ms, or by calling it immediately if `len(tr.dirty)+len(tr.cleanup) > maxDirty` (see the sketch below).

XXX: The code for resetting the timer is wrong, according to the Go docs. You have to stop it first (looks like there's already a PR for this: #2669).
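For reference, here's what that step looks like with the timer handled the way the `time.Timer` docs require — a minimal self-contained sketch with approximate names (`pokeTick` is made up), not swarmkit code:

```go
package main

import "time"

const (
	maxDirty         = 1000
	batchingInterval = 250 * time.Millisecond
)

type reaper struct {
	timer   *time.Timer
	dirty   map[string]struct{}
	cleanup []string
}

func (r *reaper) tick() { /* process cleanup and dirty (see below) */ }

// pokeTick arranges for tick to run: immediately if there is too much
// pending work, otherwise after the batching interval. The reset is
// done the way the time.Timer docs require: stop the timer and drain
// its channel before calling Reset (the fix in #2669).
func (r *reaper) pokeTick() {
	if len(r.dirty)+len(r.cleanup) > maxDirty {
		r.tick() // too much pending work: clean up immediately
		return
	}
	if !r.timer.Stop() {
		select {
		case <-r.timer.C: // drain a pending fire so Reset is safe
		default:
		}
	}
	r.timer.Reset(batchingInterval)
}

func main() {
	r := &reaper{timer: time.NewTimer(batchingInterval), dirty: map[string]struct{}{}}
	r.pokeTick()
}
```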
The name `dirty`, and the fact that the code calls `tick` on every event when `dirty` is large, suggests that the code expects `tick` to remove all the dirty items from the list. In fact, it mostly just leaves them there.

`tick` deletes all tasks in `cleanup` and resets it to empty. It also processes all the vslots in `dirty`:

- If the vslot's service no longer exists, it removes it from the dirty list (recently fixed in [orchestrator/task reaper] Clean up tasks in dirty list for which the service has been deleted #2666).
- If `taskHistory < 0` (no cleanup), we leave it on the dirty list.
- If the total number of tasks in this vslot is within the `taskHistory` limit, we leave it on the dirty list.
- Otherwise, we start going through all the vslot's tasks from oldest to newest, marking tasks for deletion until we're within the limit. We skip tasks that are runnable and desired-running. (A condensed sketch of this pass follows the list.)
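Putting those bullets together, the per-vslot pass looks roughly like this — a condensed, self-contained sketch where the types, state orderings, and the `reapVSlot` name are stand-ins, not swarmkit's API (the service-no-longer-exists check from the first bullet is omitted):

```go
package main

// Stand-in task states, ordered like swarmkit's api.TaskState enum.
type taskState int

const (
	stateRunning   taskState = 5
	stateCompleted taskState = 6
)

type task struct {
	desired, current taskState
}

// reapVSlot marks one vslot's tasks for deletion, oldest first, until
// the slot is within the retention limit. Note that the vslot is left
// on the dirty list in every branch.
func reapVSlot(tasks []task, taskHistory, maxAttempts int64, del func(task)) {
	retain := taskHistory
	// "If MaxAttempts is set, keep at least one more than that number
	// of tasks" -- note this assignment can lower retain as well as
	// raise it (see the XXX notes below).
	if maxAttempts > 0 {
		retain = maxAttempts + 1
	}
	if retain < 0 {
		return // cleanup disabled
	}
	remaining := int64(len(tasks))
	if remaining <= retain {
		return // already within the limit
	}
	for _, t := range tasks { // oldest to newest
		if remaining <= retain {
			break
		}
		// Skip tasks that are runnable and desired-running -- the
		// condition quoted in the XXX notes below.
		if t.desired == stateRunning && t.current <= stateRunning {
			continue
		}
		del(t)
		remaining--
	}
}

func main() {
	// Example: 5 finished tasks, retention limit 2 -> delete the 3 oldest.
	tasks := make([]task, 5)
	for i := range tasks {
		tasks[i] = task{desired: stateRunning, current: stateCompleted}
	}
	reapVSlot(tasks, 2, 0, func(task) { /* delete from the store here */ })
}
```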
XXX: What about desired-ready tasks? We probably shouldn't be deleting them.
XXX: In fact, we probably shouldn't be deleting any running task (even if it's desired to be terminated). The comment says "This check is important to ignore tasks which are running or need to be running" but the code says `t.DesiredState == api.TaskStateRunning && t.Status.State <= api.TaskStateRunning` (*and*).

XXX: A comment says "If MaxAttempts is set, keep at least one more than that number of tasks". I would expect this logic to only increase `taskHistory`, but in fact it might lower it too. That is, if I set `TaskHistoryRetentionLimit=100` and create a service with `Restart.MaxAttempts=1` then I will get at most 2 retained tasks, I think.

Conclusions
I find this algorithm very strange. It's O(n^2) in the number of vslots: every new task adds a vslot to `dirty`, and once `dirty` is past `maxDirty` every event triggers a `tick` that rescans every dirty vslot. For example, if you create 400 services each with 4 replicas (1600 tasks), this is what happens:
Here's a graph of that happening:
This shows two runs, each creating 400 services with 4 replicas and then deleting them. It's from before #2666 was merged, which is why it doesn't reset to zero when the first batch of services goes away. I added a Prometheus metric for the dirty set size. You can see that CPU usage is low until we hit 1000 vslots (at the dotted line); after that, the CPU needed to add each task increases steadily. Although the vslots weren't removed after the first run, they don't seem to affect the CPU time much. I guess ignoring vslots with no service is quite fast.
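For anyone who wants to reproduce the measurement, a gauge like this is all it takes — a sketch using the Prometheus Go client; the metric name is made up for this sketch, it is not an existing swarmkit metric:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical gauge for the dirty-set size (the name is invented).
var dirtyVSlots = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "taskreaper_dirty_vslots",
	Help: "Number of vslots currently in the task reaper's dirty set.",
})

func main() {
	prometheus.MustRegister(dirtyVSlots)
	// In the reaper itself you would call
	// dirtyVSlots.Set(float64(len(tr.dirty))) after each event.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9100", nil))
}
```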
There seems to be no need for this, since a task event can only ever lead us to delete another task in the same vslot. There's no point scanning every task in the system. Possibly `tick` was supposed to empty the dirty list after processing it, but then the `UpdateTask` handler would need to re-add the vslot, and that doesn't explain the complicated logic about when to remove it.

XXX: I'm a bit confused about `TaskHistoryRetentionLimit`. The docs say "The number of old tasks to keep per slot or node is controlled by Orchestration.TaskHistoryRetentionLimit", which makes sense to me, but the code seems to treat it as the total number of tasks (not the number of old tasks). This makes a difference during restarts, for example.

If `TaskHistoryRetentionLimit` is the number of old tasks, then we can ignore `CreateTask` events completely and just listen for tasks becoming reapable. When a task becomes reapable, check just that vslot to see if it should push an older task out of the system. If it's the total number of tasks, then we must also monitor `CreateTask`, since adding a new runnable task may require deleting an old one. (A sketch of the first option follows.)
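Under the "number of old tasks" reading, the handler could be as simple as this — a sketch of the suggested approach with hypothetical stand-in types and helpers, not a patch against swarmkit:

```go
package main

// Hypothetical stand-ins for this sketch.
type task struct {
	serviceID string
	slot      uint64
	reapable  bool // terminated, or never assigned to a node
}

type store interface {
	// oldTasksForVSlot returns the vslot's dead tasks, oldest first.
	oldTasksForVSlot(serviceID string, slot uint64) []task
	deleteTask(task)
}

// onTaskUpdate reacts only to a task becoming reapable and rescans just
// that task's vslot, so the work per event is proportional to one vslot
// rather than to every task in the system. CreateTask events are not
// watched at all.
func onTaskUpdate(s store, t task, taskHistory int64) {
	if taskHistory < 0 || !t.reapable {
		return
	}
	old := s.oldTasksForVSlot(t.serviceID, t.slot)
	for int64(len(old)) > taskHistory {
		s.deleteTask(old[0]) // push the oldest task out
		old = old[1:]
	}
}

func main() {}
```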
Issues noted above for further investigation:

- The `Clean up when we hit TaskHistoryRetentionLimit` comment.
- `Restart.MaxAttempts` reducing `TaskHistoryRetentionLimit`.
- The meaning of `TaskHistoryRetentionLimit`.