
Applier hangs if schedule request buffer is full #926

Closed
nightkr opened this issue Jun 3, 2022 · 0 comments · Fixed by #932
Assignees
Labels
bug (Something isn't working), runtime (controller runtime related)

Comments

@nightkr
Member

nightkr commented Jun 3, 2022

Current and expected behavior

See #925 for more details.

The applier currently stops processing new scheduling requests while it is writing back the result of a reconciliation. This leads to a deadlock: the buffer is never drained because the applier is busy trying to fill it.

Thanks to @moustafab for reporting the issue and contributing a workaround.

Possible solution

  1. Explicitly try to empty the buffer during writeback
  2. Move the scheduler into a separate Tokio task
  3. Remove the size limit on the schedule request buffer (the current workaround, taken in #925: "fix applier hangs which can happen with many watched objects")

Additional context

No response

Environment

This is a runtime issue, independent of the Kubernetes version.

Configuration and features

kube 0.73.1

Affected crates

kube-runtime

Would you like to work on fixing this bug?

yes

@nightkr nightkr added the bug Something isn't working label Jun 3, 2022
@nightkr nightkr self-assigned this Jun 3, 2022
@nightkr nightkr added the runtime controller runtime related label Jun 3, 2022
nightkr added a commit to nightkr/kube-rs that referenced this issue Jun 8, 2022

This fixes kube-rs#926, since we already run multiple reconcilers in parallel.

Signed-off-by: Teo Klestrup Röijezon <teo@nullable.se>