This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Fast fail if task resource requests exceed k8s resource limits #488

Merged
hamersaw merged 9 commits into master from feature/fast-fail-on-k8s-resource-limits
May 5, 2023

Conversation

Contributor

@hamersaw hamersaw commented Sep 30, 2022

TL;DR

When we encounter a "ResourceExceedsLimits" error from k8s, we now validate that the task's resource requests and limits are below the k8s resource quota. If they are not, we fail the task immediately. Without this check the task would never be schedulable and would hang until FlytePropeller terminates it based on node-active-duration.
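
For illustration, the eligibility check described above could be sketched roughly as follows. This is not the code in this PR: `ResourceList`, `exceedingResources`, and `shouldFailFast` are hypothetical names, and quantities are simplified to plain integers (millicores, bytes) instead of the Kubernetes quantity type so that the example needs only the standard library.

```go
package main

import (
	"fmt"
	"sort"
)

// ResourceList maps a resource name (e.g. "cpu", "memory") to an amount in an
// integer base unit. It is a simplified, hypothetical stand-in for the
// Kubernetes resource list type.
type ResourceList map[string]int64

// exceedingResources returns the names of resources whose requested amount is
// strictly greater than the quota. Resources without a quota entry are
// treated as unconstrained.
func exceedingResources(requested, quota ResourceList) []string {
	var names []string
	for name, want := range requested {
		if limit, ok := quota[name]; ok && want > limit {
			names = append(names, name)
		}
	}
	return names
}

// shouldFailFast reports whether the task can never be scheduled because its
// requests or limits exceed the quota on their own, regardless of how much of
// the quota other workloads currently consume.
func shouldFailFast(requests, limits, quota ResourceList) (bool, string) {
	seen := map[string]bool{}
	for _, name := range exceedingResources(requests, quota) {
		seen[name] = true
	}
	for _, name := range exceedingResources(limits, quota) {
		seen[name] = true
	}
	if len(seen) == 0 {
		return false, ""
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the failure message
	return true, fmt.Sprintf("resources exceed quota: %v", names)
}

func main() {
	// Assumed example values: 4 CPUs (in millicores) and 8 GiB of memory.
	quota := ResourceList{"cpu": 4000, "memory": 8 << 30}
	requests := ResourceList{"cpu": 8000, "memory": 1 << 30}
	limits := ResourceList{"cpu": 8000, "memory": 2 << 30}

	if fail, reason := shouldFailFast(requests, limits, quota); fail {
		fmt.Println("permanent failure:", reason)
	} else {
		fmt.Println("retry later: quota may free up")
	}
}
```

The distinction the sketch tries to capture is that a task which fits within the quota may still succeed on a later attempt once other workloads release resources, whereas a task whose own requests or limits exceed the quota cannot, so retrying it is pointless.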

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

fixes flyteorg/flyte#2933

Follow-up issue

NA

Signed-off-by: Daniel Rammer <daniel@union.ai>

codecov bot commented Sep 30, 2022

Codecov Report

Merging #488 (08b6349) into master (5b50d88) will increase coverage by 0.38%.
The diff coverage is 53.57%.

❗ Current head 08b6349 differs from pull request most recent head d4326e5. Consider uploading reports for the commit d4326e5 to get more accurate results

Signed-off-by: Daniel Rammer <daniel@union.ai>
@hamersaw hamersaw marked this pull request as ready for review September 30, 2022 20:10
Contributor

flixr commented Jan 10, 2023

This would be helpful! Anything missing here?

Signed-off-by: Daniel Rammer <daniel@union.ai>
@hamersaw hamersaw merged commit f4cadb0 into master May 5, 2023
@hamersaw hamersaw deleted the feature/fast-fail-on-k8s-resource-limits branch May 5, 2023 21:40
eapolinario pushed a commit to eapolinario/flytepropeller that referenced this pull request Aug 9, 2023
…org#488)

* checking if task resource requests exceed k8s limits

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added better message to task failure

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added request checks

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added tests for checking resource eligibility

Signed-off-by: Daniel Rammer <daniel@union.ai>

* fixed lint issues

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated comment

Signed-off-by: Daniel Rammer <daniel@union.ai>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>
Successfully merging this pull request may close these issues.

[Core feature] Fast fail tasks if resource quota request exceeds k8s quota limits