Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task should fail immediately when pod is unprocessable #19359

Merged
merged 10 commits into from
Nov 4, 2021

Conversation

dstandish
Copy link
Contributor

@dstandish dstandish commented Nov 2, 2021

When pod has invalid requirements, e.g. resource limit < resource request,
the kubernetes api may return "Unprocessable Entity". In this scenario,
the kubernetes executor should fail the task immediately, rather than set
it to be attempted again

closes #19320

When pod has invalid requirements, e.g. resource limit < resource request,
the kubernetes api may return "Unprocessable Entity".  In this scenario,
the kubernetes executor should fail the task immediately, rather than set
it to be attempted again
@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes provider related issues area:Scheduler including HA (high availability) scheduler labels Nov 2, 2021
@kaxil kaxil added this to the Airflow 2.2.2 milestone Nov 2, 2021
dstandish and others added 2 commits November 2, 2021 18:11
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>
airflow/executors/kubernetes_executor.py Show resolved Hide resolved
airflow/executors/kubernetes_executor.py Show resolved Hide resolved
tests/executors/test_kubernetes_executor.py Outdated Show resolved Hide resolved
tests/executors/test_kubernetes_executor.py Show resolved Hide resolved
kubernetes_executor.execute_async(
key=('dag', 'task', 'run_id', try_number),
key=task_instance_key,
queue=None,
command=['airflow', 'tasks', 'run', 'true', 'some_parameter'],
)
kubernetes_executor.sync()
kubernetes_executor.sync()

assert mock_kube_client.create_namespaced_pod.called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert mock_kube_client.create_namespaced_pod.called
mock_kube_client.create_namespaced_pod.assert_called_once()

or

Suggested change
assert mock_kube_client.create_namespaced_pod.called
assert mock_kube_client.create_namespaced_pod.call_count == 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's odd...

the test actually calls sync 2 times prior to this line

so depending on the scenario call count could be 2.

but i will remove one of the sync calls and assert == 1... it seems that must have been a mistake.

dstandish and others added 3 commits November 3, 2021 09:51
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
@mock.patch('airflow.executors.kubernetes_executor.KubernetesJobWatcher')
@mock.patch('airflow.executors.kubernetes_executor.get_kube_client')
def test_run_next_exception(self, mock_get_kube_client, mock_kubernetes_job_watcher):
def test_run_next_exception_requeue(
self, mock_get_kube_client, mock_kubernetes_job_watcher, reason, status, should_requeue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reason isn't used, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it provides a human-friendly test name. this is a pattern i've seen in other airflow tests, though perhaps more commonly the param would be called name. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though maybe this approach is better 🤷:
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok updated

@github-actions
Copy link

github-actions bot commented Nov 3, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 3, 2021
@kaxil kaxil merged commit eb12bb2 into apache:main Nov 4, 2021
jedcunningham pushed a commit that referenced this pull request Nov 4, 2021
When pod has invalid requirements, e.g. resource limit < resource request,
the kubernetes api may return "Unprocessable Entity".  In this scenario,
the kubernetes executor should fail the task immediately, rather than set
it to be attempted again

closes #19320

(cherry picked from commit eb12bb2)
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler full tests needed We need to run full set of tests for this PR to merge provider:cncf-kubernetes Kubernetes provider related issues type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task Instance stuck inside KubernetesExecutor
5 participants