EMR Serverless Fix for Jobs marked as success even on failure #26218

@syedahsn syedahsn commented Sep 7, 2022

Airflow was marking a task as SUCCESS even when the EMR Serverless job failed.
The fix sets desired_state in EmrServerlessStartJobOperator to SUCCESS_STATES rather than TERMINAL_STATES. TERMINAL_STATES also covers failure states, so waiting on it treated a failed job as finished. With SUCCESS_STATES, the task is marked as failed if the job does not run successfully.
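To illustrate why the choice of desired state matters, here is a minimal sketch of a polling loop. It is not the provider's actual waiter: `wait_for_state`, `scripted_poll`, and the state values are hypothetical and chosen only for illustration.

```python
from typing import Callable, Collection, Iterable

# Illustrative state names; treat the exact values as assumptions.
SUCCESS_STATES = frozenset({"SUCCESS"})
FAILURE_STATES = frozenset({"FAILED", "CANCELLED"})
TERMINAL_STATES = SUCCESS_STATES | FAILURE_STATES


def wait_for_state(
    poll: Callable[[], str],
    desired_states: Collection[str],
    failure_states: Collection[str],
    max_polls: int = 10,
) -> str:
    """Hypothetical waiter: poll until a desired state is seen, raise on failure."""
    for _ in range(max_polls):
        state = poll()
        # The desired-state check runs first, so a failure state that is also
        # listed as "desired" is returned as if the wait had succeeded.
        if state in desired_states:
            return state
        if state in failure_states:
            raise RuntimeError(f"Job reached failure state {state}")
    raise TimeoutError("Job did not reach a desired state in time")


def scripted_poll(states: Iterable[str]) -> Callable[[], str]:
    """Return a poll function that replays a fixed sequence of states."""
    iterator = iter(states)
    return lambda: next(iterator)


history = ["PENDING", "RUNNING", "FAILED"]

# Waiting on TERMINAL_STATES: the failed job looks like a completed wait.
masked = wait_for_state(scripted_poll(history), TERMINAL_STATES, FAILURE_STATES)

# Waiting on SUCCESS_STATES: the same history now raises.
try:
    wait_for_state(scripted_poll(history), SUCCESS_STATES, FAILURE_STATES)
    surfaced = False
except RuntimeError:
    surfaced = True
```

In this sketch the only difference between the two calls is the set passed as the desired state, which mirrors the one-line change in this PR.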


@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Sep 7, 2022
@@ -650,7 +650,7 @@ def execute(self, context: 'Context') -> Dict:
                     'jobRunId': response['jobRunId'],
                 },
                 parse_response=['jobRun', 'state'],
-                desired_state=EmrServerlessJobSensor.TERMINAL_STATES,
+                desired_state=EmrServerlessJobSensor.SUCCESS_STATES,

Do we have test coverage for this?

Also, the Operator imports from the Sensor? 🤯
The states should be defined on the Hook.

Examples:

NON_TERMINAL_STATES = {"INITIALIZED", "QUEUED", "RUNNING"}
FAILED_STATES = {"FAILED"}

INTERMEDIATE_STATES = (
    "PENDING",
    "SUBMITTED",
    "RUNNING",
)
FAILURE_STATES = (
    "FAILED",
    "CANCELLED",
    "CANCEL_PENDING",
)
SUCCESS_STATES = ("COMPLETED",)
TERMINAL_STATES = (
    "COMPLETED",
    "FAILED",
    "CANCELLED",
    "CANCEL_PENDING",
)
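
A rough sketch of what defining the states on the Hook could look like, reusing the state names from the second example above. The class names `ExampleJobHook`, `ExampleJobSensor`, and `ExampleJobOperator` are hypothetical and do not inherit from any Airflow base class; this only shows the dependency direction, not the provider's real layout.

```python
class ExampleJobHook:
    """Hypothetical hook that owns the job state constants."""

    INTERMEDIATE_STATES = ("PENDING", "SUBMITTED", "RUNNING")
    FAILURE_STATES = ("FAILED", "CANCELLED", "CANCEL_PENDING")
    SUCCESS_STATES = ("COMPLETED",)
    # Derived rather than retyped, so the tuples cannot drift apart.
    TERMINAL_STATES = SUCCESS_STATES + FAILURE_STATES


class ExampleJobSensor:
    """Hypothetical sensor: reads the states from the hook."""

    def poke(self, state: str) -> bool:
        if state in ExampleJobHook.FAILURE_STATES:
            raise RuntimeError(f"Job reached failure state {state}")
        return state in ExampleJobHook.SUCCESS_STATES


class ExampleJobOperator:
    """Hypothetical operator: depends on the hook, not on the sensor."""

    desired_state = ExampleJobHook.SUCCESS_STATES

    def is_finished(self, state: str) -> bool:
        return state in self.desired_state
```

With this arrangement both the operator and the sensor import only the hook, which avoids the operator-to-sensor import questioned in the review.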

potiuk commented Sep 18, 2022

You need to rebase and resolve the conflicts, @syedahsn.

…ES rather than TERMINAL_STATES. This makes it so that the task is marked as a failure if the job doesn't run successfully.
Add test to cover job failure exception.
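
As an illustration of the kind of test this commit describes, here is a minimal sketch using `unittest.mock`. It is not the test added in the PR: `run_job` is a hypothetical stand-in for the operator's `execute`, the state values are assumptions, and the mocked client only imitates the `jobRunId` and `['jobRun', 'state']` response shapes visible in the diff.

```python
from unittest import mock

# Illustrative state names; treat the exact values as assumptions.
SUCCESS_STATES = {"SUCCESS"}
FAILURE_STATES = {"FAILED", "CANCELLED"}


def run_job(client, application_id: str, max_polls: int = 5) -> str:
    """Hypothetical stand-in for the operator: start a job, wait for success."""
    # A real start call takes more parameters; the mock does not need them.
    job_run_id = client.start_job_run(applicationId=application_id)["jobRunId"]
    for _ in range(max_polls):
        response = client.get_job_run(
            applicationId=application_id, jobRunId=job_run_id
        )
        state = response["jobRun"]["state"]
        if state in SUCCESS_STATES:
            return job_run_id
        if state in FAILURE_STATES:
            raise RuntimeError(f"Job {job_run_id} reached failure state {state}")
    raise TimeoutError(f"Job {job_run_id} did not finish in time")


def make_client(states):
    """Build a mocked client that reports the given states in order."""
    client = mock.MagicMock()
    client.start_job_run.return_value = {"jobRunId": "job-123"}
    client.get_job_run.side_effect = [{"jobRun": {"state": s}} for s in states]
    return client


def test_failed_job_raises():
    client = make_client(["RUNNING", "FAILED"])
    try:
        run_job(client, "app-123")
    except RuntimeError as err:
        assert "FAILED" in str(err)
    else:
        raise AssertionError("a failed job should raise")
    assert client.get_job_run.call_count == 2


def test_successful_job_returns_id():
    client = make_client(["RUNNING", "SUCCESS"])
    assert run_job(client, "app-123") == "job-123"
```

Both functions follow the pytest naming convention, so a test runner would collect them; they can also be called directly.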
@syedahsn syedahsn force-pushed the syedahsn/fix-emr-serverless-startjob-failure-state branch from 58bd33b to 8921c49 Compare September 19, 2022 16:54
@eladkal eladkal left a comment

LGTM

@eladkal eladkal merged commit 8f1c78f into apache:main Sep 19, 2022
@vandonr-amz vandonr-amz deleted the syedahsn/fix-emr-serverless-startjob-failure-state branch May 24, 2023 20:00