Optimize deferrable execution mode for `DbtCloudRunJobOperator` #31188

phanikumv · 2023-05-10T18:16:14Z

In deferrable mode, DbtCloudRunJobOperator should first check in the execute method whether the job run has already reached a terminal state, and defer only if it has not. This avoids an unnecessary deferral cycle when the run has already finished.
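A minimal, Airflow-free sketch of the control flow described above. `get_status` is a hypothetical callable standing in for the `self.hook.get_job_run_status(...)` call in the diff, `JobRunStatus` is a local stand-in for `DbtCloudJobRunStatus` (numeric values assumed), and deferral is modeled as raising an exception because Airflow's `BaseOperator.defer()` signals deferral by raising `TaskDeferred`.

```python
from enum import Enum
from typing import Callable


class JobRunStatus(Enum):
    """Hypothetical stand-in for DbtCloudJobRunStatus; numeric values are assumed."""

    QUEUED = 1
    STARTING = 2
    RUNNING = 3
    SUCCESS = 10
    ERROR = 20
    CANCELLED = 30


class JobRunFailed(Exception):
    """Stand-in for the exception raised when a run failed or was cancelled."""


class Deferred(Exception):
    """Stand-in for TaskDeferred, which BaseOperator.defer() raises."""


def execute_deferrable(run_id: int, get_status: Callable[[int], int]) -> int:
    """Check the run once before deferring; defer only if it is still in flight."""
    status = JobRunStatus(get_status(run_id))
    if status is JobRunStatus.SUCCESS:
        # Already finished: return immediately, no trigger round-trip needed.
        return run_id
    if status in (JobRunStatus.ERROR, JobRunStatus.CANCELLED):
        # Already in a failing terminal state: fail now instead of deferring.
        raise JobRunFailed(f"Job run {run_id} has failed or has been cancelled.")
    # QUEUED, STARTING or RUNNING: only now pay for a deferral cycle.
    raise Deferred(run_id)
```

With this shape, a run that has already finished when execute polls it costs one status request instead of a full defer/resume cycle.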


pankajkoti · 2023-05-15T12:40:04Z

airflow/providers/dbt/cloud/operators/dbt.py

+                    self.log.info("Job run %s has completed successfully.", str(self.run_id))
+                    return self.run_id
+                elif job_run_status in (
+                    DbtCloudJobRunStatus.CANCELLED.value,


How do we want to handle the cancelled state? If the user has manually cancelled the job and does not want further processing, we should not retry it. However, Airflow will retry such a task if it has retries configured. We could raise AirflowFailException in this case so that Airflow does not retry the task.


sunank200 · 2023-05-15T13:33:08Z

airflow/providers/dbt/cloud/operators/dbt.py

-                    method_name="execute_complete",
-                )
+                job_run_info = JobRunInfo(account_id=self.account_id, run_id=self.run_id)
+                job_run_status = self.hook.get_job_run_status(**job_run_info)


Should we log the state here?


josh-fell · 2023-05-15T18:11:39Z

airflow/providers/dbt/cloud/operators/dbt.py

-                    method_name="execute_complete",
-                )
+                job_run_info = JobRunInfo(account_id=self.account_id, run_id=self.run_id)
+                job_run_status = self.hook.get_job_run_status(**job_run_info)


This hook method already logs

self.log.info("Getting the status of job run %s.", str(run_id))

and

self.log.info("Current status of job run %s: %s", str(run_id), DbtCloudJobRunStatus(job_run_status).name)

which should cover the state logging.
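A minimal, Airflow-free sketch of why the operator needs no extra log line: a status getter that emits the two messages quoted above before returning. `fetch_status` is a hypothetical callable standing in for the dbt Cloud API request, and `JobRunStatus` is a local stand-in for `DbtCloudJobRunStatus` with assumed values.

```python
import logging
from enum import Enum
from typing import Callable

log = logging.getLogger("dbt_cloud_hook_sketch")


class JobRunStatus(Enum):
    """Hypothetical stand-in for DbtCloudJobRunStatus; numeric values are assumed."""

    QUEUED = 1
    STARTING = 2
    RUNNING = 3
    SUCCESS = 10
    ERROR = 20
    CANCELLED = 30


def get_job_run_status(run_id: int, fetch_status: Callable[[int], int]) -> int:
    """Fetch a run's status, logging before and after, as the hook method does."""
    log.info("Getting the status of job run %s.", str(run_id))
    status = fetch_status(run_id)
    # Log the human-readable enum name rather than the raw integer.
    log.info("Current status of job run %s: %s", str(run_id), JobRunStatus(status).name)
    return status
```

Any caller of this getter, including the operator's pre-deferral check, gets the state logged for free.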

josh-fell · 2023-05-15T18:13:19Z

airflow/providers/dbt/cloud/operators/dbt.py

+                    self.log.info("Job run %s has completed successfully.", str(self.run_id))
+                    return self.run_id
+                elif job_run_status in (
+                    DbtCloudJobRunStatus.CANCELLED.value,


Great question. I think this could be handled in a separate PR though.

Hard-failing the task on user cancellation could be either expected or unexpected, depending on the user. Perhaps a new parameter could control how cancelled runs are handled?
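A minimal sketch of the parameter idea floated above, with no Airflow dependency. `fail_fast_on_cancel` is a hypothetical parameter, not part of DbtCloudRunJobOperator, and the two exception classes stand in for `AirflowException` (retried when the task has retries configured) and `AirflowFailException` (fails the task without retrying).

```python
from enum import Enum


class JobRunStatus(Enum):
    """Hypothetical stand-in for DbtCloudJobRunStatus; numeric values are assumed."""

    SUCCESS = 10
    ERROR = 20
    CANCELLED = 30


class RetryableTaskError(Exception):
    """Stand-in for AirflowException: retried if the task has retries configured."""


class NonRetryableTaskError(Exception):
    """Stand-in for AirflowFailException: the task fails without retrying."""


def handle_terminal_status(
    run_id: int, status: JobRunStatus, fail_fast_on_cancel: bool = False
) -> int:
    """Map a terminal job run status to a task outcome.

    fail_fast_on_cancel is hypothetical; it only illustrates the review idea.
    """
    if status is JobRunStatus.SUCCESS:
        return run_id
    if status is JobRunStatus.CANCELLED and fail_fast_on_cancel:
        # User cancelled on purpose: skip any configured retries.
        raise NonRetryableTaskError(f"Job run {run_id} was cancelled; not retrying.")
    # ERROR, or CANCELLED with the default behaviour: normal (retryable) failure.
    raise RetryableTaskError(f"Job run {run_id} has failed or has been cancelled.")
```

Defaulting the flag to False keeps today's retry behaviour, so only users who opt in get the hard failure on cancellation.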

phanikumv requested a review from josh-fell as a code owner May 10, 2023 18:16

boring-cyborg bot added the area:providers label May 10, 2023

phanikumv force-pushed the optimize_dbt branch from e9ee2c7 to 0d7db20 May 11, 2023 02:33

pankajastro approved these changes May 11, 2023


phanikumv force-pushed the optimize_dbt branch from 0d7db20 to ab6e5ab May 14, 2023 12:26

Commit 112649f: Optimize deferred execution mode

phanikumv force-pushed the optimize_dbt branch from ab6e5ab to 112649f May 15, 2023 10:21

pankajkoti reviewed May 15, 2023


sunank200 approved these changes May 15, 2023


josh-fell approved these changes May 15, 2023


josh-fell merged commit bb4a0b3 into apache:main May 15, 2023

eladkal mentioned this pull request May 19, 2023 in "Status of testing Providers that were prepared on May 19, 2023" #31322 (closed, 80 tasks)
