Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataflowStartFlexTemplateOperator. Check for Dataflow job type each check cycle. #40584

Merged
merged 3 commits into from
Jul 4, 2024

Conversation

theopus
Copy link
Contributor

@theopus theopus commented Jul 3, 2024

With expected status is not set, _DataflowJobsController during first check sets terminal state for job like if it was BATCH.

current_state = job["currentState"]
is_streaming = job.get("type") == DataflowJobType.JOB_TYPE_STREAMING
if self._expected_terminal_state is None:
if is_streaming:
self._expected_terminal_state = DataflowJobStatus.JOB_STATE_RUNNING
else:
self._expected_terminal_state = DataflowJobStatus.JOB_STATE_DONE

self._expected_terminal_state = JOB_STATE_DONE
But it is common practice to use FlexTemplate for Datatflow job deployment, in that case job type could be set to STREAMING during job startup.

After streaming job is successfully started its state changed to JOB_STATE_RUNNING and flag is_streaming equals to True:
following branch is executed:

elif is_streaming and self._expected_terminal_state == DataflowJobStatus.JOB_STATE_DONE:
raise AirflowException(
"Google Cloud Dataflow job's expected terminal state cannot be "
"JOB_STATE_DONE while it is a streaming job"
)

which terminates operator with error because streaming is expected to have expected_status RUNNING and not DONE.

After this change expected_state inferred each cycle using up-to-date job type.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Jul 3, 2024
Copy link

boring-cyborg bot commented Jul 3, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@theopus theopus force-pushed the dataflow-hook-handle-job-state-change branch from 3b4f714 to ca4f8f7 Compare July 3, 2024 12:05
@theopus
Copy link
Contributor Author

theopus commented Jul 3, 2024

Hello, @shahar1, @potiuk. Could you please take a look. Thank you

@potiuk
Copy link
Member

potiuk commented Jul 3, 2024

Hello, @shahar1, @potiuk. Could you please take a look. Thank you

Right after statuc checks are fixed (strongly recommend pre-commit installation)

@moiseenkov
Copy link
Contributor

Thank you for the contribution! Is this new behavior the same in deferrable mode?

@theopus
Copy link
Contributor Author

theopus commented Jul 3, 2024

Thank you for the contribution! Is this new behavior the same in deferrable mode?

@moiseenkov deferrable is a bit different, since only DONE/FAILED/STOPPED are terminal states and RUNNING isn't.

def _validate_deferrable_params(self):
if self.deferrable and self.wait_until_finished:
raise ValueError(
"Conflict between deferrable and wait_until_finished parameters "
"because it makes operator as blocking when it requires to be deferred. "
"It should be True as deferrable parameter or True as wait_until_finished."
)
if self.deferrable and self.wait_until_finished is None:
self.wait_until_finished = False

try:
while True:
status = await hook.get_job_status(
project_id=self.project_id,
job_id=self.job_id,
location=self.location,
)
if status == JobState.JOB_STATE_DONE:
yield TriggerEvent(
{
"job_id": self.job_id,
"status": "success",
"message": "Job completed",
}
)
return
elif status == JobState.JOB_STATE_FAILED:
yield TriggerEvent(
{
"status": "error",
"message": f"Dataflow job with id {self.job_id} has failed its execution",
}
)
return
elif status == JobState.JOB_STATE_STOPPED:
yield TriggerEvent(
{
"status": "stopped",
"message": f"Dataflow job with id {self.job_id} was stopped",
}
)
return
else:
self.log.info("Job is still running...")
self.log.info("Current job status is: %s", status.name)
self.log.info("Sleeping for %s seconds.", self.poll_sleep)
await asyncio.sleep(self.poll_sleep)
except Exception as e:
self.log.exception("Exception occurred while checking for job completion.")
yield TriggerEvent({"status": "error", "message": str(e)})

should change behavior & propagate wait_until_finished & expected_terminal_state to trigger?

@potiuk potiuk merged commit 93488d0 into apache:main Jul 4, 2024
51 checks passed
Copy link

boring-cyborg bot commented Jul 4, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
…heck cycle. (apache#40584)



---------

Co-authored-by: Oleksandr Tkachov <oleksandr.tkachov@medecision.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants