Skip to content

Conversation

@ramonvermeulen
Copy link
Contributor

@ramonvermeulen ramonvermeulen commented Feb 19, 2025

New implementation (with extra_links) example:
image


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added the provider:google Google (including GCP) related issues label Feb 19, 2025
@boring-cyborg
Copy link

boring-cyborg bot commented Feb 19, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@ramonvermeulen ramonvermeulen force-pushed the feature/ramon/36963-cloud-run-jobs-operator-log-url branch from b2d6790 to 1295e5b Compare February 19, 2025 20:22
@ramonvermeulen ramonvermeulen force-pushed the feature/ramon/36963-cloud-run-jobs-operator-log-url branch from 626d1dd to 9685f0b Compare February 24, 2025 07:46
@ramonvermeulen ramonvermeulen force-pushed the feature/ramon/36963-cloud-run-jobs-operator-log-url branch from 9685f0b to 5c07ae6 Compare February 24, 2025 07:46
@ramonvermeulen ramonvermeulen marked this pull request as ready for review February 24, 2025 07:51
@ramonvermeulen ramonvermeulen changed the title feat: draft implementation of exposing the GCP cloud logging url feat: implementation of exposing the GCP cloud logging url Feb 24, 2025
@ramonvermeulen
Copy link
Contributor Author

ramonvermeulen commented Feb 24, 2025

@potiuk @amoghrajesh @sunank200 @eladkal

Since I see that most of you are involved in decision making and recent contributions to the google-provider, and this is my first time contributing to airflow, I would love to hear your view/opinion on this feature.

… github.com:ramonvermeulen/airflow into feature/ramon/36963-cloud-run-jobs-operator-log-url
@ramonvermeulen ramonvermeulen force-pushed the feature/ramon/36963-cloud-run-jobs-operator-log-url branch from 5d17b3d to fffcc52 Compare February 25, 2025 07:04
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than just putting this URL in the log message, please look at the "OperatorExtraLinks" concept -- you can make this show up as a button in the Airflow UI.

(They way I would suggest wiring that up is to have the operator push the url to XCom, and the OperatorLink can then pull the URL out of xcom)

@ramonvermeulen
Copy link
Contributor Author

Rather than just putting this URL in the log message, please look at the "OperatorExtraLinks" concept -- you can make this show up as a button in the Airflow UI.

(They way I would suggest wiring that up is to have the operator push the url to XCom, and the OperatorLink can then pull the URL out of xcom)

Thanks, wasn't aware of that functionality. Will see if I can change the implementation after the weekend to make use of OperatorExtraLinks 👍

@ramonvermeulen
Copy link
Contributor Author

Weirdly tests fail and seem to hit these airflow 3 >= lines while being ran with 2.9.3 and 2.10.5?:

https://github.com/apache/airflow/pull/46911/files#diff-27b491eef8e08f51786d7adcf1c491dbf6d6229cb5cdf91a499c8c11f73cfff1R30

@potiuk
Copy link
Member

potiuk commented Apr 25, 2025

Weirdly tests fail and seem to hit these airflow 3 >= lines while being ran with 2.9.3 and 2.10.5?:

https://github.com/apache/airflow/pull/46911/files#diff-27b491eef8e08f51786d7adcf1c491dbf6d6229cb5cdf91a499c8c11f73cfff1R30

Not weirdly - airflow.sdk does not existi in Airflow 2 and the test import it - so it fail, there needs to be compatibility code in the test to run also in Airflow 2.

Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preemptive approval once you have fixed the failing tests

@ramonvermeulen
Copy link
Contributor Author

Weirdly tests fail and seem to hit these airflow 3 >= lines while being ran with 2.9.3 and 2.10.5?:
https://github.com/apache/airflow/pull/46911/files#diff-27b491eef8e08f51786d7adcf1c491dbf6d6229cb5cdf91a499c8c11f73cfff1R30

Not weirdly - airflow.sdk does not existi in Airflow 2 and the test import it - so it fail, there needs to be compatibility code in the test to run also in Airflow 2.

Whoops, was fixated on the other test that already had the compatibility code.

@ramonvermeulen ramonvermeulen requested a review from ashb April 30, 2025 12:19
@amoghrajesh
Copy link
Contributor

@ramonvermeulen i am going to merge this one once we get a green build.

@eladkal
Copy link
Contributor

eladkal commented May 6, 2025

Implementation to (partly) solve issue CloudRunExecuteJobOperator - logs from inside the container are not shown in Airflow #36963

Can you clarify what else would be needed to fully solve the issue after this PR is merged? Should we close the original issue and open a new one on the specific task?

@ramonvermeulen
Copy link
Contributor Author

ramonvermeulen commented May 6, 2025

Implementation to (partly) solve issue CloudRunExecuteJobOperator - logs from inside the container are not shown in Airflow #36963

Can you clarify what else would be needed to fully solve the issue after this PR is merged? Should we close the original issue and open a new one on the specific task?

What I mean with "partly" solves the issue is the following:

  • The issue is about a full log integration between the logs of the cloud run jobs and Airflow. So that you can see the full cloud run job logs directly in the Airflow UI. However, this is hard to implement and you will run into API limits on the GCP logging API quite soon (see discussion in the issue).

Therefore I created this PR, which partly solves the issue (at least gives a nice alternative). Instead of integrating the full cloud run job logs directly in Airflow, I added an extra link. With this extra link you can click on a button in the Airflow UI and get redirected to the logs in GCP of that exact cloud run job execution.

Back to your question: To fully solve the issue (after this PR) a solution needs to be implemented (if it is possible) to show the full cloud run job container logs directly in the Airflow operator.

@eladkal
Copy link
Contributor

eladkal commented May 6, 2025

Back to your question: To fully solve the issue (after this PR) a solution needs to be implemented (if it is possible) to show the full cloud run job container logs directly in the Airflow operator.

So to allow dump of the remote job logs into the Airflow task like we do with AWS Glue?

:param verbose: If True, Glue Job Run logs show in the Airflow Task Logs. (default: False)

@ramonvermeulen
Copy link
Contributor Author

ramonvermeulen commented May 6, 2025

Back to your question: To fully solve the issue (after this PR) a solution needs to be implemented (if it is possible) to show the full cloud run job container logs directly in the Airflow operator.

So to allow dump of the remote job logs into the Airflow task like we do with AWS Glue?

:param verbose: If True, Glue Job Run logs show in the Airflow Task Logs. (default: False)

Yes, something similar where the logs get fetched after completion via the logging API (or polled during job execution, but I think you will run into API limits even quicker). In the issue there is already an example provided, with a side note about the API limits.

@eladkal
Copy link
Contributor

eladkal commented May 8, 2025

Yes, something similar where the logs get fetched after completion via the logging API (or polled during job execution, but I think you will run into API limits even quicker). In the issue there is already an example provided, with a side note about the API limits.

in some of AWS operators that store logs in cloud watch there is the option to set fetch interval

:param awslogs_fetch_interval: the interval that the ECS task log fetcher should wait
in between each Cloudwatch logs fetches.

but lets not discuss this here. The reason I mentioned this is to ask if you can comment on the issue and explain what is left to implement after we merge this PR so it will be clear to the developer who takes the task

@eladkal
Copy link
Contributor

eladkal commented May 8, 2025

@ramonvermeulen can you rebase and resolve conflicts?

@ramonvermeulen
Copy link
Contributor Author

Yes, something similar where the logs get fetched after completion via the logging API (or polled during job execution, but I think you will run into API limits even quicker). In the issue there is already an example provided, with a side note about the API limits.

in some of AWS operators that store logs in cloud watch there is the option to set fetch interval

:param awslogs_fetch_interval: the interval that the ECS task log fetcher should wait
in between each Cloudwatch logs fetches.

but lets not discuss this here. The reason I mentioned this is to ask if you can comment on the issue and explain what is left to implement after we merge this PR so it will be clear to the developer who takes the task

Sure, that makes sense! I will give an update in the issue after the PR is merged 👍

@eladkal eladkal merged commit 85013eb into apache:main May 8, 2025
66 of 67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants