Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DatabricksWorkflowPlugin #40724

Merged
merged 4 commits into from
Jul 12, 2024
Merged

Conversation

pankajkoti
Copy link
Member

@pankajkoti pankajkoti commented Jul 11, 2024

The DatabricksWorkflowPlugin provides with links in the Airflow
UI for tasks that allow us to see the Databricks job run in the
Databricks workspace, additionally it also provides link to
repair task(s) in the workflow.

Databricks does not allow repairing jobs with single tasks launched
outside the workflow, hence we just provide the link for the job run.

Within the workflow, for each of the task, we provide links to the
job run and repair link for the single task

And at the workflow level, for the job launch task, we provide a
link to repair all failed tasks along with the link for job run in
the Databricks workspace that can be used to monitor the job
in the Databricks account.

This PR is the second attempt on adding the DatabricksWorkflowPlugin,
the previous attempt being #40153. However, there were some concerns
raised in #40708 and hence it was reverted in #40714. This newer PR
attempts to address those concerns.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@pankajkoti
Copy link
Member Author

@vincbeck would you have an idea on how I could resolve the CI test failures here.

They complain with

FAILED tests/providers/amazon/aws/auth_manager/test_aws_auth_manager.py::TestAwsAuthManager::test_aws_auth_manager_index - werkzeug.routing.exceptions.BuildError: Could not build url for endpoint 'RepairDatabricksTasks.repair'. Did you forget to specify values ['dag_id', 'run_id']?

@pankajkoti pankajkoti requested a review from vincbeck July 11, 2024 14:15
@vincbeck
Copy link
Contributor

From the logs, all I can see is when Airflow tries to load the index page (DAGs list), it is failing because of:

werkzeug.routing.exceptions.BuildError: Could not build url for endpoint 'RepairDatabricksTasks.repair'. Did you forget to specify values ['dag_id', 'run_id']?

From the stacktrace I can see that it seems your plugin add a new element in the nav bar pointing to the view RepairDatabricksTasks.repair. This view has some mandatory parameter (dag_id and run_id) and that's why it is failing. When trying to build the nav bar and add a link towards this view, Airflow cannot because it requires some parameters.

I am surprised it works when you run Airflow ... Need further investigation

Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reviewed the plugin, rest I asusm eis unchanged from previous merged PR.

Some questions on the plugin, I think a lot of validation coul dbe removed with a URL pattern extension.

Otherwise... now looking secure and menu entry is gone. Thanks.

@pankajkoti pankajkoti marked this pull request as ready for review July 11, 2024 20:51
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the efforts in making the plugin more secure.
The errors in build seem to be unrelated - but would be good to double check before merge.

@pankajkoti
Copy link
Member Author

pankajkoti commented Jul 12, 2024

Thanks for the efforts in making the plugin more secure. The errors in build seem to be unrelated - but would be good to double check before merge.

Thanks a lot to you for raising the concerns in the first place, we're now in much better shape :) Yes, the failures are unrelated, I will go ahead with the merge

Thanks to @vincbeck for checking the CI failures during the journey on the PR :)

@pankajkoti pankajkoti merged commit fded2d8 into apache:main Jul 12, 2024
102 of 108 checks passed
@pankajkoti pankajkoti deleted the add-databricks-plugin branch July 12, 2024 20:09
pankajkoti added a commit to astronomer/airflow that referenced this pull request Jul 18, 2024
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jul 23, 2024
@ephraimbuddy ephraimbuddy added this to the Airflow 2.10.0 milestone Jul 23, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
The DatabricksWorkflowPlugin provides with links in the Airflow
UI for tasks that allow us to see the Databricks job run in the
Databricks workspace, additionally it also provides link to 
repair task(s) in the workflow.

Databricks does not allow repairing jobs with single tasks launched
outside the workflow, hence we just provide the link for the job run.

Within the workflow, for each of the task, we provide links to the
job run and repair link for the single task

And at the workflow level, for the job launch task, we provide a
link to repair all failed tasks along with the link for job run in 
the Databricks workspace that can be used to monitor the job
in the Databricks account.


This PR is the second attempt on adding the DatabricksWorkflowPlugin,
the previous attempt being apache#40153. However, there were some concerns 
raised in apache#40708 and hence it was reverted in apache#40714. This newer PR 
attempts to address those concerns.
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
pankajkoti added a commit to astronomer/astro-provider-databricks that referenced this pull request Aug 8, 2024
As part of Astronomer's internal plans and decisions, we've decided to contribute the existing functionality provided by the operators and plugins in this repository to the official Apache Airflow Databricks provider. To achieve this, we submitted the following PRs to the Airflow provider:

1. apache/airflow#39178
2. apache/airflow#39771
3. apache/airflow#40013
4. apache/airflow#40724
5. apache/airflow#39295

All functionality has now been contributed to the Airflow Databricks provider, and ongoing support will be maintained there. As a result, we're deprecating the operators and plugins in this repository. Users are encouraged to transition to the official Apache Airflow Databricks provider as soon as possible. The migration process is straightforward—simply update the import path to point to the Airflow provider and ensure that you install `apache-airflow-providers-databricks>=6.8.0`, which includes all the contributions mentioned above.

closes: astronomer/issues-airflow#715
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 16, 2024
Fix unit tests:
  - test_does_not_double_import_entrypoint_provider_plugins - in apache-airflow-providers-databricks==6.8.0 was added DatabricksWorkflowPlugin (apache/airflow#40724)
  - test_dataset - in apache-airflow-providers-amazon==8.27.0 changed
    Dataset URI format validation (apache/airflow#40173)

Change-Id: Iae902e544aae2086ea4495b0850c19f813aa7069
GitOrigin-RevId: 7d5b7a9ead32610f7e3864230e55bb3a17bf6da5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:plugins area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation provider:databricks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants