Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix/repair-databricks-plugin #40708

Closed

Conversation

jscheffl
Copy link
Contributor

I found some serious problems with PR #40153 - actually I would rather recommend to revert this. It is really bad code added.

First I was wondering about the new menu item. Actually wanted to optimize this and make the new Databricks Plugin to the "Admin" menu as a new top level menu entry is not really the right place.

But after changing a single line I was horrified about the plugin code and quality. It is just broken by default. If you click on the menu, you get an error page. Because the URL expects parameters and fails w/o.
So either there should not be a menu entry be added and plus parameters need to be checked.

This PR fixes the most critical problems:

  • Moving the plugin into "Admin" menu to be less prominent - whereas questioning a menu entry in general. It is use-less
  • Fixed handling if required parameters are missing
  • Fixing failure if run_id is not passed, replace will fail
  • Fixing redirect if no dag_id in parameters

But more and really serious is that the new plugin menu adds a Flask endpoint w/o any authentication! You can call the new URL anonymously(!) and pass and parameters that is affectting tasks, clears backends. Security nightmare,
I am not sure what the right access level must be (and asusme there was no concept about this... who is authorized to clear databricks tasks? new role needed? admin? owner of dags? who)?
Added "some" authentication to have a basic access check but I assume this needs re-thinking

@jscheffl
Copy link
Contributor Author

I checked my PR, there are 3 problems:

  • Pytests need to be DB aware because of auth checks in the fix
  • Auth must be made backwards compatible as FAB provider was different in older versions
  • Pytest checking repair must be able to correctly get parameters.

In light of efiirts I would rather recommend to revers #40153 and apply a proper fix before putting on main again

@pankajkoti
Copy link
Member

pankajkoti commented Jul 11, 2024

Hi @jscheffl, thanks for highlighting the concerns.

Thanks also for trying to fix the flaws. I believe we won't need the menu item as it is useless like you said. I will work on addressing the concerns and create a new PR. Perhaps we could close this PR now as the revert PR is merged? I will like to reach out to you @jscheffl if I need some help/advice and would be nice if you'd have some time and would be really nice if we could join hands in fixing this and having a proper implementation in place.

@pankajkoti
Copy link
Member

pankajkoti commented Jul 11, 2024

And for the auth access, I guess someone who can trigger a DAG run would/(should) be able to repair tasks. So I guess having the same access check would be okay, no?

@auth.has_access_dag("POST", DagAccessEntity.RUN)

@auth.has_access_dag("POST", DagAccessEntity.RUN)

@potiuk
Copy link
Member

potiuk commented Jul 11, 2024

And for the auth access, I guess someone who can trigger a DAG run would/(should) be able to repair tasks. So I guess having the same access check would be okay, no?

Looks good.

BTW. A little tangential and likely separate PR, but I had a thought @jscheffl @pankajkoti - after that one, that we should learn from it :)

Maybe we could figure out a test or modify a framework of ours that will detect such cases automatically. IMHO the right approach for endpoints is that they should all be protected with the very little number of exceptions when they are not (and they should be very explicitly listed as exceptions) - so maybe we could figure out either a test or some mechanism in handling the endpoints so that if someone adds a new endpoint without authentication - it will not work by default (unless it is added to a list of exceptions). WDYT?

@potiuk
Copy link
Member

potiuk commented Jul 11, 2024

(This is called "secure by design" approach BTW).

@pankajkoti
Copy link
Member

I have created a new PR #40724 to address the implementation concerns. Would be nice if could seek early reviews and can address further concerns if any.

@jscheffl
Copy link
Contributor Author

And for the auth access, I guess someone who can trigger a DAG run would/(should) be able to repair tasks. So I guess having the same access check would be okay, no?

Looks good.

BTW. A little tangential and likely separate PR, but I had a thought @jscheffl @pankajkoti - after that one, that we should learn from it :)

Maybe we could figure out a test or modify a framework of ours that will detect such cases automatically. IMHO the right approach for endpoints is that they should all be protected with the very little number of exceptions when they are not (and they should be very explicitly listed as exceptions) - so maybe we could figure out either a test or some mechanism in handling the endpoints so that if someone adds a new endpoint without authentication - it will not work by default (unless it is added to a list of exceptions). WDYT?

Yeah, also thought of something like: PluginManager checks if all @exposeed endpoints carry an auth decorator or reject loading. And adding one decorator to explicit allow anonymous access for the exceptions only. But need to find tme to make a PR across all other duties :-D

@jscheffl jscheffl closed this Jul 11, 2024
@jscheffl
Copy link
Contributor Author

And for the auth access, I guess someone who can trigger a DAG run would/(should) be able to repair tasks. So I guess having the same access check would be okay, no?

Looks good.
BTW. A little tangential and likely separate PR, but I had a thought @jscheffl @pankajkoti - after that one, that we should learn from it :)
Maybe we could figure out a test or modify a framework of ours that will detect such cases automatically. IMHO the right approach for endpoints is that they should all be protected with the very little number of exceptions when they are not (and they should be very explicitly listed as exceptions) - so maybe we could figure out either a test or some mechanism in handling the endpoints so that if someone adds a new endpoint without authentication - it will not work by default (unless it is added to a list of exceptions). WDYT?

Yeah, also thought of something like: PluginManager checks if all @exposeed endpoints carry an auth decorator or reject loading. And adding one decorator to explicit allow anonymous access for the exceptions only. But need to find tme to make a PR across all other duties :-D

@potiuk I checked the option, programmatically I found no way in Python to inspect for decorators on python methods - to use an approach with reflection to check for all @expose declarations if an @auth...* decorator is added as well.

The only viable approach I could find/see is to use inspect.getsourcelines and parsing the code manually... would you think this is a viable method?

@potiuk
Copy link
Member

potiuk commented Jul 12, 2024

The only viable approach I could find/see is to use inspect.getsourcelines and parsing the code manually... would you think this is a viable method?

How about runtime check ? Smth like (of course would need more caching and thought):

allowed_enpoints=[
   "login",
  "....",
]

@app.before_request
def before_request():
    if not request.authorized and request.base_url not in [url_for(endpoint) for endpoint in allowed_endpoints]:
        return redirect(url_for("login"))

pankajkoti added a commit that referenced this pull request Jul 12, 2024
The DatabricksWorkflowPlugin provides with links in the Airflow
UI for tasks that allow us to see the Databricks job run in the
Databricks workspace, additionally it also provides link to 
repair task(s) in the workflow.

Databricks does not allow repairing jobs with single tasks launched
outside the workflow, hence we just provide the link for the job run.

Within the workflow, for each of the task, we provide links to the
job run and repair link for the single task

And at the workflow level, for the job launch task, we provide a
link to repair all failed tasks along with the link for job run in 
the Databricks workspace that can be used to monitor the job
in the Databricks account.


This PR is the second attempt on adding the DatabricksWorkflowPlugin,
the previous attempt being #40153. However, there were some concerns 
raised in #40708 and hence it was reverted in #40714. This newer PR 
attempts to address those concerns.
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
The DatabricksWorkflowPlugin provides with links in the Airflow
UI for tasks that allow us to see the Databricks job run in the
Databricks workspace, additionally it also provides link to 
repair task(s) in the workflow.

Databricks does not allow repairing jobs with single tasks launched
outside the workflow, hence we just provide the link for the job run.

Within the workflow, for each of the task, we provide links to the
job run and repair link for the single task

And at the workflow level, for the job launch task, we provide a
link to repair all failed tasks along with the link for job run in 
the Databricks workspace that can be used to monitor the job
in the Databricks account.


This PR is the second attempt on adding the DatabricksWorkflowPlugin,
the previous attempt being apache#40153. However, there were some concerns 
raised in apache#40708 and hence it was reverted in apache#40714. This newer PR 
attempts to address those concerns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants