Azure OAuth CSRF State Not Equal Error #518

Closed
ahipp13 opened this issue Dec 19, 2022 · 16 comments


ahipp13 commented Dec 19, 2022

Describe the bug

When I try to use Azure OAuth to log into my Airflow application, instead of logging me in I am sent back to the login page with a CSRF state mismatch error.

Error Stacks

[2022-11-28 22:04:58,744] {views.py:659} ERROR - Error authorizing OAuth access token: mismatching_state: CSRF Warning! State not equal in request and response.
airflow-web [2022-11-28 22:04:58,744] {views.py:659} ERROR - Error authorizing OAuth access token: mismatching_state: CSRF Warning! State not equal in request and response.

To Reproduce

I am running Airflow 2.4.3 on Kubernetes pods using the Airflow community Helm chart 8.6.1 (https://github.com/airflow-helm/charts) and Python 3.8.15. To use Azure OAuth with Airflow, I created the following webserver_config file:

from flask_appbuilder.security.manager import AUTH_OAUTH
from airflow.www.security import AirflowSecurityManager
import logging
from typing import Dict, Any, List, Union
import os
import sys

#Add this as a module to pythons path
sys.path.append('/opt/airflow')

log = logging.getLogger(__name__)
log.setLevel(os.getenv("AIRFLOW__LOGGING__FAB_LOGGING_LEVEL", "DEBUG"))

class AzureCustomSecurity(AirflowSecurityManager):
    # In this example, the oauth provider == 'azure'.
    # If you ever want to support other providers, see how it is done here:
    # https://github.com/dpgaspar/Flask-AppBuilder/blob/master/flask_appbuilder/security/manager.py#L550
    def get_oauth_user_info(self, provider, resp):
        # Creates the user info payload from Azure.
        # The user previously allowed your app to act on their behalf,
        #   so now we can query the user and teams endpoints for their data.
        # Username and team membership are added to the payload and returned to FAB.
        if provider == "azure":
            log.debug("Azure response received : {0}".format(resp))
            id_token = resp["id_token"]
            log.debug(str(id_token))
            me = self._azure_jwt_token_parse(id_token)
            log.debug("Parse JWT token : {0}".format(me))
            return {
                "name": me.get("name", ""),
                "email": me["upn"],
                "first_name": me.get("given_name", ""),
                "last_name": me.get("family_name", ""),
                "id": me["oid"],
                "username": me["oid"],
                "role_keys": me.get("roles", []),
            }

# Adding this in because if not the redirect url will start with http and we want https
os.environ["AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX"] = "True"
WTF_CSRF_ENABLED = False
CSRF_ENABLED = False
AUTH_TYPE = AUTH_OAUTH
AUTH_ROLES_SYNC_AT_LOGIN = True  # Checks roles on every login
# Make sure to replace this with the path to your security manager class
FAB_SECURITY_MANAGER_CLASS = "webserver_config.AzureCustomSecurity"
# a mapping from the values of `userinfo["role_keys"]` to a list of FAB roles
AUTH_ROLES_MAPPING = {
    "airflow_dev_admin": ["Admin"],
    "airflow_dev_op": ["Op"],
    "airflow_dev_user": ["User"],
    "airflow_dev_viewer": ["Viewer"]
    }
# force users to re-auth after 30min of inactivity (to keep roles in sync)
PERMANENT_SESSION_LIFETIME = 1800
# If you wish, you can add multiple OAuth providers.
OAUTH_PROVIDERS = [
    {
        "name": "azure",
        "icon": "fa-windows",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "CLIENT_ID",
            "client_secret": 'AZURE_DEV_CLIENT_SECRET',
            "api_base_url": "https://login.microsoftonline.com/TENANT_ID",
            "request_token_url": None,
            'request_token_params': {
                'scope': 'openid email profile'
            },
            "access_token_url": "https://login.microsoftonline.com/TENANT_ID/oauth2/v2.0/token",
            "access_token_params": {
                'scope': 'openid email profile'
            },
            "authorize_url": "https://login.microsoftonline.com/TENANT_ID/oauth2/v2.0/authorize",
            "authorize_params": {
                'scope': 'openid email profile',
            },
            'jwks_uri':'https://login.microsoftonline.com/common/discovery/v2.0/keys',
        },
    },
]

Expected behavior

When I click the login button, it should log me in to the Airflow instance.

Environment:

  • OS: Debian GNU/Linux 11 (bullseye)
  • Python Version: 3.8.15
  • Authlib Version: 1.1.0

Additional context

This was working just fine, but when we upgraded from airflow 2.2.5 to 2.4.3 this issue arose. Here are some current versions of libraries I have installed:

Airflow==2.4.3
Authlib==1.1.0
Flask-AppBuilder==4.1.4
Flask-Babel==2.0.0
Flask-Caching==2.0.1
Flask-JWT-Extended==4.4.4
Flask-Login==0.6.2
Flask-SQLAlchemy==2.5.1
Flask-Session==0.4.0
Flask-WTF==1.0.1
Flask==2.2.2

I have posted this issue to Airflow as well as FAB and still have not been able to find a solution.

Author

ahipp13 commented Dec 20, 2022

I want to add onto this issue with additional debugging I have done. It looks like when I try to log in, it is failing in the file authlib/integrations/flask_client/apps.py. It is failing at line 103 which is:

params = self._format_state_params(state_data, params)

Looking into this further, this line calls the _format_state_params method in authlib/integrations/base_client/sync_app.py, and inside that method execution enters the branch guarded by: if state_data is None

So when apps.py runs the line state_data = self.framework.get_state_data(session, params.get('state')), my state data comes back as None. But when I look at the network tab in Chrome's developer tools, the state parameter is present in the URLs being sent:
image

So my question is: why is authlib not finding the state when the URLs contain it?
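The behavior described above can be illustrated with a simplified model of the state lookup. This is only a sketch, not Authlib's actual code: get_state_data is the name quoted above, while set_state_data, the session key format, the provider name, the state value, and the redirect URL are assumptions made for illustration. The point it shows is that the state in the URL is only a lookup key. The data it refers to has to be found in the session, so an empty session on the callback yields None even when the URL carries the right state.

```python
import time


def state_key(name, state):
    # Assumed key format; the real library may differ.
    return f"_state_{name}_{state}"


def set_state_data(session, name, state, data, expires_in=3600):
    """Store the data for this login attempt in the session."""
    session[state_key(name, state)] = {"data": data, "exp": time.time() + expires_in}


def get_state_data(session, name, state):
    """Look up the data for the state value that came back in the URL."""
    value = session.get(state_key(name, state))
    return value.get("data") if value else None


# Step 1: the login request stores the state in the session.
login_session = {}
payload = {"redirect_uri": "https://airflow.example.com/oauth-authorized/azure"}
set_state_data(login_session, "azure", "abc123", payload)

# Step 2a: the callback arrives with the SAME session, so the lookup succeeds.
found = get_state_data(login_session, "azure", "abc123")

# Step 2b: the callback arrives with a session that lost its data.
# The state in the URL is unchanged, but there is nothing to match it against.
lost = get_state_data({"_permanent": True, "_fresh": False}, "azure", "abc123")

print(found)
print(lost)
```

In this model, case 2b corresponds to the "State not equal in request and response" error: the state itself is fine, but the session that should hold its counterpart is empty.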

Owner

lepture commented Dec 21, 2022

According to your description, it seems your app's session is not working properly.

  1. Make sure your Flask app has set secure_key
  2. Check if the session works

Author

ahipp13 commented Dec 21, 2022

Hi @lepture and thank you for your response.

I hate to respond back like this, but how exactly would I go about doing these 2 things?

I am not very experienced with authlib or FAB. I am just trying to run Airflow, and part of Airflow is the web UI, which is built on FAB and authlib. I have never used authlib outside of this.

I will try to find out where airflow sets up the flask app tomorrow, but any help or guidance about how I would do this, as well as make sure the session is working would be greatly appreciated. Thank you!

Author

ahipp13 commented Dec 21, 2022

@lepture I have found where the app gets created in airflow. The code is down below.

I do not see a "secure_key" getting set, but I do see a "secret_key" being set ( flask_app.secret_key = conf.get('webserver', 'SECRET_KEY') )

So my question is how do I test to see if the session is working??

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
from __future__ import annotations

import warnings
from datetime import timedelta
from tempfile import gettempdir

from flask import Flask
from flask_appbuilder import SQLA
from flask_caching import Cache
from flask_wtf.csrf import CSRFProtect
from sqlalchemy.engine.url import make_url

from airflow import settings
from airflow.configuration import conf
from airflow.exceptions import AirflowConfigException, RemovedInAirflow3Warning
from airflow.logging_config import configure_logging
from airflow.models import import_all_models
from airflow.utils.json import AirflowJsonProvider
from airflow.www.extensions.init_appbuilder import init_appbuilder
from airflow.www.extensions.init_appbuilder_links import init_appbuilder_links
from airflow.www.extensions.init_dagbag import init_dagbag
from airflow.www.extensions.init_jinja_globals import init_jinja_globals
from airflow.www.extensions.init_manifest_files import configure_manifest_files
from airflow.www.extensions.init_robots import init_robots
from airflow.www.extensions.init_security import (
    init_api_experimental_auth,
    init_check_user_active,
    init_xframe_protection,
)
from airflow.www.extensions.init_session import init_airflow_session_interface
from airflow.www.extensions.init_views import (
    init_api_connexion,
    init_api_experimental,
    init_appbuilder_views,
    init_connection_form,
    init_error_handlers,
    init_flash_views,
    init_plugins,
)
from airflow.www.extensions.init_wsgi_middlewares import init_wsgi_middleware

app: Flask | None = None

# Initializes at the module level, so plugins can access it.
# See: /docs/plugins.rst
csrf = CSRFProtect()


def sync_appbuilder_roles(flask_app):
    """Sync appbuilder roles to DB"""
    # Garbage collect old permissions/views after they have been modified.
    # Otherwise, when the name of a view or menu is changed, the framework
    # will add the new Views and Menus names to the backend, but will not
    # delete the old ones.
    if conf.getboolean('webserver', 'UPDATE_FAB_PERMS'):
        flask_app.appbuilder.sm.sync_roles()


def create_app(config=None, testing=False):
    """Create a new instance of Airflow WWW app"""
    flask_app = Flask(__name__)
    flask_app.secret_key = conf.get('webserver', 'SECRET_KEY')

    flask_app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(minutes=settings.get_session_lifetime_config())
    flask_app.config.from_pyfile(settings.WEBSERVER_CONFIG, silent=True)
    flask_app.config['APP_NAME'] = conf.get(section="webserver", key="instance_name", fallback="Airflow")
    flask_app.config['TESTING'] = testing
    flask_app.config['SQLALCHEMY_DATABASE_URI'] = conf.get('database', 'SQL_ALCHEMY_CONN')

    url = make_url(flask_app.config['SQLALCHEMY_DATABASE_URI'])
    if url.drivername == 'sqlite' and url.database and not url.database.startswith('/'):
        raise AirflowConfigException(
            f'Cannot use relative path: `{conf.get("database", "SQL_ALCHEMY_CONN")}` to connect to sqlite. '
            'Please use absolute path such as `sqlite:////tmp/airflow.db`.'
        )

    flask_app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

    flask_app.config['SESSION_COOKIE_HTTPONLY'] = True
    flask_app.config['SESSION_COOKIE_SECURE'] = conf.getboolean('webserver', 'COOKIE_SECURE')

    cookie_samesite_config = conf.get('webserver', 'COOKIE_SAMESITE')
    if cookie_samesite_config == "":
        warnings.warn(
            "Old deprecated value found for `cookie_samesite` option in `[webserver]` section. "
            "Using `Lax` instead. Change the value to `Lax` in airflow.cfg to remove this warning.",
            RemovedInAirflow3Warning,
        )
        cookie_samesite_config = "Lax"
    flask_app.config['SESSION_COOKIE_SAMESITE'] = cookie_samesite_config

    if config:
        flask_app.config.from_mapping(config)

    if 'SQLALCHEMY_ENGINE_OPTIONS' not in flask_app.config:
        flask_app.config['SQLALCHEMY_ENGINE_OPTIONS'] = settings.prepare_engine_args()

    # Configure the JSON encoder used by `|tojson` filter from Flask
    flask_app.json_provider_class = AirflowJsonProvider
    flask_app.json = AirflowJsonProvider(flask_app)

    csrf.init_app(flask_app)

    init_wsgi_middleware(flask_app)

    db = SQLA()
    db.session = settings.Session
    db.init_app(flask_app)

    init_dagbag(flask_app)

    init_api_experimental_auth(flask_app)

    init_robots(flask_app)

    cache_config = {'CACHE_TYPE': 'flask_caching.backends.filesystem', 'CACHE_DIR': gettempdir()}
    Cache(app=flask_app, config=cache_config)

    init_flash_views(flask_app)

    configure_logging()
    configure_manifest_files(flask_app)

    import_all_models()

    with flask_app.app_context():
        init_appbuilder(flask_app)

        init_appbuilder_views(flask_app)
        init_appbuilder_links(flask_app)
        init_plugins(flask_app)
        init_connection_form()
        init_error_handlers(flask_app)
        init_api_connexion(flask_app)
        init_api_experimental(flask_app)

        sync_appbuilder_roles(flask_app)

        init_jinja_globals(flask_app)
        init_xframe_protection(flask_app)
        init_airflow_session_interface(flask_app)
        init_check_user_active(flask_app)
    return flask_app


def cached_app(config=None, testing=False):
    """Return cached instance of Airflow WWW app"""
    global app
    if not app:
        app = create_app(config=config, testing=testing)
    return app


def purge_cached_app():
    """Removes the cached version of the app in global state."""
    global app
    app = None

Author

ahipp13 commented Dec 21, 2022

@lepture I did some more debugging and wanted to add the information I gathered from it.

It looks like this is a problem with the session. I added print statements to print the session in framework_integration.py, and this is what it looks like:
<SqlAlchemySession {'_permanent': True, '_fresh': False}>

Yet when I try to print the name of the session with print(session.get("name")), I get None as the value.
When it runs value = session.get(key), this value is also None.

So, the session is being created, but it is somehow empty.

So the question is, why would the session be empty, and how do I go about fixing this?

Here is a screenshot of the debugging messages I created within each of these files so you can see the flow it takes.

image

Author

ahipp13 commented Dec 23, 2022

@lepture I have done more debugging, and am back with another update to provide more information on this issue.

It looks like I figured out what is happening, but I am unsure how to fix it. I put in a lot of debugging statements to follow the flow of the code and see what was happening in the background. In the interest of keeping this update short, I will just summarize what I found and what I think is happening.

We start in flask_appbuilder/security/views.py, in the "def login(self, provider: Optional[str] = None) -> WerkzeugResponse:" function of the "AuthOAuthView(AuthView)" class. This function returns a call to another function, "authorize_redirect".

The function "authorize_redirect" is found in authlib/integrations/flask_client/apps.py at line 39. It does a lot of things, but the most important is at the end, where it calls "save_authorize_data". "save_authorize_data" is in the same file, at line 32, and it calls a function called "set_state_data". This leads us to authlib/integrations/base_client/framework_integration.py, where the session data is set before returning.

Before "authorize_redirect" returns to "views.py", I printed out the session to see what it looked like. This is what is printed:
image

Clearly, the session has the correct state here. So it returns the redirect URL, and according to this argument in the return statement of the "login" function: "redirect_uri=url_for(".oauth_authorized", provider=provider, _external=True)", my guess is that the next place it goes is the "oauth_authorized" function within the "views.py" file. So I put print statements at the top of that function to print out the session, and this is what I get:
image

So, somehow the redirect is causing the session to lose all its data…

I will be investigating this more on Tuesday, but I hope you can provide some insight on how to fix it.

Author

ahipp13 commented Jan 5, 2023

@lepture I have been stuck on finding out why the redirect is causing the session to lose all of its data. Any help on how to debug and fix this?

Owner

lepture commented Jan 8, 2023

@ahipp13 You can debug your session issue with:

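from flask import session

# `app` is your existing Flask application object.

@app.route('/set-session')
def set_session():
    # Store a marker value in the session.
    session['test'] = 'foo'
    return 'set session'

@app.route('/read-session')
def read_session():
    # Return the marker if the session survived between requests.
    value = session.get('test', '')
    return value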

  1. Visit /set-session to set the session.
  2. Open another browser tab and visit /read-session to see if you can get the value.

If you cannot get the value, your application's session is not configured correctly.
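The two-step check above can also be scripted instead of done by hand in the browser. The following is a minimal sketch using only the Python standard library. It assumes the /set-session and /read-session routes are deployed and reachable without logging in; the function name check_session and the base URL used below are hypothetical.

```python
import http.cookiejar
import urllib.request


def check_session(base_url: str) -> str:
    """Visit /set-session, then /read-session, sharing one cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # First request: the server is expected to answer with a session cookie.
    with opener.open(f"{base_url}/set-session") as resp:
        resp.read()

    # Second request: the jar sends the session cookie back to the server.
    with opener.open(f"{base_url}/read-session") as resp:
        return resp.read().decode("utf-8")
```

Calling check_session("http://localhost:8080") (a hypothetical address) should return the string foo when the session persists between requests, and an empty string when it does not.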

Author

ahipp13 commented Jan 9, 2023

@lepture Where do I need to add this code?

@lepture lepture added question and removed bug labels Jan 9, 2023
Author

ahipp13 commented Jan 9, 2023

@lepture I have figured out where to add this code and have the results from testing. It took me awhile as I had to dig into Airflow's source code to see how they implemented the web app.

I will explain what I edited to get it to work and then my results. What I had to do was edit the views.py file in the airflow source code (airflow/www/views.py). This is what I put in (the class starts at line 561):

class Airflow(AirflowBaseView):
    """Main Airflow application."""
    
    @expose('/set-session')
    def set_session(self):
        flask_session['test'] = 'foo'
        return 'set session'

    @expose('/read-session')
    def read_session(self):
        value = flask_session.get('test','')
        return value   

I then redeployed my application. I went to the 2 urls like you said, and this is the output:

image

image

As you can see, in the other browser I was unable to get the value. So it seems the session is not configured correctly. So my next question is, where do I find out where the session is configured, and what is configured wrong?

Owner

lepture commented Jan 10, 2023

@ahipp13 You need to ask this question in the Airflow or Flask-AppBuilder projects. I'm not familiar with those things.

@lepture lepture closed this as completed Jan 10, 2023
Author

ahipp13 commented Jan 10, 2023

I have finally found what the source of this problem was, as well as the solution.

Doing more debugging, I found that the problem with the session was coming from the webserver_config.py file that I had created. So I started with as barebones an OAuth webserver_config.py file as I could, and kept adding lines until one of them broke the session. In doing this I found that it was this line:

PERMANENT_SESSION_LIFETIME = 1800

That was causing the session not to persist. I am not sure why. I had put this in my webserver_config.py file in the first place because it is in the FAB documentation (https://flask-appbuilder.readthedocs.io/en/latest/security.html#), but on further research I found that Airflow already sets it when it creates the Flask app, and exposes it as a configuration option: AIRFLOW__WEBSERVER__SESSION_LIFETIME_MINUTES

Why this worked in Airflow 2.2.5 and not in Airflow 2.4.3, I am not sure.
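For reference, the two settings express the same idea in different units. Flask documents that an integer PERMANENT_SESSION_LIFETIME is a number of seconds, while Airflow's session_lifetime_minutes is a number of minutes. The sketch below only illustrates that unit conversion. The helper names are hypothetical, and it does not explain why the integer value broke the session in this setup.

```python
from datetime import timedelta


def flask_lifetime(value):
    """Interpret a PERMANENT_SESSION_LIFETIME value: ints are seconds."""
    return value if isinstance(value, timedelta) else timedelta(seconds=value)


def airflow_lifetime(minutes):
    """Interpret an Airflow session_lifetime_minutes value."""
    return timedelta(minutes=minutes)


# The removed line (1800 seconds) corresponds to 30 minutes in Airflow's option.
same = flask_lifetime(1800) == airflow_lifetime(30)
print(same)
```

So the equivalent of the removed line would be setting AIRFLOW__WEBSERVER__SESSION_LIFETIME_MINUTES to 30 in the deployment's environment, rather than overriding the Flask setting in webserver_config.py.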

Also through my debugging I found out that FAB natively supports Azure now, so a custom security class and user info handler function is not needed.

So, to answer this question, the solution to my problem was to start using this webserver_config.py file down below.

from __future__ import annotations
import os
from airflow.www.fab_security.manager import AUTH_OAUTH
# Default Auth Type
# from airflow.www.fab_security.manager import AUTH_DB

basedir = os.path.abspath(os.path.dirname(__file__))

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True

# ----------------------------------------------------
# AUTHENTICATION CONFIG
# ----------------------------------------------------
# For details on how to set up each of the following authentication, see
# http://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-methods
# for details.

# The authentication type
AUTH_TYPE = AUTH_OAUTH

# registration configs
AUTH_USER_REGISTRATION = True  # allow users who are not already in the FAB DB
AUTH_USER_REGISTRATION_ROLE = "Public"  # this role will be given in addition to any AUTH_ROLES_MAPPING

# Specifying Oauth Providers
OAUTH_PROVIDERS = [
    {
        "name": "azure",
        "icon": "fa-windows",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "APPLICATION_CLIENT_ID",
            "client_secret": "AZURE_DEV_CLIENT_SECRET",
            "api_base_url": "https://login.microsoftonline.com/AZURE_TENANT_ID/oauth2",
            "client_kwargs": {
                "scope": "User.read name preferred_username email profile upn",
                "resource": "APPLICATION_CLIENT_ID",
            },
            "request_token_url": None,
            "access_token_url": "https://login.microsoftonline.com/AZURE_TENANT_ID/oauth2/token",
            "authorize_url": "https://login.microsoftonline.com/AZURE_TENANT_ID/oauth2/authorize",
            'jwks_uri':'https://login.microsoftonline.com/common/discovery/v2.0/keys',
        },
    },
]

# Adding this in because if not the redirect url will start with http and we want https
os.environ["AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX"] = "True"

#This maps our roles set in Azure to the roles in Airflow
AUTH_ROLES_MAPPING = {
    "airflow_dev_admin": ["Admin"],
    "airflow_dev_op": ["Op"],
    "airflow_dev_user": ["User"],
    "airflow_dev_viewer": ["Viewer"]
    }

# If we should replace ALL the user's roles on each login, or only on registration
AUTH_ROLES_SYNC_AT_LOGIN = True


potiuk commented Jan 10, 2023

Cool. Yeah, starting from "barebones" and adding customizations one at a time is a technique I often suggest to users when we are not able to guess the cause by reviewing. Thanks for doing it, and even more thanks for reporting back your investigation results. It might save countless hours for the many people who might find the solution to their problems by reading this issue.

This is one of the best contributions to the project you can make as a user 🎉

Author

ahipp13 commented Jan 10, 2023

@potiuk Thank you! Currently, session_lifetime_minutes is not working for me. Is this the only way in Airflow to make the UI time out after a certain amount of time? For the Airflow app I am working on, we want users to be logged out of the UI after 30 minutes, so I am wondering if there is a different way to do this or if we currently cannot do this.


potiuk commented Jan 11, 2023

I am quite confused. I understood from your comment that the changes you applied DID solve your problem.

So, to answer this question, the solution to my problem was to start using this webserver_config.py file down below.

But maybe you have a different issue now. Regardless, you can check how it behaves in main, now that the CSRF timeout is set to be the same as the session timeout, and see if that fixes the issue (this was pointed out to you in the other thread: apache/airflow#28730). It has nothing to do with authlib.

You can even manually apply the same change and see if it fixes the problem.

But if you still have the problem and this does not fix it, then by all means I encourage you to open a new issue, as this thread is now rather confusing as to whether things are fixed or not.

@Narender-007

When I use Azure authentication in Flask-AppBuilder, I get this error:

Error returning OAuth user info: %s 'upn'
2023-08-01 12:04:40,989:ERROR:flask_appbuilder.security.views:Error returning OAuth user info: 'upn'

if provider == "azure":
    log.debug("Azure response received : {0}".format(resp))
    id_token = resp["id_token"]
    log.debug(str(id_token))
    me = self._azure_jwt_token_parse(id_token)
    log.debug("Parse JWT token : {0}".format(me))
    return {
        "name": me.get("name", ""),
        "email": me["upn"],
        "first_name": me.get("given_name", ""),
        "last_name": me.get("family_name", ""),
        "id": me["oid"],
        "username": me["oid"],
        "role_keys": me.get("roles", []),
    }

I get the JWT token and the credentials are verified, but I get a KeyError on the upn key.

How can I resolve it?
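One possible way to avoid the KeyError is to stop indexing the claims with me["upn"] and fall back to other claims when upn is absent. The sketch below is an assumption-laden illustration, not FAB's implementation: the helper names pick_email and build_user_info are hypothetical, and the fallback claims email and preferred_username are assumed to be present in the token, which depends on how the Azure app registration is configured.

```python
def pick_email(claims):
    """Return the first non-empty identity claim, or "" if none is present."""
    # Assumed fallback order; adjust to the claims your tokens actually carry.
    for key in ("upn", "email", "preferred_username"):
        value = claims.get(key)
        if value:
            return value
    return ""


def build_user_info(me):
    """Build the user info payload without requiring the upn claim."""
    return {
        "name": me.get("name", ""),
        "email": pick_email(me),
        "first_name": me.get("given_name", ""),
        "last_name": me.get("family_name", ""),
        "id": me["oid"],
        "username": me["oid"],
        "role_keys": me.get("roles", []),
    }


# Example claims without upn (all values are made up for illustration).
example = build_user_info(
    {"oid": "0000-1111", "name": "Jane Doe", "preferred_username": "jane@example.com"}
)
print(example["email"])
```

Logging the parsed token once at debug level, as the original snippet already does, shows which of these claims the tenant actually sends.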
