chore(OAuth2): refactor for custom OAuth2 clients (#27880)
betodealmeida authored Apr 5, 2024
1 parent 62433c1 commit 9377227
Showing 22 changed files with 382 additions and 294 deletions.
2 changes: 1 addition & 1 deletion superset/common/query_object.py
@@ -108,7 +108,7 @@ class QueryObject: # pylint: disable=too-many-instance-attributes
time_range: str | None
to_dttm: datetime | None

def __init__( # pylint: disable=too-many-arguments,too-many-locals
def __init__( # pylint: disable=too-many-locals, too-many-arguments
self,
*,
annotation_layers: list[dict[str, Any]] | None = None,
20 changes: 15 additions & 5 deletions superset/config.py
@@ -1409,12 +1409,20 @@ def EMAIL_HEADER_MUTATOR( # pylint: disable=invalid-name,unused-argument

# Details needed for databases that allow users to authenticate using personal
# OAuth2 tokens. See https://github.com/apache/superset/issues/20300 for more
# information
DATABASE_OAUTH2_CREDENTIALS: dict[str, dict[str, Any]] = {
# information. The scope and URIs are optional.
DATABASE_OAUTH2_CLIENTS: dict[str, dict[str, Any]] = {
# "Google Sheets": {
# "CLIENT_ID": "XXX.apps.googleusercontent.com",
# "CLIENT_SECRET": "GOCSPX-YYY",
# "BASEURL": "https://accounts.google.com/o/oauth2/v2/auth",
# "id": "XXX.apps.googleusercontent.com",
# "secret": "GOCSPX-YYY",
# "scope": " ".join(
# [
# "https://www.googleapis.com/auth/drive.readonly",
# "https://www.googleapis.com/auth/spreadsheets",
# "https://spreadsheets.google.com/feeds",
# ]
# ),
# "authorization_request_uri": "https://accounts.google.com/o/oauth2/v2/auth",
# "token_request_uri": "https://oauth2.googleapis.com/token",
# },
}
# OAuth2 state is encoded in a JWT using the algorithm below.
@@ -1425,6 +1433,8 @@ def EMAIL_HEADER_MUTATOR( # pylint: disable=invalid-name,unused-argument
# applications. In that case, the proxy can forward the request to the correct instance
# by looking at the `default_redirect_uri` attribute in the OAuth2 state object.
# DATABASE_OAUTH2_REDIRECT_URI = "http://localhost:8088/api/v1/database/oauth2/"
# Timeout when fetching access and refresh tokens.
DATABASE_OAUTH2_TIMEOUT = timedelta(seconds=30)

# Enable/disable CSP warning
CONTENT_SECURITY_POLICY_WARNING = True
2 changes: 1 addition & 1 deletion superset/connectors/sqla/utils.py
@@ -145,7 +145,7 @@ def get_columns_description(
cursor = conn.cursor()
query = database.apply_limit_to_sql(query, limit=1)
cursor.execute(query)
db_engine_spec.execute(cursor, query, database.id)
db_engine_spec.execute(cursor, query, database)
result = db_engine_spec.fetch_data(cursor, limit=1)
result_set = SupersetResultSet(result, cursor.description, db_engine_spec)
return result_set.columns
6 changes: 5 additions & 1 deletion superset/databases/api.py
@@ -1115,9 +1115,13 @@ def oauth2(self) -> FlaskResponse:
if database is None:
return self.response_404()

oauth2_config = database.get_oauth2_config()
if oauth2_config is None:
raise OAuth2Error("No configuration found for OAuth2")

token_response = database.db_engine_spec.get_oauth2_token(
oauth2_config,
parameters["code"],
state,
)

# delete old tokens
62 changes: 25 additions & 37 deletions superset/db_engine_specs/README.md
@@ -547,65 +547,53 @@ Alternatively, it's also possible to impersonate users by implementing the `upda

Support for authenticating to a database using personal OAuth2 access tokens was introduced in [SIP-85](https://github.com/apache/superset/issues/20300). The Google Sheets DB engine spec is the reference implementation.

To add support for OAuth2 to a DB engine spec, the following attribute and methods are needed:
Note that this API is still experimental, evolving quickly, and subject to breaking changes. Currently, adding OAuth2 support to a DB engine spec requires the following attributes:

```python
class BaseEngineSpec:

supports_oauth2 = True
oauth2_exception = OAuth2RedirectError

@staticmethod
def is_oauth2_enabled() -> bool:
return False

@staticmethod
def get_oauth2_authorization_uri(state: OAuth2State) -> str:
raise NotImplementedError()

@staticmethod
def get_oauth2_token(code: str, state: OAuth2State) -> OAuth2TokenResponse:
raise NotImplementedError()

@staticmethod
def get_oauth2_fresh_token(refresh_token: str) -> OAuth2TokenResponse:
raise NotImplementedError()
oauth2_scope = " ".join([
"https://example.org/scope1",
"https://example.org/scope2",
])
oauth2_authorization_request_uri = "https://example.org/authorize"
oauth2_token_request_uri = "https://example.org/token"
```

The `oauth2_exception` attribute is the exception that `cursor.execute` raises when OAuth2 is needed. When `BaseEngineSpec.execute` is called and this exception is raised, the OAuth2 dance starts and the custom error `OAUTH2_REDIRECT` is returned to the frontend. If the database driver doesn't have a specific exception, it might be necessary to override the `execute` method in the DB engine spec, so that the `BaseEngineSpec.start_oauth2_dance` method gets called whenever OAuth2 is needed.
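
The fallback for a driver without a dedicated OAuth2 exception can be sketched roughly as follows. This is only an illustration under stated assumptions, not Superset's actual code: `SketchEngineSpec`, `DriverAuthError`, `needs_oauth2`, and the local `OAuth2RedirectError` stand-in are hypothetical names, and the real `start_oauth2_dance` may take different arguments.

```python
class OAuth2RedirectError(Exception):
    """Local stand-in for the error that tells the frontend to start OAuth2."""


class DriverAuthError(Exception):
    """Hypothetical generic exception raised by a database driver."""


class SketchEngineSpec:
    supports_oauth2 = True

    @classmethod
    def start_oauth2_dance(cls, database):
        # Assumption: the real method builds the state and raises a redirect error.
        raise OAuth2RedirectError(f"OAuth2 needed for database {database!r}")

    @classmethod
    def needs_oauth2(cls, ex: Exception) -> bool:
        # Hypothetical heuristic: inspect the driver's error message.
        return "unauthorized" in str(ex).lower()

    @classmethod
    def execute(cls, cursor, query, database):
        try:
            cursor.execute(query)
        except DriverAuthError as ex:
            if cls.supports_oauth2 and cls.needs_oauth2(ex):
                cls.start_oauth2_dance(database)
            # Any other driver error is re-raised unchanged.
            raise
```

The heuristic in `needs_oauth2` is deliberately naive; a real spec would have to match whatever its driver actually reports for missing or expired credentials.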

The first method, `is_oauth2_enabled`, is used to inform if the database supports OAuth2. This can be dynamic; for example, the Google Sheets DB engine spec checks if the Superset configuration has the necessary section:

```python
from flask import current_app

The DB engine should implement logic in either `get_url_for_impersonation` or `update_impersonation_config` to update the connection with the personal access token. See the Google Sheets DB engine spec for a reference implementation.

class GSheetsEngineSpec(ShillelaghEngineSpec):
@staticmethod
def is_oauth2_enabled() -> bool:
return "Google Sheets" in current_app.config["DATABASE_OAUTH2_CREDENTIALS"]
```

Where the configuration for OAuth2 would look like this:
Currently OAuth2 needs to be configured at the DB engine spec level, i.e., with one client for each DB engine spec. The configuration lives in `superset_config.py`:

```python
# superset_config.py
DATABASE_OAUTH2_CREDENTIALS = {
DATABASE_OAUTH2_CLIENTS = {
"Google Sheets": {
"CLIENT_ID": "XXX.apps.googleusercontent.com",
"CLIENT_SECRET": "GOCSPX-YYY",
"id": "XXX.apps.googleusercontent.com",
"secret": "GOCSPX-YYY",
"scope": " ".join(
[
"https://www.googleapis.com/auth/drive.readonly",
"https://www.googleapis.com/auth/spreadsheets",
"https://spreadsheets.google.com/feeds",
],
),
"authorization_request_uri": "https://accounts.google.com/o/oauth2/v2/auth",
"token_request_uri": "https://oauth2.googleapis.com/token",
},
}
DATABASE_OAUTH2_JWT_ALGORITHM = "HS256"
DATABASE_OAUTH2_REDIRECT_URI = "http://localhost:8088/api/v1/database/oauth2/"
DATABASE_OAUTH2_TIMEOUT = timedelta(seconds=30)
```

The second method, `get_oauth2_authorization_uri`, is responsible for building the URL where the user is sent to initiate OAuth2. This method receives a `state`. The state is an encoded JWT that is passed to the OAuth2 provider, and is received unmodified when the user is redirected back to Superset. The default state contains the user ID and the database ID, so that Superset can know where to store the received OAuth2 tokens.

Additionally, the state also contains a `tab_id`, which is a random UUID4 used as a shared secret for communication between browser tabs. When OAuth2 starts, Superset will open a new browser tab, where the user will grant permissions to Superset. When authentication is complete and successful this opened tab will send a message to the original tab, so that the original query can be re-run. The `tab_id` is sent by the opened tab and verified by the original tab to prevent malicious messages from other sites. As an additional security measure the origin of the message should match the OAuth2 redirect URL.

The state also contains a `default_redirect_uri`, which is the endpoint in Superset that receives the tokens from the OAuth2 provider (`/api/v1/database/oauth2/`). The redirect URL can be overridden in the config file via the `DATABASE_OAUTH2_REDIRECT_URI` parameter. This might be useful when you have multiple Superset instances. Since the OAuth2 provider requires the redirect URL to be registered a priori, it might be easier (or needed) to register a single URL for a proxy service; the proxy service can then inspect the JWT and redirect the request to `default_redirect_uri`.
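
To make the shape of the state concrete, here is a minimal standard-library sketch of signing and verifying such a JWT with HS256. It is an assumption-laden illustration, not Superset's implementation: the helper names `encode_state` and `decode_state`, the secret, and the exact key names `user_id` and `database_id` are hypothetical, and a real deployment would more likely use a JWT library and add claims such as an expiry.

```python
import base64
import hashlib
import hmac
import json
import uuid


def _b64url(data: bytes) -> bytes:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def _b64url_decode(data: bytes) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(data + b"=" * (-len(data) % 4))


def encode_state(state: dict, secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(state).encode())
    signing_input = header + b"." + payload
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return (signing_input + b"." + _b64url(signature)).decode()


def decode_state(token: str, secret: str) -> dict:
    header, payload, signature = token.encode().split(b".")
    expected = hmac.new(
        secret.encode(), header + b"." + payload, hashlib.sha256
    ).digest()
    # Constant-time comparison to reject tampered or foreign tokens.
    if not hmac.compare_digest(_b64url(expected), signature):
        raise ValueError("invalid state signature")
    return json.loads(_b64url_decode(payload))


# Hypothetical state; key names other than tab_id and
# default_redirect_uri are assumptions.
state = {
    "user_id": 1,
    "database_id": 42,
    "tab_id": str(uuid.uuid4()),
    "default_redirect_uri": "http://localhost:8088/api/v1/database/oauth2/",
}
token = encode_state(state, secret="not-a-real-secret")
```

A proxy in front of several Superset instances could call something like `decode_state` on the incoming `state` parameter and forward the request to the `default_redirect_uri` it finds there.
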
When configuring a client, only the ID and secret are required; the DB engine spec should provide default values for the scope and endpoints. The `DATABASE_OAUTH2_REDIRECT_URI` setting is optional, and defaults to `/api/v1/database/oauth2/` in Superset.
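
One way to picture how the optional keys fall back to the DB engine spec defaults is the sketch below. It is not the code from this commit: `resolve_oauth2_client` and `SketchEngineSpec` are hypothetical names, and the actual logic behind `database.get_oauth2_config()` may differ.

```python
from typing import Any, Dict, Optional


class SketchEngineSpec:
    # Hypothetical spec with defaults, mirroring the attributes shown above.
    engine_name = "Example DB"
    oauth2_scope = "https://example.org/scope1 https://example.org/scope2"
    oauth2_authorization_request_uri = "https://example.org/authorize"
    oauth2_token_request_uri = "https://example.org/token"


def resolve_oauth2_client(
    clients: Dict[str, Dict[str, Any]], spec: type
) -> Optional[Dict[str, Any]]:
    """Merge a configured client with the defaults declared on the spec."""
    client = clients.get(spec.engine_name)
    if client is None:
        # No client configured for this engine: OAuth2 is not available.
        return None
    return {
        "id": client["id"],  # required
        "secret": client["secret"],  # required
        "scope": client.get("scope", spec.oauth2_scope),
        "authorization_request_uri": client.get(
            "authorization_request_uri", spec.oauth2_authorization_request_uri
        ),
        "token_request_uri": client.get(
            "token_request_uri", spec.oauth2_token_request_uri
        ),
    }


# Only the ID and secret are given; everything else comes from the spec.
config = resolve_oauth2_client(
    {"Example DB": {"id": "XXX", "secret": "YYY"}}, SketchEngineSpec
)
```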

Finally, `get_oauth2_token` and `get_oauth2_fresh_token` are used to actually retrieve a token and refresh an expired token, respectively.
In the future we plan to support adding custom clients via the Superset UI, and being able to manually assign clients to specific databases.

### File upload
