Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SecretsManagerBackend() does not need to require URL-encoded values for full_url_mode=False connections. #25104

Closed
2 tasks done
dwreeves opened this issue Jul 16, 2022 · 3 comments
Assignees
Labels
area:providers kind:bug This is a clearly a bug provider:amazon-aws AWS/Amazon - related issues

Comments

@dwreeves
Copy link
Contributor

dwreeves commented Jul 16, 2022

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

I am using apache-airflow-providers-amazon==2.4.0, but this issue is still relevant as of the most recent version, apache-airflow-providers-amazon==4.0.0.

Apache Airflow version

2.2.2

Operating System

OS X

Deployment

MWAA

Deployment details

These are my relevant configuration details:

[secrets]
backend = airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
backend_kwargs = {"connections_prefix": "", "full_url_mode": false}

What happened

When full_url_mode=False, values still need to be URL-encoded.

This creates a break in behavior between how the Airflow UI works and how SecretsManagerBackend() works. When I was porting over my config from the Airflow UI to the AWS Secrets Manager, I could not literally copy+paste my config as a JSON, otherwise this resulted in an error. It was also unclear and surprising that it works this aay.

What you think should happen instead

This behavior makes sense for full_url_mode=True in order to ensure that the URL serialization can be parsed properly.

However, this behavior is not actually required for when full_url_mode=False because the quotes in the JSON delimit the values, which is what one would reasonably expect to be happening behind the scenes.

This is why I consider this a bug, or at least bug-like:

  • it's unexpected behavior (because it's not a URL, it's a JSON)
  • It deviates from how a core implementation works (i.e. the Airflow UI)
  • It is not required to behave like this.

How to reproduce

The way I struggled with this was in creating a Slack hook. The Slack hook takes an input something like this:

{
  "conn_type": "http",
  "host": "https://hooks.slack.com/services",
  "password": "/path/to/hook"
}

This works fine using the Airflow UI without URL-encoding, however it is necessary to URL encode everything before passing it through the SecretsManagerBackend().

image

Anything else

The SecretsManagerBackend uses the BaseSecretsBackend implementation of the get_connection() method. This method takes a conn_id: str and returns an Optional[Connection].

Possible Airflow 2.3 implementation

The base implementation gets some value from the method self.get_conn_value(conn_id=conn_id), then passes it to self.deserialize_connection(conn_id=conn_id, value=value). This method accepts either a URL or a JSON (a string that starts with {).

However, the get_conn_value() method of the SecretsManagerBackend never returns a JSON. Even if the value it retrieves from the cloud is a JSON, it will stick it into a URL at first. At least as of Airflow 2.3.0, the SecretManagerBackend behavior of avoiding JSONs is unnecessary as get_conn_value() is allowed to return a JSON, and the base implementation of get_conn_value`.

Possible Airflow 2.2 implementation

The Apache Airflow Amazon provider package supports Airflow 2.2. Supporting this version of Airflow is also possible, but is slightly more involved: the SecretsManagerBackend would need to override the base implementation for get_connection such that the JSON kwargs are passed into Connection.__init__. (Without overridding this method, only the uri=? kwarg is used to construct the URL).

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@dwreeves dwreeves added area:providers kind:bug This is a clearly a bug labels Jul 16, 2022
@potiuk
Copy link
Member

potiuk commented Jul 16, 2022

Agree it would be nice to improve it - feel free to create PR on it when we can iterate on possble solutions.

I think it is important to keep backwards compatibility and do not go 2.3+. There must be a very good reason to break compatibility before the time, unless you want to wait till 11th October:

https://github.com/apache/airflow#release-process-for-providers

For example this means that by default we upgrade the minimum version of Airflow supported by providers to 2.3.0 in the first Provider's release after 11th of October 2022 (11th of October 2021 is the date when the first PATCHLEVEL of 2.2 (2.2.0) has been released.

Also I think there is another aspect here. We should make it in the way that:

a) there is a parameter needed to determine if the parameters are url-encoded or not
b) default should be true (but it should raise a deprecation warning).

Otherwise we are risking that if someone configured it with URL-encoding (as cumbersome as it might see - you finally did it too) and if we just change the behaviour, we should not break experience of those people - otherwise upgrading AWS provider might mysteriously make their DAGs break.

@dwreeves
Copy link
Contributor Author

Sorry for radio silence. I should have a first pass at a PR done later today, so long as I don't uncover any bugs as I flesh out the test cases.

It was tricky to maintain backwards compatibility, make the migration as smooth as possible for users, and make sure the changes can off-ramp into a more sane API down the line. But I think I got something. :)

@eladkal
Copy link
Contributor

eladkal commented Aug 13, 2022

Fixed in #25432

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers kind:bug This is a clearly a bug provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

No branches or pull requests

3 participants