Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Amazon Provider Package's SecretsManagerBackend's full_url_mode=False implementation. #26571

Closed
2 tasks done
dwreeves opened this issue Sep 21, 2022 · 8 comments · Fixed by #27920
Closed
2 tasks done
Labels
area:secrets kind:feature Feature Requests provider:amazon-aws AWS/Amazon - related issues

Comments

@dwreeves
Copy link
Contributor

dwreeves commented Sep 21, 2022

Objective

I am following up on all the changes I've made in PR #25432 and which were originally discussed in issue #25104.

The objective of the deprecations introduced in #25432 is to flag and remove "odd" behaviors in the SecretsManagerBackend.

The objective of this issue being opened is to discuss them, and hopefully reach a consensus on how to move forward implementing the changes.

I realize that a lot of the changes I made and their philosophy were under-discussed, so I will place the discussion here.

What does it mean for a behavior to be "odd"?

You can think of the behaviors of SecretsManagerBackend, and which secret encodings it supports, as a Venn diagram.

Ideally, SecretsManagerBackend supports at least everything the base API supports. This is a pretty straightforward "principle of least astonishment" requirement.

For example, it would be "astonishing" if copy+pasting a secret that works with the base API did not work in the SecretsManagerBackend.

To be clear, it would also be "astonishing" if the reverse were not true-- i.e. copy+pasting a valid secret from SecretsManagerBackend doesn't work with, say, environment variables. That said, adding new functionality is less astonishing than when a subclass doesn't inherit 100% of the supported behaviors of what it is subclassing. So although adding support for new secret encodings is arguably not desirable (all else equal), I think we can all agree it's not as bad as the reverse.

Examples

I will cover two examples where we can see the "Venn diagram" nature of the secrets backend, and how some behaviors that are supported in one implementation are not supported in another:

Example 1

Imagine the following environment variable secret:

export AIRFLOW_CONN_POSTGRES_DEFAULT='{
  "conn_type": "postgres",
  "login": "usr",
  "password": "not%url@encoded",
  "host": "myhost"
}'

Prior to #25432, this was not a secret that worked with the SecretsManagerBackend, even though it did work with base Airflow's EnvironmentVariablesBackend(as of 2.3.0) due to the values not being URL-encoded.

Additionally, the EnvironmentVariablesBackend is smart enough to choose whether a secret should be treated as a JSON or a URI without having to be explicitly told. In a sense, this is also an incompatibility-- why should the EnvironmentVariablesBackend be "smarter" than the SecretsManagerBackend when it comes to discerning JSONs from URIs, and supporting both at the same time rather than needing secrets to be always one type of serialization?

Example 2

Imagine the following environment variable secret:

export AIRFLOW_CONN_POSTGRES_DEFAULT="{
  'conn_type': 'postgres',
  'user': 'usr',
  'pass': 'is%20url%20encoded',
  'host': 'myhost'
}"

This is not a valid secret in Airflow's base EnvironmentVariablesBackend implementation, although it is a valid secret in SecretsManagerBackend.

There are two things that make it invalid in the EnvironmentVariablesBackend but valid in SecretsManagerBackend:

  • ast.litera_eval in SecretsManagerBackend means that a Python dict repr is allowed to be read in.
  • user and pass are invalid field names in the base API; these should be login and password, respectively. But the _standardize_secret_keys() method in the SecretsManagerBackend implementation makes it valid.

Additionally, note that this secret also parses differently in the SecretsManagerBackend than the EnvironmentVariablesBackend: the password "is%20url%20encoded" renders as "is url encoded" in the SecretsManagerBackend, but is left untouched by the base API when using a JSON.

List of odd behaviors

Prior to #25432, the following behaviors were a part of the SecretsManagerBackend specification that I would consider "odd" because they are not part of the base API implementation:

  1. full_url_mode=False still required URL-encoded parameters for JSON values.
  2. ast.literal_eval was used instead of json.dumps, which means that the SecretsManagerBackend on full_url_mode=False was supporting Python dict reprs and other non-JSONs.
  3. The Airflow config required setting full_url_mode=False to determine what is a JSON or URI.
  4. get_conn_value() always must return a URI.
  5. The SecretsManagerBackend allowed for atypical / flexible field names (such as user instead of login) via the _standardize_secret_keys() method.

We also introduced a new odd behavior in order to assist with the migration effort, which is:

  1. New kwarg called are_secret_values_urlencoded to support secrets whose encodings are "non-idempotent".

In the below sections, I discuss each behavior in more detail, and I've added my own opinions about how odd these behaiors are and the estimated adverse impact of removing the behaviors.

Behavior 1: URL-encoding JSON values

Current Status Oddness Estimated Adverse Impact of Removal
Deprecated High High

This was the original behavior that frustrated me and motivated me to open issues + submit PRs.

With the "idempotency" check, I've done my best to smooth out the transition away from URL-encoded JSON values.

The requirement is now mostly removed, to the extent that the removal of this behavior can be backwards compatible and as smooth as possible:

  • Users whose secrets do not contain special characters will not have even noticed a change took place.
  • Users who do have secrets with special characters hopefully are checking their logs and will have seen a deprecation warning telling them to remove the URL-encoding.
  • In a select few rare cases where a user has a secret with a "non-idempotent" encoding, the user will have to reconfigure their backend_kwargs to have are_secret_values_urlencoded set.

I will admit that I was surprised at how smooth we could make the developer experience around migrating this behavior for the majority of use cases.

When you consider...

  • How smooth migrating is (just remove the URL-encoding! In most cases you don't need to do anything else!), and
  • How disruptive full removal of URL-encoding is (to people who have not migrated yet),

.. it makes me almost want to hold off on fully removing this behavior for a little while longer, just to make sure developers read their logs and see the deprecation warning.

Behavior 2: ast.literal_eval for deserializing JSON secrets

Current Status Oddness Estimated Adverse Impact of Removal
Deprecated High Low

It is hard to feel bad for anyone who is adversely impacted by this removal:

  • This behavior should never have been introduced
  • This behavior was never a documented behavior
  • A reasonable and educated user will have known better than to rely on non-JSONs.

Providing a DeprecationWarning for this behavior was already going above and beyond, and we can definitely remove this soon.

Behavior 3: full_url_mode=False is required for JSON secrets

Current Status Oddness Estimated Adverse Impact of Removal
Active Medium Low

This behavior is odd because the base API does not require such a thing-- whether it is a JSON or a URI can be inferred by checking whether the first character of the string is {.

Because it is possible to modify this behavior without introducing breaking changes, moving from lack of optionality for the full_url_mode kwarg can be considered a feature addition.

Ultimately, we would want to switch from full_url_mode: bool = True to full_url_mode: bool | None = None.

In the proposed implementation, when full_url_mode is None, we just use whether the value starts with { to check if it is a JSON. Only when full_url_mode is a bool would we explicitly raise errors e.g. if a JSON was given when full_url_mode=True, or a URI was given when full_url_mode=False.

Behavior 4: get_conn_value() must return URI

Current Status Oddness Estimated Adverse Impact of Removal
Deprecated + Active (until at least October 11th) Low Medium

The idea that the callback invoked by get_connection() (now called get_conn_value(), but previously called get_conn_uri()) can return a JSON is a new Airflow 2.3.0 behavior.

This behavior cannot change until at least October 11th, because it is required for 2.2.0 backwards compatibility. Via Airflow's README.md:

[...] by default we upgrade the minimum version of Airflow supported by providers to 2.3.0 in the first Provider's release after 11th of October 2022 (11th of October 2021 is the date when the first PATCHLEVEL of 2.2 (2.2.0) has been released.

Changing this behavior after October 11th is just a matter of whether maintainers are OK with introduce a breaking change to the 2.2.x folks who are relying on JSON secrets.

Note that right now, get_conn_value() is avoided entirely when full_url_mode=False and get_connection() is called. So although there is a deprecation warning, it is almost never hit.

if self.full_url_mode:
    return self._get_secret(self.connections_prefix, conn_id)
else:
    warnings.warn(
        f'In future versions, `{type(self).__name__}.get_conn_value` will return a JSON string when'
        ' full_url_mode is False, not a URI.',
        DeprecationWarning,
    )

Behavior 5: Flexible field names via _standardize_secret_keys()

Current Status Oddness Estimated Adverse Impact of Removal
Active Medium High

This is one of those things that is very hard to remove. Removing it can be quite disruptive!

It is also a low priority to remove because unlike some other behaviors, it does not detract from SecretsManagerBackend being a "strict superset" with the base API.

Maybe it will just be the case that SecretsManagerBackend has this incompatibility with the base API going forward indefinitely?

Even still, we should consider the two following proposals:

  1. The default field name of user should probably be switched to login, both in the dict[str, list[str]] used to implement the find+replace, and also in the documentation. I do not forsee any issues with doing this.
  2. Remove documentation for this feature if we think it is "odd" enough to warrant discouraging users from seeking it out.

I think # 1 should be uncontroversial, but # 2 may be more controversial. I do not want to detract from my other points by staking out too firm an opinion on this one, so the best solution may simply be to not touch this for now. In fact, not touching this is exactly what I did with the original PR.

Behavior 6: are_secret_values_urlencoded kwarg

Current Status Oddness Estimated Adverse Impact of Removal
Pending Deprecation Medium Medium

In the original discussion #25104, @potiuk told me to add something like this. I tried my best to avoid users needing to do this, hence the "idempotency" check. So only a few users actually need to specify this to assist in the migration of their secrets.

This was introduced as a "pending" deprecation because frankly, it is an odd behavior to have ever been URL-encoding these JSON values, and it only exists as a necessity to aid in migrating secrets. In our ideal end state, this doesn't exist.

Eventually when it comes time, removing this will not be all that disruptive:

  • This only impacts users who have full_url_mode=False
  • This only impacts users with secrets that have non-idempotent encodings.
  • are_secret_values_urlencoded should be set to False. Users should never be manually setting to True!

So we're talking about a small percent of a small minority of users who will ever see or need to set this are_secret_values_urlencoded kwarg. And even then, they should have set are_secret_values_urlencoded to False to assist in migrating.

Proposal for Next Steps

All three steps require breaking changes.

Proposed Step 1

  • Remove: Behavior 2: ast.literal_eval for deserializing JSON secrets
  • Remove: Behavior 3: full_url_mode=False is required for JSON secrets
  • Remove: Behavior 4: get_conn_value() must return URI
  • Note: Must wait until at least October 11th!

Right now the code is frankly a mess. I take some blame for that, as I did introduce the mess. But the mess is all inservice of backwards compatibility.

Removing Behavior 4 vastly simplifies the code, and means we don't need to continue overriding the get_connection() method.

Removing Behavior 2 also simplifies the code, and is a fairly straightforward change.

Removing Behavior 3 is fully backwards compatible (so no deprecation warnings required) and provides a much nicer user experience overall.

The main thing blocking "Proposed Step 1" is the requirement that 2.2.x be supported until at least October 11th.

Alternative to Proposed Step 1

It is possible to remove Behavior 2 and Behavior 3 without removing Behavior 4, and do so in a way that keeps 2.2.x backwards compatibility.

It will still however leave a mess of the code.

I am not sure how eager the Amazon Provider Package maintainers are to keep backwards compatibility here. Between the 1 year window, plus the fact that this can only possibly impact people using both the SecretsManagerBackend and who have full_url_mode=False turned on, it seems like not an incredibly huge deal to just scrap support for 2.2.x here when the time comes. But it is not appropriate for me to decide this, so I must be clear and say that we can start trimming away some of the odd behaviors without breaking Airflow 2.2.x backwards compatibility, and that the main benefit of breaking backwards compatibility is the source code becoming way simpler.

Proposed Step 2

  • Remove: Behavior 1: URL-encoding JSON values
  • Switch status from Pending Deprecation to Deprecation: Behavior 6: are_secret_values_urlencoded kwarg

Personally, I don't think we should rush on this. The reason I think we should take our time here is because the current way this works is surprisingly non-disruptive (i.e. no config changes required to migrate for most users), but fully removing the behavior may be pretty disruptive, especially to people who don't read their logs carefully.

Alternative to Proposed Step 2

The alternative to this step is to combine this step with step 1, instead of holding off for a future date.

The main arguments in favor of combining with step 1 are:

  • Reducing the number of version bumps that introduce breaking changes by simply combining all breaking changes into one step. It's unclear how many users even use full_url_mode and it is arguable that we're being too delicate with what was arguably a semi-experimental and odd feature to begin with; it's only become less experimental by the stroke of luck that Airflow 2.3.0 supports JSON-encoded secrets in the base API.
  • A sort of "rip the BandAid" ethos, or a "get it done and over with" ethos. I don't think this is very nice to users, but I see the appeal of not keeping odd code around for a while.

Proposed Step 3

  • Remove: Behavior 6: are_secret_values_urlencoded kwarg

Once URL-encoding is no longer happening for JSON secrets, and all non-idempotent secrets have been cast or explicitly handled, and we've deprecated everything appropriately, we can finally remove are_secret_values_urlencoded.

Conclusion

The original deprecations introduced were under-discussed, but hopefully now you both know where I was coming from, and also agree with the changes I made.

If you disagree with the deprecations that I introduced, I would also like to hear about that and why, and we can see about rolling any of them back.

Please let me know what you think about the proposed steps for changes to the code base.

Please also let me know what you think an appropriate schedule is for introducing the changes, and whether you think I should consider one of the alternatives (both considered and otherwise) to the steps I outlined in the penultimate section.

Other stuff

Use case/motivation

(See above)

Related issues

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@dwreeves dwreeves added the kind:feature Feature Requests label Sep 21, 2022
@potiuk
Copy link
Member

potiuk commented Sep 22, 2022

That's a lot to digest. Question before anyone dives deeply (I guess this is asking for TL;DR;). I think this is very important to decide who should focus more on it and try to provide some answers/who the consensus should be with.

Q: Does this propoosal make any changes in SecretsBackend interface (common for all the Secret Backends) or can it be done exclusively in the Amazon Provider without changing Secrets Backend API backwards compatibilty?

Clarifying question 1: Does the change require changing anything in any other "community" Secrets Backends?
Clarifying question 2: Does the change require changing anything from someone to implement their own Secret Backends?

Depending on those answers, then the discussion will go though completely different flow.

If the 1/2 answer is - nope, no changes needed, they will continue to work and is fully backwards compatible - then this is purely amazon provider discussion, and @ferruzzi @o-nikolas @vincbeck are probably the best "discutants" here - to decide what is the best approch and whether they are fine with cleaning the code and releasing backwards-incompatible, new major provider version (and when).

Otherwise - I think this calls for a devlist discussion - at least pointing to this GitHub Issue and far bigger group of people need to be involved. Also in this case I am afraid for Airlfow 2 such changes will be impossible because of our SemVer compatibilty - mostly this is a promise to our users that there will be no breaking changes in 2.* line.

And we do not know yet (or even if) we are going to have breaking Airflow 3. There are multiple reasons why we would like to not do it and we even have not opened the discussion yet.

@dwreeves
Copy link
Contributor Author

dwreeves commented Sep 22, 2022

Q: Does this propoosal make any changes in SecretsBackend interface (common for all the Secret Backends) or can it be done exclusively in the Amazon Provider without changing Secrets Backend API backwards compatibilty?

100% of changes I am proposing are in the Amazon provider package's SecretsManagerBackend. Nothing necessitates a change in the base implementation.

(You can think of a lot of these changes I am proposing as ways to make the Amazon SecretsManagerBackend behave more like the base implementation. Right now there are a few nuanced ways in which they differ.)

Clarifying question 1: Does the change require changing anything in any other "community" Secrets Backends?

No.

Clarifying question 2: Does the change require changing anything from someone to implement their own Secret Backends?

There are backwards incompatibilities, strictly speaking. None of these backwards incompatibilities impact the Airflow base implementation whatsoever. Only the provider package.

For example, with the first stage proposal--

  • Behavior 2 is not backwards compatible. But is not even documented and seems to have arisen because of a mistake in the test suite where a contributor did not realize their trailing commas created a non-JSON object, so the contributor believed that json.loads() was an error prone function.
  • Behavior 3 is backwards compatible.
  • Behavior 4 is breaking for 2.2, but a user who is not using the secrets manager in an atypical way would not notice a difference in 2.3 (it is very rare to directly call get_conn_value() or get_conn_uri()). And also this behavior has already been deprecated by Airflow 2.3.

Overall the biggest impact of this is that a user using full_url_mode=False would be forced to upgrade to 2.3. But they would not need to otherwise change anything about their config or code unless they were doing anything very weird and unexpected. And even then, we can avoid this requirement of forcing an upgrade to 2.3 if we want.

Otherwise - I think this calls for a devlist discussion - at least pointing to this GitHub Issue and far bigger group of people need to be involved. Also in this case I am afraid for Airlfow 2 such changes will be impossible because of our SemVer compatibilty - mostly this is a promise to our users that there will be no breaking changes in 2.* line.

To clarify, I am discussing possible breaking changes in the provider package, not base Airflow.

When I say "breaking," I think most of these are ultimately minor changes, I think. They are just strictly speaking breaking, hence why I am noting them as such.

@potiuk
Copy link
Member

potiuk commented Sep 22, 2022

Cool. Then I think most of the comments on those are on the side of AWS people :)

@o-nikolas
Copy link
Contributor

Hey @dwreeves, thanks for this fantastically detailed write-up! Although my head is still spinning a bit, it would be worse if this post wasn't as detailed as it is, so thanks again 😃

Personally, I don't think we should be so afraid of breaking changes, especially ones that affect such small percentages of users. To me it can be more of a risk to maintain complicated code for longer periods of time and also having ambiguous behaviours for longer allows folks to happily use the old features even when they should be moving off of them. And at the end of the day, sem ver versioning does it's job, folks can stick to older versions if they really require these strange old behaviours. If some super valuable bug fix arrives in a future version, we can always think about cherry-picking it backwards using the shared governance model @potiuk outlined earlier this year.

So, I'm happy to combine steps 1 and 2, and after waiting appropriately long enough, to go ahead with step 3. But I'd love to hear what others think as well.

Cheers,
Niko

@potiuk
Copy link
Member

potiuk commented Sep 22, 2022

Brave !

@dwreeves
Copy link
Contributor Author

Sorry it's so much! I think the Examples section is the clearest explanation of what the issues are with the current SecretsManagerBackend implementation.

And yes, it is a small # of users affected. Not the worst. And up-to-date, affected users are all getting deprecation warnings right now. The end result though is I think a much more intuitive SecretsManagerBackend... at the end of it all, a user will be able to just copy+paste any arbitrary secret that works as an environment variable, into the AWS Secrets Manager, and that will be that without any surprises. (This was the challenge I originally had that frustrated me and motivated me to work on this.)

@Taragolis
Copy link
Contributor

@dwreeves very detail explanation 👍 💯 🥇

My thought about ast.literal_eval and full_url_mode argument.

As I understand release after October 11, 2022 would major so it might allow breaking changes in provider in this case we actually might drop support of weird ast.literal_eval if user want keep use dict representation like {'region_name': 'us-east-1'} the only option do not upgrade provider or downgrade to previous version (it might mention in documentation). For user who actually wanted fix their values in AWS SB we might prepare some snippets which help to fix values in SB.

For full_url_mode I have two thoughts.

First. We can get rid of this argument (just do something kwargs.pop("full_url_mode", None) and if value is set just inform users that this argument has no affect anymore). In fact detects automatically is it json or string would allow store for different connections in different formats: JSON and URL-encoded

It could be archived by different ways, it might be something like this sample

import json


def validate(v: str) -> str:
    if not v:
        return v

    v = v.strip()
    try:
        parsed_value = json.loads(v)
    except json.JSONDecodeError as ex:
        if v.startswith("{") or "://" not in v:
            raise ValueError(f"Not valid JSON-Object or URL-encoded string. Original error:\n * {ex}")
    else:
        if not isinstance(parsed_value, dict):
            raise TypeError(f"Expected JSON-object, got {type(v).__name__}.")

    return v


values_from_sb = [
    '{"conn_type": "aws"}',  # JSON Object - Valid
    'aws://',  # String, URL encoded - Valid
    '"aws://"',  # JSON String, Valid JSON object, Invalid Secret Backend
    "{'conn_type': 'aws'}",  # repr of dict, not failed during parsing however it failed on url-unparse
    '{"conn_type": "aws"',  # Invalid JSON object, missing `}`
    '\t\t\t\t\t{"conn_type": "aws"}\n',  # Some whitespaces
]

for value in values_from_sb:
    try:
        print(f"{value!r} is valid value for SB: {validate(value)!r}")
    except Exception as ex:
        print(f"Unable to parse: {value!r}. Original error:\n - {ex}")

Second. It also might be a good idea to add ability to store as JSON not only in SecretsManagerBackend but also in SystemsManagerParameterStoreBackend. As result might be possible to create some base class for AWS Secret Backends

@dwreeves
Copy link
Contributor Author

Busy last few weeks for me, but I'll be getting to this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:secrets kind:feature Feature Requests provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants