Allow providers to add default connections #32048
Comments
I am not sure if this is really needed. As discussed in #31533 (comment), there are a number of things that we should be aware of:
The objective of the In the ideal world people will not have those connections created in production and we had absolutely no intention to synchronize those connections with anything. There are a number of problems and considerations to solve if we want to attempt any kind of production-level synchronization. All of them are described and extensively discussed here: #31875 I think the root cause of the problem is the IMHO we should continue the path that the |
How do you see |
This is already happening. You do not need to run init for the connection to appear in the Connection drop-down list. One has nothing to do with the other. You can still manually create the "default" connection with the right type just after installing the provider. |
You have to remember that if you follow the advice:
you already have to manually create the default connections that you need after installation. They won't and should not be created automatically. In fact, the advice above specifically says that creating the default connections automatically is wrong and that you should manually create them when you need a connection of a given type. |
I agree. Line 116 in 00e1056
|
We could get the provider to contribute it, yes. It could be exposed via get_provider_info and read via ProvidersManager, once we know what the recommended use pattern for those would be. |
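A minimal sketch of the idea above, assuming each provider lists its default connections under a hypothetical `default-connections` key in the dictionary returned by its `get_provider_info`. Neither that key nor the `collect_default_connections` helper exists in Airflow; plain dictionaries stand in for whatever ProvidersManager would eventually expose.

```python
from __future__ import annotations

from typing import Any


def get_provider_info_example() -> dict[str, Any]:
    """Hypothetical provider info; the "default-connections" key is an assumption."""
    return {
        "package-name": "apache-airflow-providers-example",
        "default-connections": [
            {"conn_id": "example_default", "conn_type": "example", "host": "localhost"},
        ],
    }


def collect_default_connections(provider_infos: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Merge default connections from all providers, keyed by conn_id.

    The first provider to define a conn_id wins; providers that do not
    declare the (hypothetical) key simply contribute nothing.
    """
    merged: dict[str, dict[str, Any]] = {}
    for info in provider_infos:
        for conn in info.get("default-connections", []):
            merged.setdefault(conn["conn_id"], conn)
    return merged
```

With this shape a provider release, rather than a core release, would decide which default connections exist, which is the decoupling the issue asks for.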
The tasks that need to be done are:
2a. If the result of this discussion is that there is no value in it then the action item is to deprecate 2b. If the result of this discussion is that there is value then what we should discuss is: This issue will be considered completed when either (2a) or (2b) is completed. |
I don't think we really need it, because:
This command initializes the database if it doesn't exist, upgrades it if it exists, loads the default pools and synchronizes |
That's, unfortunately, a pretty breaking change. We can most likely deprecate init, raise a warning and rename it to something else, but changing its behaviour would be a breaking change. There are a number of cases where people have workflows that depend on the 'db init' behaviour and rely on 'default_connections' being created. This is an unfortunate historical thing, but I see quite a high probability that we will break someone's workflow, well outside of the "Hyrum's law" edge case.
But my proposal above is that we do not have to "fix" its behaviour for providers: the connection will not be automatically created when we install new providers, and we do not have to care about backwards compatibility there, because it never worked this way. We can deprecate it and discourage it even more rather than break even further things that are already broken (and clarify it in the docs, the deprecation, and the renaming of the command).
And yes, I think we do not HAVE to change the behaviour of having the connections in the core. While we might not like the fact that adding a new provider means a new "default" connection created in the core, we do not have to change it. One of the options (if we agree this should be treated as a "quick/test only" solution) is to leave those connections in the core. We will have to leave the list of extras in the core anyway. There is no escape from the fact that we have to list the providers that can be installed via an extra; this is already one thing that is impossible to remove from the core. Currently this is done by generating (via pre-commit) 'generated/provider_dependencies.json', which is used as the "source of truth" for the list of Apache Airflow extras.
If we agree that "default connections" created in the db are purely a test setting and change the perception of that, we can leave them in the core. We can even change them to also be read and configured from a JSON file in core, or even add an entry in "generated/provider_dependencies.json" and create them based on it (as long as we make that file part of the airflow package). That only reinforces my point: let's clarify what "default_connections" are (are they test only? useful for prod?) and what 'db init' is for (and rename it if we agree 'db init' should not be used to initialize the db). But I think we cannot (without bumping Airflow 2 to Airflow 3) change the way "airflow db init" behaves by default (at most it can raise a warning). |
Yes I see that this can affect some users.
I think we can deprecate just the part which creates the connections (not the whole init method), ask the users to set it to False manually, and create a new alternative command to create the default connections. Something similar to:
diff --git a/airflow/cli/cli_config.py b/airflow/cli/cli_config.py
index 0c69571fea..7811662841 100644
--- a/airflow/cli/cli_config.py
+++ b/airflow/cli/cli_config.py
@@ -1566,6 +1566,12 @@ DB_COMMANDS = (
         func=lazy_load_command("airflow.cli.commands.db_command.initdb"),
         args=(ARG_VERBOSE,),
     ),
+    ActionCommand(
+        name="create-default-connections",
+        help="Create default connections",
+        func=lazy_load_command("airflow.cli.commands.db_command.create_default_connections"),
+        args=(ARG_VERBOSE,),
+    ),
     ActionCommand(
         name="check-migrations",
         help="Check if migration have finished",
diff --git a/airflow/cli/commands/db_command.py b/airflow/cli/commands/db_command.py
index 64d54cc22e..db044c0ee7 100644
--- a/airflow/cli/commands/db_command.py
+++ b/airflow/cli/commands/db_command.py
@@ -246,3 +246,8 @@ def drop_archived(args):
         table_names=args.tables,
         needs_confirm=not args.yes,
     )
+
+
+def create_default_connections(args):
+    """Create default connections"""
+    db.create_default_connections()
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index a76f0d4f67..17236c6932 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -732,6 +732,13 @@ def initdb(session: Session = NEW_SESSION, load_connections: bool = True):
         _create_db_from_orm(session=session)
     # Load default connections
     if conf.getboolean("database", "LOAD_DEFAULT_CONNECTIONS") and load_connections:
+        warnings.warn(
+            "Creating default connections via the command `airflow db init` is deprecated.\n"
+            "Please use `airflow db create-default-connections` instead and "
+            "set database.load_default_connections to False.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
         create_default_connections(session=session)
     # Add default pool & sync log_template
     add_default_pool_if_not_exists(session=session)
Then we can remove it after 2 or 3 minor versions if the Airflow core release policy allows that, otherwise in Airflow 3 as you said. But if the command |
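The deprecation path proposed in the diff can be tried out in isolation. The following is a stand-alone simulation rather than Airflow code: `initdb` and `create_default_connections` here are simplified stand-ins that operate on a plain list, and `run_and_capture` is a hypothetical helper added only to observe the warning.

```python
import warnings


def create_default_connections(store: list) -> None:
    """Stand-in for db.create_default_connections(); records one fake connection id."""
    store.append("example_default")


def initdb(store: list, load_default_connections: bool = True) -> None:
    """Stand-in for initdb(): still creates the connections, but warns when it does."""
    if load_default_connections:
        warnings.warn(
            "Creating default connections via `airflow db init` is deprecated. "
            "Use `airflow db create-default-connections` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        create_default_connections(store)


def run_and_capture(load_default_connections: bool):
    """Run initdb() and return (created connection ids, captured warnings)."""
    store: list = []
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure DeprecationWarning is not filtered out
        initdb(store, load_default_connections)
    return store, caught
```

Existing workflows keep working during the deprecation window because the connections are still created; users who set the flag to False get neither the connections nor the warning, and can call the dedicated command when they want them.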
How did we decide, now or in the past, which providers get default connections out of the box? I don't see them for all providers. Were there any criteria to be met? Also, I think we need to document this behavior, as discussed in the comment. |
It's historical. We have no rules. |
But yes. I am all for clarifying the behaviour, getting It would be great if someone leads it and proposes a solution/approves and fixes it :). |
Created #32420 to handle the This issue will focus on the mechanism of adding connections to the default connection list (in other words what the |
Currently, (a) This command initializes the database if it doesn't exist, upgrades it if it exists, loads the default pools and synchronizes log_filename_template with the db We all agree to remove the (b) part of the Also, there are some suggestions to remove IMO it is best to leave Let me know what you guys think so that we can proceed with #32420 |
This is what we want but it can happen only in Airflow 3. |
Ya, we can only do this in a major release. Also, would it make sense to change the behavior of |
Yes. We can't really change db init behaviour. But we can do something else. We can deprecate (and hide) and yes - create-default-connections should eventually only create connections for the installed providers. This is what the follow-up should be about. So maybe indeed it should all be done as a single step (or at least we have to make sure that 'add-default-connections' is never released in a state where it would add all connections regardless of which providers are installed). |
Since #32420 has been addressed, it's time for the follow-up. I think it would be better to get some suggestions from a mailing list discussion. Let me know what you guys think. |
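A rough sketch of "only create connections for the installed providers", under stated assumptions: the mapping from provider package to default connections is hard-coded here purely for illustration (in a real design each provider would supply its own entries), and `installed_provider_packages` and `connections_to_create` are hypothetical helpers, not Airflow functions. Installed packages are discovered with the standard-library `importlib.metadata`.

```python
from importlib import metadata

PROVIDER_PREFIX = "apache-airflow-providers-"

# Illustrative mapping only; not how Airflow stores default connections today.
DEFAULT_CONNECTIONS_BY_PROVIDER = {
    "apache-airflow-providers-postgres": [{"conn_id": "postgres_default", "conn_type": "postgres"}],
    "apache-airflow-providers-sqlite": [{"conn_id": "sqlite_default", "conn_type": "sqlite"}],
}


def installed_provider_packages() -> set:
    """Names of installed distributions that look like Airflow provider packages."""
    names = set()
    for dist in metadata.distributions():
        name = (dist.metadata.get("Name") or "").lower()
        if name.startswith(PROVIDER_PREFIX):
            names.add(name)
    return names


def connections_to_create(installed: set) -> list:
    """Return default connections only for providers present in `installed`."""
    return [
        conn
        for package in sorted(installed)
        for conn in DEFAULT_CONNECTIONS_BY_PROVIDER.get(package, [])
    ]
```

Calling `connections_to_create(installed_provider_packages())` would then skip every provider that is not installed, so the command never creates a connection whose type cannot be used.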
I think it's quite
|
Thanks @potiuk for the elaborated steps. Will create the PR soon. |
FYI: #33909 I've added more of a "meta" issue describing our goal of "splitting providers to subfolders" (for now, without starting a discussion about splitting them out of the repository). This might help to understand how this and a few other issues fit in when it comes to the "long-term" target we have. |
Body
Currently we can add default connections only by adding an entry to:
airflow/airflow/utils/db.py
Line 116 in 00e1056
This means only an Airflow core release can add a new default connection, which is not in sync with providers.
Examples:
#31533 (comment)
#31787 (comment)
This is a pain because providers support earlier Airflow versions (for example, right now the minimum Airflow version supported by providers is Airflow 2.4.0, so even if we add a new connection to core and release it with Airflow 2.7.0, users of 2.4.0 won't have it).
The task:
We need to somehow make it possible for providers to add default connections (and extract the existing default provider connections from core).