Skip to content

Conversation

@nickstenning
Copy link
Contributor

@nickstenning nickstenning commented Dec 8, 2025

To allow airflow db migrate to respect its -r option when run against an empty database, we need to supply an initial migration which creates the database in the state expected by the existing 0001 migration.

This commit:

  1. Adds such an initial migration.
  2. Updates upgradedb to always run a migration-based upgrade rather than a direct reflection of table metadata when to_revision is supplied.
  3. Fix add_default_pool_if_not_exists to not assume that it's running with the latest database schema.

My primary motivation here is to make it easy to prepare an Airflow database with a schema at any point in the migration history, for the purposes of testing and improving migrations. This specifically came up in the context of identifying that the sqlite dialect migration for 0042_3_0_0_add_uuid_primary_key_to_task_instance_.py is quite broken.

I'm not entirely clear on what the status is of the ab_* tables, and whether they should or should not be included here. The difficulty of omitting them altogether is that they are referenced -- albeit only once -- by subsequent migrations: see migration 0028. After discussing with @ashb I've left them in for now, as this a) simplifies handling migration 0028, and b) more accurately reflects the state the database would be in if it had truly been migrated up from 2.x.

To assess whether this base migration results in the same end result as simply running initdb, I've been running a script something like this to print the schema as seen by sqlalchemy in a canonicalized order:

Script to print a canonicalized schema: print_tables.py
from airflow.models import import_all_models
from airflow.utils.db import reflect_tables
from airflow.utils.session import create_session


def print_metadata_canonical(metadata):
    for table_name in sorted(metadata.tables.keys()):
        table = metadata.tables[table_name]
        print(f"\nTable: {table_name}")
        print(f"  Schema: {table.schema}")

        for col_name in sorted(table.columns.keys()):
            col = table.columns[col_name]
            nullable = "NULL" if col.nullable else "NOT NULL"
            default = f" DEFAULT {col.default}" if col.default else ""
            print(f"  Column: {col_name} {col.type} {nullable}{default}")

        if table.primary_key:
            pk_cols = sorted([c.name for c in table.primary_key.columns])
            print(f"  Primary Key: {pk_cols}")

        for constraint in sorted(table.constraints, key=lambda x: x.name or ""):
            if constraint.__class__.__name__ == "UniqueConstraint":
                cols = sorted([c.name for c in constraint.columns])
                print(f"  Unique: {constraint.name} on {cols}")

        for constraint in sorted(table.constraints, key=lambda x: x.name or ""):
            if constraint.__class__.__name__ == "CheckConstraint":
                print(f"  Check: {constraint.name} ({constraint.sqltext})")

        for fk in sorted(table.foreign_keys, key=lambda x: (x.parent.name, str(x.column))):
            print(f"  FK: {fk.parent.name} -> {fk.column}")

        for idx in sorted(table.indexes, key=lambda x: x.name or ""):
            cols = [c.name for c in idx.columns]
            unique = "UNIQUE " if idx.unique else ""
            print(f"  Index: {unique}{idx.name} on {cols}")


if __name__ == "__main__":
    import_all_models()

    with create_session() as s:
        metadata = reflect_tables(session=s, tables=None)

        print_metadata_canonical(metadata)

Here is the diff when running this on main:

$ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test.db uv run airflow db migrate -r heads
[...]
$ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test.db uv run print_tables.py > migrated
$ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test-initdb.db uv run airflow db migrate
[...]
$ AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=sqlite:////tmp/test-initdb.db uv run print_tables.py > initdb
$ diff -u initdb migrated
--- initdb          2025-12-09 12:56:11
+++ migrated        2025-12-09 12:55:53
@@ -1,4 +1,102 @@

+Table: _xcom_archive
+  Schema: None
+  Column: dag_id VARCHAR(250) NOT NULL
+  Column: dag_run_id INTEGER NOT NULL
+  Column: key VARCHAR(512) NOT NULL
+  Column: map_index INTEGER NOT NULL
+  Column: run_id VARCHAR(250) NOT NULL
+  Column: task_id VARCHAR(250) NOT NULL
+  Column: timestamp TIMESTAMP NOT NULL
+  Column: value BLOB NULL
+  Primary Key: ['dag_run_id', 'key', 'map_index', 'task_id']
+
+Table: ab_permission
+  Schema: None
+  Column: id INTEGER NOT NULL
+  Column: name VARCHAR(100) NOT NULL
+  Primary Key: ['id']
+  Unique: ab_permission_name_uq on ['name']
+
+Table: ab_permission_view
+  Schema: None
+  Column: id INTEGER NOT NULL
+  Column: permission_id INTEGER NULL
+  Column: view_menu_id INTEGER NULL
+  Primary Key: ['id']
+  Unique: ab_permission_view_permission_id_view_menu_id_uq on ['permission_id', 'view_menu_id']
+  FK: permission_id -> ab_permission.id
+  FK: view_menu_id -> ab_view_menu.id
+
+Table: ab_permission_view_role
+  Schema: None
+  Column: id INTEGER NOT NULL
+  Column: permission_view_id INTEGER NULL
+  Column: role_id INTEGER NULL
+  Primary Key: ['id']
+  Unique: ab_permission_view_role_permission_view_id_role_id_uq on ['permission_view_id', 'role_id']
+  FK: permission_view_id -> ab_permission_view.id
+  FK: role_id -> ab_role.id
+
+Table: ab_register_user
+  Schema: None
+  Column: email VARCHAR(512) NOT NULL
+  Column: first_name VARCHAR(256) NOT NULL
+  Column: id INTEGER NOT NULL
+  Column: last_name VARCHAR(256) NOT NULL
+  Column: password VARCHAR(256) NULL
+  Column: registration_date DATETIME NULL
+  Column: registration_hash VARCHAR(256) NULL
+  Column: username VARCHAR(512) NOT NULL
+  Primary Key: ['id']
+  Unique: ab_register_user_username_uq on ['username']
+
+Table: ab_role
+  Schema: None
+  Column: id INTEGER NOT NULL
+  Column: name VARCHAR(64) NOT NULL
+  Primary Key: ['id']
+  Unique: ab_role_name_uq on ['name']
+
+Table: ab_user
+  Schema: None
+  Column: active BOOLEAN NULL
+  Column: changed_by_fk INTEGER NULL
+  Column: changed_on DATETIME NULL
+  Column: created_by_fk INTEGER NULL
+  Column: created_on DATETIME NULL
+  Column: email VARCHAR(512) NOT NULL
+  Column: fail_login_count INTEGER NULL
+  Column: first_name VARCHAR(256) NOT NULL
+  Column: id INTEGER NOT NULL
+  Column: last_login DATETIME NULL
+  Column: last_name VARCHAR(256) NOT NULL
+  Column: login_count INTEGER NULL
+  Column: password VARCHAR(256) NULL
+  Column: username VARCHAR(512) NOT NULL
+  Primary Key: ['id']
+  Unique: ab_user_email_uq on ['email']
+  Unique: ab_user_username_uq on ['username']
+  FK: changed_by_fk -> ab_user.id
+  FK: created_by_fk -> ab_user.id
+
+Table: ab_user_role
+  Schema: None
+  Column: id INTEGER NOT NULL
+  Column: role_id INTEGER NULL
+  Column: user_id INTEGER NULL
+  Primary Key: ['id']
+  Unique: ab_user_role_user_id_role_id_uq on ['role_id', 'user_id']
+  FK: role_id -> ab_role.id
+  FK: user_id -> ab_user.id
+
+Table: ab_view_menu
+  Schema: None
+  Column: id INTEGER NOT NULL
+  Column: name VARCHAR(250) NOT NULL
+  Primary Key: ['id']
+  Unique: ab_view_menu_name_uq on ['name']
+
 Table: alembic_version
   Schema: None
   Column: version_num VARCHAR(32) NOT NULL
@@ -535,11 +633,6 @@
   Unique: slot_pool_pool_uq on ['pool']
   FK: team_id -> team.id

-Table: sqlite_sequence
-  Schema: None
-  Column: name NULL NULL
-  Column: seq NULL NULL
-
 Table: task_inlet_asset_reference
   Schema: None
   Column: asset_id INTEGER NOT NULL

I'd greatly appreciate any feedback on whether this is wanted/helpful -- my view is that the -r option to db migrate should always be respected -- and if so, what if any changes are needed for the handling of the mentioned tables.

@boring-cyborg boring-cyborg bot added the area:db-migrations PRs with DB migration label Dec 8, 2025
@ashb ashb added the full tests needed We need to run full set of tests for this PR to merge label Dec 8, 2025
@nickstenning nickstenning force-pushed the add-initial-squashed-migration branch 4 times, most recently from b7e5f10 to be28093 Compare December 9, 2025 12:01
@nickstenning nickstenning force-pushed the add-initial-squashed-migration branch 3 times, most recently from 8c9a7a9 to 8fc2d52 Compare December 9, 2025 12:22
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think editing the file for the FAB migration (from 0001 to 0000) is fine, especially as that too is also creating the "base state" with IF NOT EXISTS on all tables..

@nickstenning nickstenning force-pushed the add-initial-squashed-migration branch from 8fc2d52 to d0f65cf Compare December 9, 2025 12:48
To allow `airflow db migrate` to respect its `-r` option when run
against an empty database, we need to supply an initial migration which
creates the database in the state expected by the existing 0001
migration.

This commit:

1. Adds such an initial migration.
2. Updates `upgradedb` to run a migration-based upgrade rather than a
   direct reflection of table metadata when `to_revision` is supplied.
3. Fix `add_default_pool_if_not_exists` to not assume that it's running
   with the latest database schema.
@nickstenning nickstenning force-pushed the add-initial-squashed-migration branch from d0f65cf to 265080b Compare December 9, 2025 13:42
@ashb ashb merged commit 42d82cb into apache:main Dec 9, 2025
121 checks passed
@nickstenning nickstenning deleted the add-initial-squashed-migration branch December 9, 2025 14:41
henry3260 pushed a commit to henry3260/airflow that referenced this pull request Dec 10, 2025
To allow `airflow db migrate` to respect its `-r` option when run
against an empty database, we need to supply an initial migration which
creates the database in the state expected by the existing 0001
migration.

This commit:

1. Adds such an initial migration.
2. Updates `upgradedb` to run a migration-based upgrade rather than a
   direct reflection of table metadata when `to_revision` is supplied.
3. Fix `add_default_pool_if_not_exists` to not assume that it's running
   with the latest database schema.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:db-migrations PRs with DB migration full tests needed We need to run full set of tests for this PR to merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants