
Conversation


@Chais Chais commented Nov 10, 2025

closes: #57747

  • Reinstate task_instance_history.id as the table's primary key, since that is what it was before the upgrade.
  • Use Alembic's autoincrement instead of filling in the values manually. This ensures future entries will get a key, as the code at those versions expects.

I verified that adding an auto-increment column (SERIAL for PostgreSQL, int NOT NULL AUTO_INCREMENT for MySQL) to a non-empty table backfills the new column just as the deleted manual statements did.
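The second bullet can be sanity-checked outside Airflow. A minimal stand-alone sketch (SQLite standing in for Postgres/MySQL; the table and column names here are illustrative, not the real schema) showing that rows inserted without an explicit id still receive a key when the column is a real autoincrement primary key:

```python
import sqlite3

# Illustrative stand-in for the real task_instance_history table; SQLite's
# INTEGER PRIMARY KEY AUTOINCREMENT plays the role of Postgres SERIAL /
# MySQL AUTO_INCREMENT here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance_history ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, task_id TEXT)"
)
# The post-downgrade code inserts rows without supplying id; each row must
# still get a key, or subsequent queries fail.
conn.execute("INSERT INTO task_instance_history (task_id) VALUES ('t1')")
conn.execute("INSERT INTO task_instance_history (task_id) VALUES ('t2')")
rows = conn.execute(
    "SELECT id, task_id FROM task_instance_history ORDER BY id"
).fetchall()
print(rows)  # [(1, 't1'), (2, 't2')]
```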



@Chais Chais requested a review from ephraimbuddy as a code owner November 10, 2025 16:22

boring-cyborg bot commented Nov 10, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide and consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@boring-cyborg boring-cyborg bot added the area:db-migrations PRs with DB migration label Nov 10, 2025
@potiuk potiuk added this to the Airflow 3.1.4 milestone Dec 2, 2025
The reviewed change to the downgrade migration:

    with op.batch_alter_table("task_instance_history", schema=None) as batch_op:
        batch_op.drop_constraint(batch_op.f("task_instance_history_pkey"), type_="primary")
    -   batch_op.add_column(sa.Column("id", sa.INTEGER, nullable=True))
    +   batch_op.add_column(sa.Column("id", sa.INTEGER, primary_key=True, autoincrement=True))
Contributor commented:

This won't work with data, as id would be null.
That's why we had to run the deleted statements to update the id column before making it a primary key.

@Chais Chais (Author) commented Jan 15, 2026

I'm not convinced this is the case. The downgrade SQL generated for psql looks like this:

BEGIN;
-- Running downgrade 7645189f3479 -> e00344393f31
ALTER TABLE task_instance_history DROP CONSTRAINT task_instance_history_pkey;
ALTER TABLE task_instance_history ADD COLUMN id SERIAL NOT NULL;
ALTER TABLE task_instance_history DROP COLUMN task_instance_id;
ALTER TABLE task_instance_history DROP COLUMN try_id;
UPDATE alembic_version SET version_num='e00344393f31' WHERE alembic_version.version_num = '7645189f3479';
COMMIT;

The relevant line is:

ALTER TABLE task_instance_history ADD COLUMN id SERIAL NOT NULL;

If I now create a test table and fill it with some data:

CREATE TABLE test ( column_a text, column_b integer );
INSERT INTO test (column_a, column_b) VALUES ('some text', 3);
INSERT INTO test (column_a, column_b) VALUES ('some other text', 17);

I see it as expected:

SELECT * FROM test;
    column_a     | column_b 
-----------------+----------
 some text       |        3
 some other text |       17
(2 rows)

If I now apply the relevant change to this table with its existing data like this:

ALTER TABLE test ADD COLUMN id SERIAL NOT NULL;

I can see the values being inserted:

SELECT * FROM test;
    column_a     | column_b | id 
-----------------+----------+----
 some text       |        3 |  1
 some other text |       17 |  2
(2 rows)

So this seems to work as intended on PostgreSQL. I tested it on MySQL when I wrote this PR, with a similar result.

Contributor commented:

Can you test with PostgreSQL? According to Alembic's docs, autoincrement is only understood by MySQL: https://alembic.sqlalchemy.org/en/latest/ops.html

@Chais Chais (Author) commented Jan 16, 2026

I had hoped this would make it sufficiently clear that I did:

The downgrade SQL generated for psql looks like this:
[…]
So this seems to work as intended on PostgreSQL.

Technically, the generated code doesn't use AUTO_INCREMENT:

ALTER TABLE test ADD COLUMN id SERIAL NOT NULL;

But as we can see in the PostgreSQL documentation, SERIAL is shorthand for an autoincrementing integer column, which is just what we need: https://www.postgresql.org/docs/current/datatype-numeric.html
Presumably Alembic translates sa.INTEGER with autoincrement=True into SERIAL for Postgres.
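That hypothesis can be checked without a database: Alembic renders DDL through SQLAlchemy, so compiling a CREATE TABLE statement for the PostgreSQL dialect shows how an autoincrementing integer primary key is emitted. A sketch, assuming sqlalchemy is installed (the test table here is illustrative):

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

# Illustrative table mirroring the id column definition under discussion.
metadata = sa.MetaData()
test = sa.Table(
    "test",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True, autoincrement=True),
)

# Compile the DDL for the PostgreSQL dialect without connecting anywhere.
ddl = str(CreateTable(test).compile(dialect=postgresql.dialect()))
print(ddl)  # the id column is rendered as SERIAL NOT NULL
```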

It's not as if I invented this definition, either. task_instance_history was introduced in revision d482b7261ff9 (number 21, by the new count) with the id column defined like this:

sa.Column("id", sa.Integer(), primary_key=True, autoincrement=True),

So clearly this worked before (for all supported DBs, I might add), and I see no reason why it shouldn't work again.
And the fact remains that the current code leaves the DB broken post-downgrade. While it restores the id column and refills it in what may be the most cumbersome way possible, it neglects the autoincrement that the codebase expects at the relevant version, leading to failing queries. Please see the issue I linked in the original post for details.



Development

Successfully merging this pull request may close these issues.

Downgrading DB leaves instance unable to clear tasks
