-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Reinstate tih.id as pkey and auto-increment on downgrade #58149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
| with op.batch_alter_table("task_instance_history", schema=None) as batch_op: | ||
| batch_op.drop_constraint(batch_op.f("task_instance_history_pkey"), type_="primary") | ||
| batch_op.add_column(sa.Column("id", sa.INTEGER, nullable=True)) | ||
| batch_op.add_column(sa.Column("id", sa.INTEGER, primary_key=True, autoincrement=True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work with data as id would be null.
That's why we had to run the deleted statements to update the ID column before making it a primary key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this is the case. The downgrade SQL generated for psql looks like this:
BEGIN;
-- Running downgrade 7645189f3479 -> e00344393f31
ALTER TABLE task_instance_history DROP CONSTRAINT task_instance_history_pkey;
ALTER TABLE task_instance_history ADD COLUMN id SERIAL NOT NULL;
ALTER TABLE task_instance_history DROP COLUMN task_instance_id;
ALTER TABLE task_instance_history DROP COLUMN try_id;
UPDATE alembic_version SET version_num='e00344393f31' WHERE alembic_version.version_num = '7645189f3479';
COMMIT;The relevant line is:
ALTER TABLE task_instance_history ADD COLUMN id SERIAL NOT NULL;If I now create a test table and fill it with some data:
CREATE TABLE test ( column_a text, column_b integer );
INSERT INTO test (column_a, column_b) VALUES ('some text', 3);
INSERT INTO test (column_a, column_b) VALUES ('some other text', 17);I see it as expected:
SELECT * FROM test;
column_a | column_b
-----------------+----------
some text | 3
some other text | 17
(2 rows)
If I now apply the relevant change to this table with its existing data like this:
ALTER TABLE test ADD COLUMN id SERIAL NOT NULL;I can see the values being inserted:
SELECT * FROM test;
column_a | column_b | id
-----------------+----------+----
some text | 3 | 1
some other text | 17 | 2
(2 rows)
So this seems to work as intended on PostgreSQL. I tested it on MySQL when I wrote this PR, with a similar result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test with postgresql because according to alembic, autoincrement is only understood by mysql: https://alembic.sqlalchemy.org/en/latest/ops.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had hoped this would make it sufficiently clear that I did:
The downgrade SQL generated for psql looks like this:
[…]
So this seems to work as intended on PostgreSQL.
Technically, the generated code doesn't use AUTO_INCREMENT:
ALTER TABLE test ADD COLUMN id SERIAL NOT NULL;But as we can see in the PostgreSQL documentation, the SERIAL type is an autoincrementing integer, which is just what we need: https://www.postgresql.org/docs/current/datatype-numeric.html
Maybe Alembic translates sa.INTEGER, autoincrement=True into SERIAL for Postgres.
Also, it's not like I invented this definition, either. task_instance_history was introduced in revision d482b7261ff9 (number 21, by the new count) with the id column defined like this:
sa.Column("id", sa.Integer(), primary_key=True, autoincrement=True),So clearly, this worked before (and for all supported DBs, I might add) and I see no reason why it shouldn't work again.
And the fact remains, that the current code leaves the DB broken post-downgrade. While it restores the id column and refills it in what may be the most cumbersome way possible, it neglects the autoincrement the codebase expects at the relevant version, leading to queries failing. Please see the issue I linked in the original post for details.
closes: #57747
task_instance_history.idas the table's primary key, given that it was that before the upgrade.autoincrementinstead of filling in the values manually. This ensures future entries will get a key, as the code for those versions expects it.I verified that adding an auto-increment column (
INTEGER SERIALfor postgres,int NOT NULL AUTO_INCREMENTfor mysql) to a non-empty table backfills the new fields as we did manually.^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.