@ephraimbuddy
Contributor

It seemed simpler to reserialize the DAGs and update the task instances directly, so that's what I did here. I'm slightly unsure if this could lead to failures during reserialization or cause performance issues.

An alternative would have been to manually create entries for serialized_dag, dag_version, and dag_code before updating the TIs, but that felt more complex.

The issue here is that upgrades from AF2 fail because the TIs are not associated with dag_versions. This mainly affects users upgrading from Airflow 2, since in Airflow 3 the dag_version table is already populated for all DAGs.
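
For illustration, the "update the task instances directly" step could look roughly like the sketch below. It is not the migration from this PR: the two tables are a heavily simplified, hypothetical stand-in for Airflow's `dag_version` and `task_instance` tables, SQLite is used only so the example runs on its own, and the helper names (`make_demo_db`, `backfill_dag_version_ids`, `count_unversioned`) are invented here. It assumes a `dag_version` row already exists for each DAG (for example, after reserialization).

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag_version (
            id TEXT PRIMARY KEY,
            dag_id TEXT NOT NULL,
            version_number INTEGER NOT NULL
        );
        CREATE TABLE task_instance (
            id INTEGER PRIMARY KEY,
            dag_id TEXT NOT NULL,
            dag_version_id TEXT REFERENCES dag_version (id)
        );
        INSERT INTO dag_version VALUES ('v1', 'etl', 1), ('v2', 'etl', 2);
        -- TIs 1-3 mimic rows carried over from Airflow 2 (no version yet).
        INSERT INTO task_instance VALUES
            (1, 'etl', NULL), (2, 'etl', NULL), (3, 'orphan', NULL), (4, 'etl', 'v1');
        """
    )
    return conn


def backfill_dag_version_ids(conn: sqlite3.Connection) -> int:
    """Point each unversioned TI at the latest dag_version of its DAG.

    TIs whose DAG has no dag_version row are left untouched.
    Returns the number of rows updated.
    """
    cur = conn.execute(
        """
        UPDATE task_instance
        SET dag_version_id = (
            SELECT dv.id
            FROM dag_version AS dv
            WHERE dv.dag_id = task_instance.dag_id
            ORDER BY dv.version_number DESC
            LIMIT 1
        )
        WHERE dag_version_id IS NULL
          AND EXISTS (
              SELECT 1 FROM dag_version AS dv
              WHERE dv.dag_id = task_instance.dag_id
          )
        """
    )
    return cur.rowcount


def count_unversioned(conn: sqlite3.Connection) -> int:
    """Count TIs that still have no dag_version_id."""
    row = conn.execute(
        "SELECT COUNT(*) FROM task_instance WHERE dag_version_id IS NULL"
    ).fetchone()
    return row[0]
```

Any TI left over after the backfill (the `orphan` row in the demo data) is exactly the kind of row that would make a NOT NULL constraint on `dag_version_id` fail during upgrade.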

@dstandish
Contributor

hmm -- we merged a change to just blow away old serialization -- shouldn't that have made this a non-issue?

@ephraimbuddy
Contributor Author

> hmm -- we merged a change to just blow away old serialization -- shouldn't that have made this a non-issue?

This one is about TI.dag_version_id being non-nullable

@phanikumv added this to the Airflow 3.1.0 milestone on Jul 30, 2025
@uranusjr
Member

uranusjr commented Aug 6, 2025

Do we need to reserialise during migration? I feel it might be enough to just delete the serialised data. I believe Airflow should automatically reserialise missing dags once the migration finishes and the scheduler is restarted?

@ephraimbuddy
Contributor Author

> Do we need to reserialise during migration? I feel it might be enough to just delete the serialised data. I believe Airflow should automatically reserialise missing dags once the migration finishes and the scheduler is restarted?

We initially deleted it in https://github.com/apache/airflow/pull/43700/files when migrating from AF2, but realized we could lose true histories and reverted it. Deleting the serdag would have been better back then, but I think it is too late to do it at this point, as we could lose AF3+ histories.

@dstandish
Contributor

> > Do we need to reserialise during migration? I feel it might be enough to just delete the serialised data. I believe Airflow should automatically reserialise missing dags once the migration finishes and the scheduler is restarted?
>
> We initially deleted it in https://github.com/apache/airflow/pull/43700/files when migrating from AF2, but realized we could lose true histories and reverted it. Deleting the serdag would have been better back then, but I think it is too late to do it at this point, as we could lose AF3+ histories.

What do you mean we could lose true histories? In Airflow 2, we don't have serdag history.

@ephraimbuddy
Contributor Author

ephraimbuddy commented Aug 11, 2025

> > > Do we need to reserialise during migration? I feel it might be enough to just delete the serialised data. I believe Airflow should automatically reserialise missing dags once the migration finishes and the scheduler is restarted?
> >
> > We initially deleted it in https://github.com/apache/airflow/pull/43700/files when migrating from AF2, but realized we could lose true histories and reverted it. Deleting the serdag would have been better back then, but I think it is too late to do it at this point, as we could lose AF3+ histories.
>
> What do you mean we could lose true histories? In Airflow 2, we don't have serdag history.

It's no. So if we delete the serdag now, we would lose history. Deleting the serdag when migrating from Airflow 2 would have been better than doing it now.
I have forgotten why we had to revert the serdag deletion. Maybe @jedcunningham remembers.

ephraimbuddy added a commit to astronomer/airflow that referenced this pull request Aug 11, 2025
This is an alternative to apache#53820. Here we make TI.dag_version_id
nullable at the database level. It's still enforced in code.
ephraimbuddy added a commit to astronomer/airflow that referenced this pull request Aug 12, 2025
This is an alternative to apache#53820. Here we make TI.dag_version_id
nullable at the database level. It's still enforced in code.
ephraimbuddy added a commit that referenced this pull request Aug 12, 2025
* Partially revert #50825 on database level

This is an alternative to #53820. Here we make TI.dag_version_id
nullable at the database level. It's still enforced in code.

* fixup! Partially revert #50825 on database level
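
As a rough illustration of "nullable at the database level, still enforced in code", the sketch below uses a simplified, hypothetical `task_instance` table in SQLite. How Airflow actually enforces the requirement in code is not shown in this thread, so the check in `add_task_instance` and both function names are assumptions made for the example.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a simplified, hypothetical task_instance table.

    dag_version_id has no NOT NULL constraint, so rows migrated from
    older installs can exist without a version.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE task_instance (
            id INTEGER PRIMARY KEY,
            dag_id TEXT NOT NULL,
            dag_version_id TEXT
        )
        """
    )
    return conn


def add_task_instance(conn: sqlite3.Connection, dag_id: str, dag_version_id) -> int:
    """Insert a new TI, rejecting a missing version in application code."""
    if dag_version_id is None:
        # The database would accept NULL; the application does not.
        raise ValueError("dag_version_id is required for new task instances")
    cur = conn.execute(
        "INSERT INTO task_instance (dag_id, dag_version_id) VALUES (?, ?)",
        (dag_id, dag_version_id),
    )
    return cur.lastrowid
```

The trade-off is that the database no longer guarantees the invariant, so any code path that writes task instances without going through the checked function can still store a NULL.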
@ephraimbuddy
Contributor Author

Closing in preference to #54366

@kaxil removed this from the Airflow 3.1.0 milestone on Sep 14, 2025