
Airflow 2.2.1 upgrade #19421

Closed
1 of 2 tasks
stijndehaes opened this issue Nov 5, 2021 · 3 comments
Labels
area:core, kind:bug

Comments

@stijndehaes (Contributor)

Apache Airflow version

2.2.1 (latest released)

Operating System

Debian

Versions of Apache Airflow Providers

Not relevant

Deployment

Other Docker-based deployment

Deployment details

Running on a kubernetes cluster

What happened

Upgrading from Airflow 2.1.4 to 2.2.1 gave the following message:

Airflow found incompatible data in the task_instance table in the metadatabase, and has moved them to _airflow_moved__2_2__task_instance during the database migration to upgrade. Please inspect the moved data to decide whether you need to keep them, and manually drop the _airflow_moved__2_2__task_instance table to dismiss this warning.
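For readers hitting the same warning: inspecting the moved rows and dropping the table is plain SQL against the metadatabase. A minimal sketch using an in-memory SQLite stand-in (the real metadatabase is usually Postgres or MySQL, and the moved table has many more columns than shown here):

```python
import sqlite3

# Illustrative stand-in for the Airflow metadatabase; in a real deployment
# you would run these statements via your DB tools or `airflow db shell`.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE "_airflow_moved__2_2__task_instance" (
           task_id TEXT, dag_id TEXT, execution_date TEXT)"""
)
conn.execute(
    'INSERT INTO "_airflow_moved__2_2__task_instance" VALUES (?, ?, ?)',
    ("my_task", "old_renamed_dag", "2021-10-01T00:00:00"),
)

# 1. Inspect the rows Airflow moved aside during the migration.
moved = conn.execute(
    'SELECT task_id, dag_id, execution_date'
    ' FROM "_airflow_moved__2_2__task_instance"'
).fetchall()
print(moved)

# 2. Once you have decided the data is not needed, drop the table to
#    dismiss the warning.
conn.execute('DROP TABLE "_airflow_moved__2_2__task_instance"')
```

The two steps mirror what the warning asks for: look at the data first, then drop the table only once you are sure nothing in it needs to be kept.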

What you expected to happen

It's OK that this message shows, but there is no explanation to be found of what to do if you want to keep this data around, or of why this failed.

How to reproduce

Not sure

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@stijndehaes added the area:core and kind:bug labels on Nov 5, 2021
@stijndehaes (Contributor, Author)

By looking through the source code I noticed this can happen when you have task instances that have no dag run attached, i.e. orphaned task instances. In theory one could create the needed dag run, but this does not look easily feasible. I also noticed that some of the task instances belong to deleted/renamed dags, so it's impossible to generate a proper dag run for them.

I think these orphaned task instances can get into the database by renaming a dag, or by deleting a dag in the UI while the dag file was still on disk and thus reparsed by the scheduler.
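The orphan check described above amounts to finding task_instance rows with no matching dag_run. A minimal sketch against an in-memory SQLite stand-in (table shapes are simplified; the real schema has many more columns and, in 2.1, keys task instances by dag_id and execution_date):

```python
import sqlite3

# Toy model of the pre-2.2 schema: task_instance rows name a dag and an
# execution_date, and may or may not have a corresponding dag_run row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT);
    CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, execution_date TEXT);

    INSERT INTO dag_run VALUES ('live_dag', '2021-10-01');
    INSERT INTO task_instance VALUES ('t1', 'live_dag', '2021-10-01');
    -- Orphan: its dag was renamed/deleted, so no dag_run exists for it.
    INSERT INTO task_instance VALUES ('t1', 'deleted_dag', '2021-09-01');
    """
)

# Task instances with no matching dag_run are the rows the 2.2 migration
# moves aside into the _airflow_moved__2_2__task_instance table.
orphans = conn.execute(
    """
    SELECT ti.task_id, ti.dag_id, ti.execution_date
    FROM task_instance ti
    LEFT JOIN dag_run dr
      ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date
    WHERE dr.dag_id IS NULL
    """
).fetchall()
print(orphans)
```

Running a query of this shape against your own metadatabase (before upgrading) shows which rows would be moved, and for which dag_ids no dag run can sensibly be reconstructed.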

@potiuk (Member)

potiuk commented Nov 7, 2021

I think this one and #19440 (converted into discussion #19444) indicate that the message is a bit unclear for users. I prepared PR #194553 to improve that: it adds an upgrading section to our documentation and has the message link to it, so that rather than asking questions in the issues, users can find context and answers about what they should do in our docs.

Please take a look @stijndehaes to see if that explains the context better.

@stijndehaes (Contributor, Author)

I think it does :) Thank you for answering this issue

potiuk added a commit to potiuk/airflow that referenced this issue Nov 10, 2021
In Airflow 2.2.2 we introduced a fix in apache#18953 where the corrupted data was moved to a separate table. However, some of our users (rightly) might not have the context. We've never had anything like that before, so users who treat the Airflow DB as a black box might get confused about what the error means and what they should do in this case.

You can see in apache#19440 (converted into discussion apache#19444) and apache#19421 that the message is a bit unclear for users. This PR attempts to improve that: it adds an `upgrading` section to our documentation and has the message link to it, so that rather than asking questions in the issues, users can find context and answers about what they should do in our docs.

It also guides users who treat the Airflow DB as a "black box" on how they can use their tools and `airflow db shell` to fix the problem.
potiuk added a commit that referenced this issue Nov 10, 2021
* Improve message and documentation around moved data

kaxil pushed a commit that referenced this issue Nov 11, 2021
* Improve message and documentation around moved data

(cherry picked from commit de43fb3)