Migrate from 2.1.4 to 2.2.0 #18894
Thanks for opening your first issue here! Be sure to follow the issue template! |
Did you run |
Yes, I did. Got this:
I have just killed my docker-compose with 2.1.4, changed image version to 2.2.0 and airflow never started again :) |
Did you do the suggested fix ?
|
Not yet, but I've been upgrading to each new release since version 2.0.2 and this is the first time I'm getting a migration problem, so I thought it was a bigger problem. In particular, I want to upgrade the version in my production environment, and it seems it will be problematic. |
Well, I think you should follow what the suggestion is. You seem to have some wrong entries in your DB (which might be the result of bugs, or some very old versions of Airflow, or both). Closing as invalid unless following the migration suggestion does not fix the problem (likely you will have to do similar fixes in production BTW). |
After deleting the wrong rows, Airflow is working. |
Hi @sbialkowski-pixel do you have the query to delete those rows in the postgres console ? I'm also facing this issue. |
I've deleted this query, sorry. Lines 766 to 774 in bc19ae7
And I've recreated it in SQL. |
Airflow does not delete user-generated data because we don't know if it is important to the user. It would be a disaster if we automatically deleted the offending records if the user deliberately added them and did not get a chance to move them somewhere. For the same reason, please inspect and fully understand any SQL queries suggested in this issue before running them, to make sure you are actually fine with those rows being deleted. |
Problem is that there is no clue which rows need to be deleted. Also, there is no straight logic behind "corresponding rows" for me. This part of the code is the only source of guidelines for me. If there is a better way to deal with this kind of problem in the future, I would appreciate hearing it :) |
OK. But if a failed task has no more reference to a dag_run, there is no point in keeping it in the database...
What's the reason for those rows to appear in the DB? How did they get there? Do we know? I think there are two possible scenarios:
If that's the case, I agree it should be handled better - Airflow cleaning them up or adding "fake" run_ids for those during migration is what I would expect, as we should handle this as a "regular" migration scenario.
If this is the case, then the BEST we could do is to spit out the exact SQL query to run to delete those rows. We should not delete them automatically. |
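To make the "dangling rows" concrete, here is a minimal sketch (not the actual Airflow schema or migration code) that reproduces the situation in an in-memory SQLite database and finds the offending rows with the same LEFT JOIN idea discussed in this thread. Table and column names follow the thread; everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, execution_date TEXT);
CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, execution_date TEXT);

-- One healthy run with its task instance...
INSERT INTO dag_run VALUES ('my_dag', 'scheduled__2021-10-01', '2021-10-01');
INSERT INTO task_instance VALUES ('my_dag', 'extract', '2021-10-01');

-- ...and one task instance whose dag_run no longer exists
-- (e.g. the run was deleted via the UI).
INSERT INTO task_instance VALUES ('my_dag', 'extract', '2021-09-01');
""")

# A task_instance is "dangling" when no dag_run shares its
# (dag_id, execution_date); the LEFT JOIN leaves run_id NULL for those.
dangling = conn.execute("""
SELECT task_instance.dag_id, task_instance.task_id, task_instance.execution_date
FROM task_instance
LEFT JOIN dag_run
  ON dag_run.dag_id = task_instance.dag_id
 AND dag_run.execution_date = task_instance.execution_date
WHERE dag_run.run_id IS NULL
""").fetchall()

print(dangling)  # only the orphaned 2021-09-01 row shows up
```

Running the SELECT alone like this is a safe dry run: it only reports which rows a later DELETE would touch.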
Seems other people have a similar issue #18912, so to me it looks highly unlikely this was manually added. I think the number of reports we have in such a short time (I saw 3 reports already) indicates that those rows can appear frequently as a result of normal operations by Airflow, and many users might have similar issues soon. If this is really a result of "regular" Airflow behaviour, for me it calls for a very quick 2.2.1 with an improved migration to handle that case (cc: @kaxil @jedcunningham) |
Case 1. Yes, I'm triggering DAGs manually from the UI or via the REST API. |
Thanks for confirming @sbialkowski-pixel! @kaxil @jedcunningham I think we need to seriously consider 2.2.1 |
SQL we used from our deployment playbook to clean up these tables. Obviously, a disclaimer on running this blindly: validate that these are rows you can safely remove first by just executing the CTEs as SELECTs.

```sql
BEGIN;

-- Remove dag runs without a valid run_id
DELETE FROM dag_run WHERE run_id IS NULL;

-- Remove task fails without a run_id
WITH task_fails_to_remove AS (
    SELECT
        task_fail.dag_id,
        task_fail.task_id,
        task_fail.execution_date
    FROM task_fail
    LEFT JOIN dag_run
        ON dag_run.dag_id = task_fail.dag_id
        AND dag_run.execution_date = task_fail.execution_date
    WHERE dag_run.run_id IS NULL
)
DELETE FROM task_fail
USING task_fails_to_remove
WHERE (
    task_fail.dag_id = task_fails_to_remove.dag_id
    AND task_fail.task_id = task_fails_to_remove.task_id
    AND task_fail.execution_date = task_fails_to_remove.execution_date
);

-- Remove task instances without a run_id
WITH task_instances_to_remove AS (
    SELECT
        task_instance.dag_id,
        task_instance.task_id,
        task_instance.execution_date
    FROM task_instance
    LEFT JOIN dag_run
        ON dag_run.dag_id = task_instance.dag_id
        AND dag_run.execution_date = task_instance.execution_date
    WHERE dag_run.run_id IS NULL
)
DELETE FROM task_instance
USING task_instances_to_remove
WHERE (
    task_instance.dag_id = task_instances_to_remove.dag_id
    AND task_instance.task_id = task_instances_to_remove.task_id
    AND task_instance.execution_date = task_instances_to_remove.execution_date
);

COMMIT;
``` |
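The script above relies on PostgreSQL's `DELETE ... USING`. As a hedged, portable sketch of the same cleanup logic (not the actual Airflow migration, and using a simplified, hypothetical schema), the following expresses it in SQLite with `NOT EXISTS`, which any careful reader can run locally before touching a real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, execution_date TEXT);
CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, execution_date TEXT);
INSERT INTO dag_run VALUES ('my_dag', 'scheduled__2021-10-01', '2021-10-01');
INSERT INTO task_instance VALUES ('my_dag', 'extract', '2021-10-01');  -- keep
INSERT INTO task_instance VALUES ('my_dag', 'extract', '2021-09-01');  -- orphan
""")

# Same semantics as the Postgres script: delete task instances that
# have no dag_run with a non-NULL run_id for their (dag_id, execution_date).
conn.execute("""
DELETE FROM task_instance
WHERE NOT EXISTS (
    SELECT 1 FROM dag_run
    WHERE dag_run.dag_id = task_instance.dag_id
      AND dag_run.execution_date = task_instance.execution_date
      AND dag_run.run_id IS NOT NULL
)
""")

remaining = conn.execute("SELECT execution_date FROM task_instance").fetchall()
print(remaining)  # only the 2021-10-01 row survives
```

As with the original script, run the inner SELECT on its own first to confirm you are fine with those rows being deleted.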
Might also want to remove DAG runs with a NULL execution date; I think 2.2 also started enforcing those (and I think their existence might mess up some clauses in the SQL script; not sure). I still feel we shouldn't blindly delete records, but some write-up in the documentation and better error messaging should be added to aid people through the cleanup. |
How old were your databases, @RenGeng @sbialkowski-pixel? Did you upgrade to Airflow 2 from 1.10 before? |
I've started from 2.1.2 |
We started using Airflow one year ago with 1.10 and then updated to every new version when there was a release. |
@sbialkowski-pixel @RenGeng, thanks! Would either of you happen to have an example |
Just ran this in my prod DB:

```sql
SELECT task_instance.execution_date,
       task_instance.task_id,
       task_instance.dag_id
FROM task_instance
LEFT JOIN dag_run ON task_instance.dag_id = dag_run.dag_id
    AND task_instance.execution_date = dag_run.execution_date
WHERE dag_run.run_id IS NULL;
```

Then I just ran:

```sql
DELETE FROM task_instance ti
USING (
    SELECT task_instance.execution_date,
           task_instance.task_id,
           task_instance.dag_id
    FROM task_instance
    LEFT JOIN dag_run ON task_instance.dag_id = dag_run.dag_id
        AND task_instance.execution_date = dag_run.execution_date
    WHERE dag_run.run_id IS NULL
) trash
WHERE ti.dag_id = trash.dag_id
    AND ti.task_id = trash.task_id
    AND ti.execution_date = trash.execution_date;
```

No problems in migration. |
Also I'd be interested if you have ever deleted any DAG runs from the web UI (or if you can identify any of the |
Slack doesn't keep historical records, so I'm transferring my comments here:
My Airflow journey starts with 1.10.4: 1.10.4 -> 1.10.5 -> 1.10.7 -> 1.10.9 -> 1.10.10 -> 1.10.11 -> 1.10.12 -> 1.10.13 -> 1.10.14 -> 2.0.0 -> 2.0.1 -> 2.0.2 -> 2.1.0 -> 2.1.1 -> 2.1.2 -> 2.1.3 -> 2.1.4 -> 2.2.0 |
This should read »dag_run.execution_date« |
This becomes a bit neater with `... JOIN ... using (dag_id, execution_date)`
…
On 13.10.2021 at 18:06, konfusator ***@***.***> wrote:
>
>
> …
> LEFT JOIN
> dag_run ON
> dag_run.dag_id = task_fail.dag_id
> AND dr.execution_date = task_fail.execution_date
> …
>
>
This should read »dag_run.execution_date«
|
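As a concrete illustration of the `USING` shorthand suggested in the email above, here is a minimal sketch (simplified, hypothetical schema, not the actual Airflow tables) showing the same orphan-finding query written with `JOIN ... USING (dag_id, execution_date)`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, execution_date TEXT);
CREATE TABLE task_fail (dag_id TEXT, task_id TEXT, execution_date TEXT);
INSERT INTO task_fail VALUES ('my_dag', 'extract', '2021-09-01');
""")

# USING (col, ...) is shorthand for an ON clause equating the
# identically named columns on both sides, so there is only one place
# to name each column -- which avoids the dr/dag_run alias mix-up
# being corrected above.
orphans = conn.execute("""
SELECT dag_id, task_fail.task_id, execution_date
FROM task_fail
LEFT JOIN dag_run USING (dag_id, execution_date)
WHERE dag_run.run_id IS NULL
""").fetchall()

print(orphans)  # the task_fail row with no matching dag_run
```

Both PostgreSQL and SQLite support `USING` in joins, though `DELETE ... USING` (as in the cleanup script earlier in the thread) is a separate, PostgreSQL-specific construct.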
#15986 might indeed have something to do with it. I just deleted a few DAG runs on an Airflow instance which is still running 2.1.0 and can confirm that the task instances from those runs remain in the DB and show up in the query used to identify dangling rows: All rows showing up in the query for this instance are related to DAGs where I have deleted DAG runs in the past, including scheduled and manual runs. |
⬆️ We are seeing this while trying to upgrade a (new) Airflow 2.1.4 project. |
This impacted me. The one thing that stood out was that most of the execution dates were from a period I definitely remember deleting. There was a DAG which was downloading data incrementally every 30 minutes (always download data > previous max ID) and I needed to clear a month of the data so it could start downloading again from like August onwards. I deleted these runs using the UI, so perhaps that is why? |
For anyone experiencing this when testing standalone and using SQLite, delete these ids:
|
Thank you for this @leonsmith ! Made our team's upgrade from v2.1.4 to v2.2.3 very clean and simple! 🚀 🚀 🚀 🚀 |
Apache Airflow version
2.2.0
Operating System
Linux
Versions of Apache Airflow Providers
default.
Deployment
Docker-Compose
Deployment details
Using airflow-2.2.0-python3.7
What happened
Upgrading image from apache/airflow:2.1.4-python3.7
to apache/airflow:2.2.0-python3.7
Causes this inside the scheduler, which is not starting:
What you expected to happen
Automatic database migration and properly working scheduler.
How to reproduce
Upgrade from 2.1.4 to 2.2.0 with some DAG history.
Anything else
No response
Are you willing to submit PR?
Code of Conduct