Airflow 2.3 scheduler error: 'V1Container' object has no attribute '_startup_probe' #23727
Thanks for opening your first issue here! Be sure to follow the issue template!
There are a few places where it could go wrong:
@dstandish - if the second hypothesis is confirmed, it might actually mean that we have to implement a fix for 2.3.1, as it might block the migration for a number of users.
We have no startup_probes defined and we are using the default pod template file that comes with the Helm chart. We only override some options (resource limits and requests) via executor_config to match task needs.
So this is likely some serialization issue, as I suspected.
I will wait for @dstandish to comment. Maybe this is a "known" issue already, but to me it looks like one that should be solved in 2.3.1, as it's quite likely caused by the k8s library version change between 2.2 and 2.3.
Taking a look at this.
@patryk126p can you provide a sample DAG? I attempted to repro using your steps (ran a simple DAG in 2.2.4, upgraded to 2.3.0, ran it again) and there was no issue. Perhaps you are doing something with executor config? Or perhaps specifying
Actually, yeah, it does look like you must be putting something in executor config. Please share the code related to this. Thanks.
@dstandish we are not using kubernetes pod operators nor specifying
Metadata/annotations are added only for very long-running, critical tasks, to make sure that Kubernetes won't evict pods in the middle of processing.
@patryk126p please show the imports for those classes. If you can establish repro steps, that would be really helpful too.
Here is the complete set of imports from all custom operators/sensors/hooks (some of the imports are for type hints only):
Are you sure that the kubernetes library has been upgraded everywhere in your system? To me it looks like you have some image that still has the old version of the library installed.
From inside the container (the same Docker images were used for all components in the Helm chart):
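One way to answer the "upgraded everywhere?" question for each image is to compare the installed distributions against the constraints file used at install time. Below is a minimal sketch, assuming the constraints file uses plain `name==version` lines; `parse_constraints`, `installed_or_none` and `find_mismatches` are hypothetical helpers, not part of Airflow, and name normalization is deliberately simplified.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_constraints(text):
    """Return {package: pinned_version} for every 'name==version' line.

    Simplified on purpose: comments, environment markers (';' suffixes)
    and non-'==' lines are ignored, and names are only lower-cased
    rather than fully normalized.
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip().lower()] = pinned.strip()
    return pins


def installed_or_none(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_or_none):
    """Return {package: (pinned, installed)} for every pin that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        actual = lookup(name)
        if actual != pinned:
            mismatches[name] = (pinned, actual)
    return mismatches
```

Running `find_mismatches(parse_constraints(text))` inside each image (scheduler, webserver, worker), with `text` being the contents of the constraints file, should return `{}`; any entry for `kubernetes` would point at a stale image.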
Which Kubernetes version (cluster version) do you run, @patryk126p?
@potiuk it's 1.21.9 on EKS |
@patryk126p I was hoping to see how you are importing the k8s objects used here:
{
"pod_override": V1Pod(
spec=V1PodSpec(
containers=[
V1Container(
name="base",
resources=V1ResourceRequirements(
limits={"cpu": "<X>", "memory": "<X>"},
requests={"cpu": "<X>", "memory": "<X>"},
),
)
]
),
# metadata=V1ObjectMeta(
# annotations={"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}
# ),
)
}
Though I know it's a long shot that it leads us anywhere.
@dstandish are you asking just for the import statement? If so, here it is:
A related Stack Overflow question: https://stackoverflow.com/questions/67929230/airflow-2-0-1-pod-template-override-not-working-as-expected-for-kubernetesexecu
Yeah - it does look like the deserialisation issue I suspected. Just a thought - we do have the new I wonder @patryk126p if that is something that you could run to see if it helps.
This command is intended exactly for cases like this, where the DAGs were serialized using older Airflow versions. I suspect what could have happened is that the old DAGs were serialized when an old kubernetes library was installed, and it lacked the In this case
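The "serialized with an old library, read with a new one" failure can be reproduced without Airflow or the real kubernetes package. The sketch below is an illustration under assumptions: `fake_k8s`, `OldV1Container` and `NewV1Container` are hypothetical stand-ins for two versions of the client library's `V1Container`, not the real classes. It relies on standard `pickle` behaviour: unpickling restores the stored instance `__dict__` onto whatever class currently lives at the same import path, without calling `__init__`.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the kubernetes client module (not the real one).
fake_k8s = types.ModuleType("fake_k8s")
sys.modules["fake_k8s"] = fake_k8s


def publish(cls):
    """Install cls as fake_k8s.V1Container so pickle can find it by name."""
    cls.__module__ = "fake_k8s"
    cls.__name__ = cls.__qualname__ = "V1Container"
    fake_k8s.V1Container = cls
    return cls


@publish
class OldV1Container:
    """'Old library version': knows nothing about startup probes."""

    def __init__(self, name):
        self._name = name


# Pickle while the old class is installed (what was stored in the DB).
stored = pickle.dumps(fake_k8s.V1Container(name="base"))


@publish
class NewV1Container:
    """'New library version': adds a startup_probe property."""

    def __init__(self, name, startup_probe=None):
        self._name = name
        self._startup_probe = startup_probe

    @property
    def startup_probe(self):
        return self._startup_probe


# loads() succeeds: the old __dict__ is copied onto the NEW class,
# but __init__ never runs, so _startup_probe is never set.
restored = pickle.loads(stored)

error_message = None
try:
    restored.startup_probe
except AttributeError as err:
    error_message = str(err)

print(error_message)  # 'V1Container' object has no attribute '_startup_probe'
```

Note that `pickle.loads` itself does not fail; the `AttributeError` only appears once code reads the newly added property.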
Sounds like this may be it. Once we have a window to attempt moving to 2.3 again, I will try this command. But in the meantime, can this info be added to the migration docs or somewhere similar?
Any news, @patryk126p? :)
@hterik @humbledude see #24478 for the webserver fix.
When the UI unpickles executor_configs with outdated k8s objects, it can run into the same issue as the scheduler does (see apache#23727). Our JSON encoder therefore needs to suppress encoding errors arising from pod serialization and fall back to a default value.
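The encoder behaviour described above can be sketched with the standard `json` module. This is a minimal illustration, not the actual change from #24478: `SafePodConfigEncoder`, `StalePod` and `FreshPod` are hypothetical names, and the sketch assumes pod objects are serialized through a `to_dict()` method (kubernetes client models do expose one).

```python
import json


class SafePodConfigEncoder(json.JSONEncoder):
    """Hypothetical encoder: objects with to_dict() are encoded via that
    method; if it raises AttributeError (as a stale unpickled pod can),
    an empty dict is emitted instead of failing the whole response."""

    def default(self, obj):
        to_dict = getattr(obj, "to_dict", None)
        if to_dict is None:
            return super().default(obj)  # unknown type: normal TypeError
        try:
            return to_dict()
        except AttributeError:
            return {}  # fallback value for an unserializable pod


class StalePod:
    """Stand-in for an outdated pod whose to_dict() hits a missing attribute."""

    def to_dict(self):
        raise AttributeError("'V1Container' object has no attribute '_startup_probe'")


class FreshPod:
    """Stand-in for a pod object that serializes normally."""

    def to_dict(self):
        return {"name": "base"}


stale_json = json.dumps({"pod_override": StalePod()}, cls=SafePodConfigEncoder)
fresh_json = json.dumps({"pod_override": FreshPod()}, cls=SafePodConfigEncoder)
print(stale_json)  # {"pod_override": {}}
print(fresh_json)  # {"pod_override": {"name": "base"}}
```

Catching only `AttributeError` keeps other encoding bugs visible; a broader `except` would hide more but would also mask unrelated errors.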
OK @hterik @humbledude, the webserver fix is merged, so you are welcome to try patching your envs with those changes.
From time to time, k8s library objects change their attributes. If an executor config is stored with an old version and unpickled with a new version, we can get attribute errors that can crash the scheduler (see apache#23727). Here we update the handling so that we fail the task but don't crash the scheduler. (cherry picked from commit 0c41f43)
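The scheduler-side change is only described in prose here. As a minimal sketch of the general "fail the task, keep the loop alive" pattern, assuming a hypothetical `QueuedTask` record and a hypothetical `build_pod` callable (this is not the code from commit 0c41f43):

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("scheduler_sketch")


@dataclass
class QueuedTask:
    """Hypothetical queued task: a key plus its (already unpickled) executor config."""

    key: str
    executor_config: dict = field(default_factory=dict)
    state: str = "queued"


def launch_queued(tasks, build_pod):
    """Try to build a pod for each task; a bad config fails only that task.

    build_pod is a hypothetical callable standing in for pod construction;
    it may raise AttributeError when the config holds outdated k8s objects.
    """
    pods = []
    for task in tasks:
        try:
            pod = build_pod(task)
        except AttributeError as err:
            # Without this handler the exception would escape the loop
            # and take the whole scheduler down with it.
            log.error("Failing %s: unusable executor_config (%s)", task.key, err)
            task.state = "failed"
            continue
        task.state = "running"
        pods.append(pod)
    return pods


def fake_build_pod(task):
    """Simulates a stale config by raising the error reported in this issue."""
    if task.executor_config.get("stale"):
        raise AttributeError("'V1Container' object has no attribute '_startup_probe'")
    return f"pod-for-{task.key}"


tasks = [QueuedTask("fresh_task"), QueuedTask("old_task", {"stale": True})]
pods = launch_queued(tasks, fake_build_pod)
print(pods, [t.state for t in tasks])  # ['pod-for-fresh_task'] ['running', 'failed']
```

The handler is scoped to a single task, so one old pickled config costs one failed task instead of a scheduler crash loop.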
@dstandish I patched with the commits from #24478 and it seems to work!
Great, thanks.
@dstandish I'm seeing this on Airflow 2.3.3 with a vanilla web server docker image, which should have #24478 in it. We use the REST API for an integration with another system of ours, and when it calls the
Is a separate fix needed for that?
Can you try 2.3.4 (it will be out in 2 days or so)? If the problem is still there, please open a new issue with a detailed description of the problem and how you got there. You can refer to this issue in the new one, but piggybacking on an existing, closed issue is not going to make it "active", I am afraid.
In 2.5.0 #28227
Apache Airflow version
2.3.0 (latest released)
What happened
After migrating from Airflow 2.2.4 to 2.3.0, the scheduler fell into a crash loop, throwing:
The kubernetes Python library version was exactly as specified in the constraints file: https://raw.githubusercontent.com/apache/airflow/constraints-2.3.0/constraints-3.9.txt
What you think should happen instead
Scheduler should work
How to reproduce
Not 100% sure, but:
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
irrelevant
Deployment
Official Apache Airflow Helm Chart
Deployment details
KubernetesExecutor
PostgreSQL (RDS) as Airflow DB
Python 3.9
Docker images built from apache/airflow:2.3.0-python3.9 (some additional libraries installed)
Anything else
No response
Are you willing to submit PR?
Code of Conduct