
Airflow TaskInstance endpoint API error for old task instances: 'V1Container' object has no attribute '_startup_probe' #27084

Closed

tirkarthi opened this issue Oct 17, 2022 · 7 comments

Labels: affected_version:main_branch, area:core, kind:bug

@tirkarthi (Contributor)

Apache Airflow version

main (development)

What happened

Opening this issue as per comment: #23727 (comment). We have also noticed this problem: we have a 2.1.x setup using the Kubernetes executor, and after upgrading, task instances created post-upgrade work fine in the task instance endpoint, while the old objects fail with a traceback similar to the one below. We also tried the main branch (2.4.1 as of writing) and it has the same issue. It seems a fix similar to #24117 is needed: a custom String field for executor_config in the task instance schema that wraps the _serialize method and on error returns an empty dict as a string (a sketch of this approach follows the traceback). We have a fix internally, though a test case might not be possible since it needs an older executor_config value that we can't export due to private data. I will be happy to make a PR with the fix, and opened this issue for discussion.

   File "/home/airflow/.local/lib/python3.9/site-packages/airflow/api_connexion/endpoints/task_instance_endpoint.py", line 412, in get_task_instances_batch
     return task_instance_collection_schema.dump(
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/schema.py", line 557, in dump
     result = self._serialize(processed_obj, many=many)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/schema.py", line 525, in _serialize
     value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/fields.py", line 342, in serialize
     return self._serialize(value, attr, obj, **kwargs)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/fields.py", line 774, in _serialize
     return [self.inner._serialize(each, attr, obj, **kwargs) for each in value]
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/fields.py", line 774, in <listcomp>
     return [self.inner._serialize(each, attr, obj, **kwargs) for each in value]
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/fields.py", line 643, in _serialize
     return schema.dump(nested_obj, many=many)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/schema.py", line 557, in dump
     result = self._serialize(processed_obj, many=many)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/schema.py", line 525, in _serialize
     value = field_obj.serialize(attr_name, obj, accessor=self.get_attribute)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/fields.py", line 342, in serialize
     return self._serialize(value, attr, obj, **kwargs)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/fields.py", line 893, in _serialize
     return utils.ensure_text_type(value)
   File "/home/airflow/.local/lib/python3.9/site-packages/marshmallow/utils.py", line 212, in ensure_text_type
     return str(val)
   File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/models/v1_pod.py", line 214, in __repr__
     return self.to_str()
   File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/models/v1_pod.py", line 210, in to_str
     return pprint.pformat(self.to_dict())
   File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/models/v1_pod.py", line 196, in to_dict
     result[attr] = value.to_dict()
   File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/models/v1_pod_spec.py", line 1058, in to_dict
     result[attr] = list(map(
   File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/models/v1_pod_spec.py", line 1059, in <lambda>
     lambda x: x.to_dict() if hasattr(x, "to_dict") else x,
   File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/models/v1_container.py", line 660, in to_dict
     value = getattr(self, attr)
   File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/models/v1_container.py", line 458, in startup_probe
     return self._startup_probe
 AttributeError: 'V1Container' object has no attribute '_startup_probe'

cc: @dstandish @joshzana
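
A minimal sketch of the guarded field described above, assuming a marshmallow fields.String subclass; the class name SafeExecutorConfigField and the log message are illustrative, not the actual Airflow change:

    import logging

    from marshmallow import fields

    log = logging.getLogger(__name__)


    class SafeExecutorConfigField(fields.String):
        """String field that degrades gracefully when the stored
        executor_config (e.g. a pod pickled with an old kubernetes
        client) cannot be rendered as text."""

        def _serialize(self, value, attr, obj, **kwargs):
            try:
                # fields.String ultimately calls str(value), which triggers
                # the V1Pod.__repr__/to_dict chain in the traceback above.
                return super()._serialize(value, attr, obj, **kwargs)
            except Exception:
                log.exception("Could not serialize executor_config for %s", attr)
                return "{}"

Swapping this in for the plain field in the task instance schema would mean one corrupt row logs a warning and renders as "{}" instead of failing the whole response.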

What you think should happen instead

Older task instances whose executor_config cannot be serialized should return an empty dict instead of failing the whole request.

How to reproduce

  1. Upgrade to the main branch from an older version such as 2.1.x that has existing task instances with executor_config set, using the Kubernetes executor.
  2. Hit the task instance endpoint for an old dagrun to fetch task instances with executor_config serialized; the traceback above is thrown (a request sketch follows).
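
A hypothetical reproduction call against the batch endpoint named in the traceback; host, credentials, and auth scheme are placeholders that depend on your deployment:

    import requests

    # All filter fields of the batch form are optional, so an empty body
    # asks for task instances across all dags and dagruns.
    resp = requests.post(
        "http://localhost:8080/api/v1/dags/~/dagRuns/~/taskInstances/list",
        auth=("admin", "admin"),
        json={},
    )
    print(resp.status_code)  # 500 once any row has an unreadable executor_config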

Operating System

Red Hat

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

tirkarthi added the area:core and kind:bug labels on Oct 17, 2022
@dstandish (Contributor)

In 2.4.1 you probably still get that error because your raw value was pickled with the old k8s library, and with the new version of the library you can't unpickle it cleanly, so there's nothing you can do at that point.
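
Strictly, the unpickle itself succeeds; the restored object just lacks the private attributes that newer client releases added, which is what the property access in the traceback trips over. A minimal stand-in demonstrating this, with hypothetical classes in place of the real kubernetes models:

    import pickle

    class V1ContainerStandIn:
        """Shape of the class in the old library release."""
        def __init__(self):
            self._name = "base"

    blob = pickle.dumps(V1ContainerStandIn())

    # "Upgrade" the library: the same class grows an attribute plus a
    # property reading it, like startup_probe on V1Container.
    class V1ContainerStandIn:
        def __init__(self):
            self._name = "base"
            self._startup_probe = None

        @property
        def startup_probe(self):
            return self._startup_probe

    obj = pickle.loads(blob)  # succeeds: pickle only restores the old __dict__
    try:
        obj.startup_probe
    except AttributeError as exc:
        print(exc)  # 'V1ContainerStandIn' object has no attribute '_startup_probe'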

What #24117 does, though, is fix this on a go-forward basis (by serializing to JSON), so we shouldn't see the issue for new task instances.

Note that we followed up that one with #26191, which fixed an issue we didn't catch in testing where the webserver would bork the executor config by repeatedly applying the serialization logic.

I don't think there's anything to be done about configs that are already pickled with the old library in this way, but if you have something to contribute feel free to open a PR. There's always a way to test.

In any case this appears to be a duplicate of #23727.

@tirkarthi (Contributor, Author)

Thanks @dstandish, I agree that the pickled value stored in the database is incompatible. Some of our users might use ~ to fetch all task instances of all dagruns, where hitting one faulty task instance causes a server error for all the other compatible task instance objects in the API. They don't really use executor_config, so we applied a patch similar to the linked PR, adding an except block that logs the exception and returns an empty dict. I will open a PR for discussion, but will be okay if the PR is declined too.

@dstandish (Contributor)

Oh I see, so your concern is specifically with the API; I didn't register that initially. Yeah, I'm sure there could be a fix for the scenario you mention.

@dstandish (Contributor)

Do mention this issue in the PR and tag me.

@potiuk (Member) commented Oct 24, 2022

@tirkarthi - I assigned you to it :). If you think you can make a PR, cool; if you think it's not worth it, let us know and we will close this one (you can also close it yourself).

@dstandish (Contributor)

I believe this is resolved by #28454

@potiuk (Member) commented Jan 7, 2023

Closing it provisionally then as fixed (we can always re-open if it is not).

potiuk closed this as completed on Jan 7, 2023