
Retry is not happening after exception #308

An issue by gtsystem at 2020-02-10 10:58:23+00:00
Original URL: zalando-incubator/kopf#308
 

Long story short

If an exception occurs inside a handler for volumeattachments objects, the handler never gets retried.

Looking at the logs, it seems that kopf tries to patch the object status with {"status": {"kopf": {"processing": ...}}}.
This patch call succeeds, but the object status is not changed; I guess this is the reason why no further attempts are made.
In theory, the /status endpoint should be used to patch the status, but from my testing this has no effect either. It seems that custom fields are not accepted inside "status".
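
A minimal way to see this behaviour yourself (illustrative only; it assumes kubectl proxy is serving the API on 127.0.0.1:8001 and that the test-vaa object from va.yaml below exists):

import json
import requests

# The API is reached through `kubectl proxy`, which handles authentication.
URL = "http://127.0.0.1:8001/apis/storage.k8s.io/v1/volumeattachments/test-vaa"

# The same kind of merge-patch that kopf applies for its progress tracking.
patch = {"status": {"kopf": {"processing": {"dummy": "value"}}}}
resp = requests.patch(URL, data=json.dumps(patch),
                      headers={"Content-Type": "application/merge-patch+json"})
resp.raise_for_status()  # the API answers 200 OK...

# ...but the re-read object shows that the custom field was silently dropped.
print(requests.get(URL).json().get("status"))  # e.g. {'attached': False}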

Possible fix: use an annotation to store kopf status?

Description

The code snippet to reproduce the issue

test_va.py

import kopf


class MyException(Exception):
    pass


@kopf.on.create('storage.k8s.io', 'v1', 'volumeattachments', retries=3, backoff=1.0)
def eventual_failure_with_tracebacks(**kwargs):
    raise Exception("An error that is supposed to be recoverable.")

va.yaml

apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: test-vaa
spec:
  attacher: xxx 
  nodeName: kind-worker
  source:
    persistentVolumeName: a-pv
status:
  attached: false
The exact command to reproduce the issue
kopf run test_va.py &
kubectl create -f va.yaml
The full output of the command that failed
[2020-02-10 10:29:19,209] kopf.reactor.activit [INFO    ] Initial authentication has been initiated.
[2020-02-10 10:29:19,232] kopf.activities.auth [INFO    ] Handler 'login_via_pykube' succeeded.
[2020-02-10 10:29:19,245] kopf.activities.auth [INFO    ] Handler 'login_via_client' succeeded.
[2020-02-10 10:29:19,245] kopf.reactor.activit [INFO    ] Initial authentication has finished.
[2020-02-10 10:29:19,276] kopf.engines.peering [WARNING ] Default peering object not found, falling back to the standalone mode.
[2020-02-10 10:29:45,516] kopf.objects         [ERROR   ] [None/test-vaa] Handler 'eventual_failure_with_tracebacks' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/src/_envl/lib64/python3.7/site-packages/kopf/reactor/handling.py", line 529, in _execute_handler
    lifecycle=lifecycle,  # just a default for the sub-handlers, not used directly.
  File "/src/_envl/lib64/python3.7/site-packages/kopf/reactor/handling.py", line 618, in _call_handler
    **kwargs,
  File "/src/_envl/lib64/python3.7/site-packages/kopf/reactor/invocation.py", line 130, in invoke
    result = await loop.run_in_executor(config.WorkersConfig.get_syn_executor(), real_fn)
  File "/usr/lib64/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "test_va.py", line 9, in eventual_failure_with_tracebacks
    raise Exception("An error that is supposed to be recoverable.")
Exception: An error that is supposed to be recoverable.

Environment

  • Kopf version: 0.25
  • Kubernetes version: 1.16.3
  • Python version: 3.7.3
  • OS/platform: linux

Commented by nolar at 2020-02-13 18:17:44+00:00
 

Hello. Thanks for reporting.

Yes, indeed, Kopf relies on patching the object to cause a cascaded reaction with the next steps (maybe not the wisest way, but it was done so originally, and I have had no reason to redesign it yet).

As far as I know, /status is handled slightly differently nowadays when it is declared as a subresource of a CRD.

Kopf does not handle this case at the moment, and only assumes that status is part of the object — which can be controlled with custom resources, but not with built-in resources.

I think it would be easy to change the patching function to patch either the object or its /status, based on the resource definition. Since recently, we fetch the resource's meta-information anyway (for the cluster-scoped/namespaced detection); it can be extended to identify when /status is a sub-resource too.
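
Roughly sketched (this is not the actual Kopf code; the helper names are made up), the discovery call that is already made per group/version could also remember the sub-resource, and the patching could then pick the URL, e.g. for a cluster-scoped resource:

import requests

def discover(api_base: str, group: str, version: str, plural: str) -> dict:
    # One discovery request per group/version, as already done for the namespaced check.
    resources = requests.get(f"{api_base}/apis/{group}/{version}").json()["resources"]
    names = {r["name"] for r in resources}
    return {
        "namespaced": next(r["namespaced"] for r in resources if r["name"] == plural),
        "has_status_subresource": f"{plural}/status" in names,
    }

def patch_url(api_base: str, group: str, version: str, plural: str,
              name: str, info: dict, patch: dict) -> str:
    # Cluster-scoped URL for brevity; namespaced resources add /namespaces/{ns}/.
    url = f"{api_base}/apis/{group}/{version}/{plural}/{name}"
    # If /status is a separate sub-resource and the patch touches only the status,
    # it has to go to the /status endpoint instead of the object itself.
    if info["has_status_subresource"] and set(patch) == {"status"}:
        url += "/status"
    return url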


Commented by nolar at 2020-02-13 18:18:46+00:00
 

A note to self on how it is defined in a GKE cluster:

curl -i -k -H "Authorization: Bearer $token" https://$ip/apis/storage.k8s.io/v1

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "storage.k8s.io/v1",
  "resources": [
    {
      "name": "storageclasses",
      "singularName": "",
      "namespaced": false,
      "kind": "StorageClass",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ],
      "shortNames": [
        "sc"
      ]
    },
    {
      "name": "volumeattachments",
      "singularName": "",
      "namespaced": false,
      "kind": "VolumeAttachment",
      "verbs": [
        "create",
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "update",
        "watch"
      ]
    },
    {
      "name": "volumeattachments/status",
      "singularName": "",
      "namespaced": false,
      "kind": "VolumeAttachment",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    }
  ]
}

So, volumeattachments/status is obviously there, separated from volumeattachments.

The same happens with CRDs when status is declared as a sub-resource:

$ curl -i -k -H "Authorization: Bearer $token" https://$ip/apis/zalando.org/v1

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "zalando.org/v1",
  "resources": [
    {
      "name": "kopfexamples222",
      "singularName": "kopfexample222",
      "namespaced": true,
      "kind": "KopfExample222",
      "verbs": [
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "create",
        "update",
        "watch"
      ],
      "shortNames": [
        "kopfexes222",
        "kopfex222",
        "kexes222",
        "kex222"
      ]
    },
    {
      "name": "kopfexamples222/status",
      "singularName": "",
      "namespaced": true,
      "kind": "KopfExample222",
      "verbs": [
        "get",
        "patch",
        "update"
      ]
    },
    {
      "name": "kopfexamples",
      "singularName": "kopfexample",
      "namespaced": true,
      "kind": "KopfExample",
      "verbs": [
        "delete",
        "deletecollection",
        "get",
        "list",
        "patch",
        "create",
        "update",
        "watch"
      ],
      "shortNames": [
        "kopfexes",
        "kopfex",
        "kexes",
        "kex"
      ]
    }
  ]

A little experiment shows that the watch (kubectl get -w below) indeed reacts only when /status is patched, and ignores the changes when the object itself is patched with the status: field.

$ kubectl get kex222 -w
$ curl -i -k -X PATCH -H "Authorization: Bearer $token" https://$ip/apis/zalando.org/v1/namespaces/default/kopfexamples222/kopf-example-1 -d '{"status": {"create_fn": {"message":400}}}' -H "Content-Type: application/merge-patch+json"
$ curl -i -k -X PATCH -H "Authorization: Bearer $token" https://$ip/apis/zalando.org/v1/namespaces/default/kopfexamples222/kopf-example-1/status -d '{"status": {"create_fn": {"message":300}}}' -H "Content-Type: application/merge-patch+json"
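
For reference, the same experiment scripted with the official kubernetes Python client (a sketch only; it assumes a client version that sends dict bodies as merge-patches, and reuses the example CRD names from above):

from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()
common = dict(group="zalando.org", version="v1", namespace="default",
              plural="kopfexamples222", name="kopf-example-1")

# Patching the object itself: with status declared as a sub-resource,
# the status part of the patch is ignored and no watch-event appears.
api.patch_namespaced_custom_object(
    body={"status": {"create_fn": {"message": 400}}}, **common)

# Patching the /status sub-resource: this change sticks and produces a watch-event.
api.patch_namespaced_custom_object_status(
    body={"status": {"create_fn": {"message": 300}}}, **common)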

Commented by nolar at 2020-02-13 20:04:22+00:00
 

Related: #119. Should be fixed with #313.


Commented by gtsystem at 2020-02-14 08:49:09+00:00
 

As I mentioned, it seems that custom fields added via /status are just ignored:

curl -i -k -X PATCH -H "Content-Type: application/merge-patch+json" -d '{"status": {"create_fn":{"message":400}}}' "http://127.0.0.1:8001/apis/storage.k8s.io/v1/volumeattachments/myva/status"
HTTP/1.1 200 OK
...

{
  "kind": "VolumeAttachment",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "name": "myva",
    ...
    }
  },
  "spec": {
   ...
  },
  "status": {
    "attached": true
  }

Commented by nolar at 2020-02-14 09:14:00+00:00
 

Ouch. That is highly unfortunate. Sorry, I missed that statement.

I couldn't quickly figure out how to create VolumeAttachments in GKE even with a pod+pvc pair — so I could not test it on this resource specifically (yesterday evening). Well, at least, it now works with other resources with status-as-a-subresource (#119).

What I can suggest is a per-resource-kind or per-operator config for where the internal status is stored: .status.kopf by default, optionally configurable to .kopf or .blahblah. Can you please check whether that works with VolumeAttachments? Or are they so strict that even arbitrary fields are not allowed?

Actually, there is a similar task, #23 (and #283), but there the main use-case is multiple operators handling the same resources, so that they do not collide with each other. It can be extended to an arbitrary root field name too.

Storing it in an arbitrary top-level field would be the easiest way. Otherwise, annotations can indeed be used; but labels and annotations are allowed to contain strings only, so serialisation/deserialisation would be needed.
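
Purely to illustrate the idea (this is not an existing Kopf API; all names here are made up): such a knob could render the same progress dict either into a configurable status field or into string-valued annotations:

import json
from typing import Any, Dict

def progress_patch(progress: Dict[str, Any],
                   field: str = "status.kopf",        # the default storage location
                   use_annotations: bool = False,
                   prefix: str = "kopf.zalando.org") -> dict:
    if use_annotations:
        # Annotation values must be strings, hence the JSON serialisation.
        return {"metadata": {"annotations": {
            f"{prefix}/{handler}": json.dumps(state)
            for handler, state in progress.items()
        }}}
    # Otherwise, nest the progress under the configured dotted field path.
    patch: Dict[str, Any] = {}
    node = patch
    keys = field.split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = progress                          # e.g. {"status": {"kopf": {...}}}
    return patch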


Commented by gtsystem at 2020-02-14 09:36:45+00:00
 

I can reproduce this with other built-in resources like pods or PVCs. I just tried patching a top-level field, again with no luck. Maybe newer versions of Kubernetes are getting more strict.


Commented by nolar at 2020-02-14 17:30:05+00:00
 

Indeed. I have now also reproduced this with pods in Minikube with 1.16. Neither the status nor the root accepts arbitrary fields. The only place that accepts arbitrary content is the annotations.

So far, so good — let's abuse annotations!

I have made a little dirty-hacking draft to store the operator's state locally in two formats: per-handler and per-handler-field. After the handling cycle, these annotations are wiped (as previously with the status fields).

Per handler:

apiVersion: zalando.org/v1
kind: KopfExample
metadata:
  annotations:
    my-op.zalando.org/create_fn_1: '{"started": "2020-02-14T16:58:25.396364", "stopped":
      "2020-02-14T16:58:25.401844", "delayed": null, "retries": 1, "success": true,
      "failure": false, "message": null}'
    my-op.zalando.org/create_fn_2: '{"started": "2020-02-14T16:58:25.396421", "stopped":
      null, "delayed": null, "retries": 0, "success": false, "failure": false, "message":
      null}'
  creationTimestamp: "2020-02-14T16:58:25Z"
  finalizers:
  - kopf.zalando.org/KopfFinalizerMarker
  generation: 1

Per handler-field:

apiVersion: zalando.org/v1
kind: KopfExample
metadata:
  annotations:
    my-op.zalando.org/create_fn_1.delayed: "null"
    my-op.zalando.org/create_fn_1.failure: "false"
    my-op.zalando.org/create_fn_1.message: "null"
    my-op.zalando.org/create_fn_1.retries: "1"
    my-op.zalando.org/create_fn_1.started: '"2020-02-14T17:10:42.713227"'
    my-op.zalando.org/create_fn_1.stopped: '"2020-02-14T17:10:42.718587"'
    my-op.zalando.org/create_fn_1.success: "true"
    my-op.zalando.org/create_fn_2.delayed: "null"
    my-op.zalando.org/create_fn_2.failure: "false"
    my-op.zalando.org/create_fn_2.message: "null"
    my-op.zalando.org/create_fn_2.retries: "0"
    my-op.zalando.org/create_fn_2.started: '"2020-02-14T17:10:42.713403"'
    my-op.zalando.org/create_fn_2.stopped: "null"
    my-op.zalando.org/create_fn_2.success: "false"
  creationTimestamp: "2020-02-14T17:10:42Z"
  finalizers:
  - kopf.zalando.org/KopfFinalizerMarker
  generation: 1

Per handler-field, if serialised a little bit smarter:

apiVersion: zalando.org/v1
kind: KopfExample
metadata:
  annotations:
    my-op.zalando.org/create_fn_1.retries: "1"
    my-op.zalando.org/create_fn_1.started: "2020-02-14T17:10:42.713227"
    my-op.zalando.org/create_fn_1.stopped: "2020-02-14T17:10:42.718587"
    my-op.zalando.org/create_fn_1.success: "true"
    my-op.zalando.org/create_fn_2.retries: "0"
    my-op.zalando.org/create_fn_2.started: "2020-02-14T17:10:42.713403"
  creationTimestamp: "2020-02-14T17:10:42Z"
  finalizers:
  - kopf.zalando.org/KopfFinalizerMarker
  generation: 1

Works like a charm! — Both for CRs and pods.

In both cases, and as per the docs, the generation is not increased when metadata is changed (this happens only for spec). But it generates the watch-events — which makes sense since most of the printed columns of kubectl get -w are taken from status.

Besides this, such an approach allows customising the operator name: instead of the hardcoded kopf.zalando.org, developers can use my-op.companyname.com in annotations.

The first format is compact. The second format is quite verbose, but it allows the individual fields to be exposed as printable columns, or to be filtered on.

Another approach would be to put the whole state into one single annotation, but that is complicated by the need to merge the existing state with its updates (unlike REST merge-patching, where different fields are merged inside a dict server-side, a single field is always fully overwritten and requires client-side merging).
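
To make that difference concrete (made-up helpers, not Kopf code): a per-handler annotation can be merge-patched on its own, whereas a single whole-state annotation is one string value that the server overwrites as a whole, so it has to be read, merged and re-serialised on the client:

import json
from typing import Any, Dict

def per_handler_patch(handler: str, state: Dict[str, Any]) -> dict:
    # Server-side merging: only this handler's annotation changes, the rest stay intact.
    return {"metadata": {"annotations": {
        f"my-op.zalando.org/{handler}": json.dumps(state),
    }}}

def whole_state_patch(current: Dict[str, str], handler: str, state: Dict[str, Any]) -> dict:
    # One string value holds everything; a merge-patch overwrites it entirely,
    # so the current value must be parsed and merged client-side first.
    whole = json.loads(current.get("my-op.zalando.org/state", "{}"))
    whole[handler] = state
    return {"metadata": {"annotations": {
        "my-op.zalando.org/state": json.dumps(whole),
    }}}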

I will try to do some magic on that to make it nice and simple. But this can be considered as a general direction for built-in resources in the future. For custom resources, I prefer to keep status.kopf by default instead of annotations, but let the developers configure that.


Commented by gtsystem at 2020-02-14 21:53:29+00:00
 

One annotation per handler already looks good to me 👍


Commented by gtsystem at 2020-03-24 14:39:09+00:00
 

Any update?


Commented by nolar at 2020-03-24 18:08:54+00:00
 

Sorry, I was a bit busy with the coronavirus aftermath at work recently (well, not "a bit", actually). Nevertheless, I finally have some spare time now, and some of the framework's tasks have shifted from my private time to my work time, so let's see how much faster it goes now and in the next few days.

PS: For custom resources, there is a workaround described in #321 (with x-kubernetes-preserve-unknown-fields: true). It does not work for built-in resources, of course (there is no control over the schemas).


Commented by gtsystem at 2020-05-15 16:47:52+00:00
 

Tested kopf==0.27rc5 (which includes #331) and now it works as expected. Thanks!
