
does kopf support loop-until? #94

Closed
trondhindenes opened this issue Jun 1, 2019 · 5 comments
Labels
question Further information is requested

Comments

@trondhindenes
Contributor

trondhindenes commented Jun 1, 2019

This isn't a bug.

I'm experimenting with moving some code over from metacontroller (https://metacontroller.app/) to kopf. Since metacontroller is webhook-based, it supports looping (e.g. firing requests to the webhook at an interval, via resyncAfterSeconds) until some attribute of the created/modified CRD object is true. This has worked really well, as it allows fairly stateless logic even for things that converge slowly (our use case is using CRDs to construct and invoke custom CloudFormation stacks).

I'm trying to replicate the same behavior in kopf, but as far as I can see, there is no "loop-event" implemented.

I guess I'm looking for some guidance around:

  • Is there any sort of "event loop"-like structure in kopf that I can use? It would be awesome to avoid having to implement a scheduler such as Celery, etc.
  • What is the recommended way of implementing CRD handlers that have to wait for some external "thing" to converge (slowly)?
  • Is it considered "okay" to run blocking code in the kopf.on.create/kopf.on.modify event methods? I suspect this would be problematic, as handling of multiple events would get bogged down (or does kopf handle this with multithreading or similar?)

In any case, thanks for a great project; kopf looks really promising.

@trondhindenes trondhindenes changed the title from "does kops support loop-until?" to "does kopf support loop-until?" Jun 1, 2019
@nolar nolar added the question Further information is requested label Jun 2, 2019
@nolar
Contributor

nolar commented Jun 2, 2019

@trondhindenes Let me answer the questions one by one:

Regarding the blocking code in the handlers:

If the handlers are regular sync functions (def fn()), then it is okay to run blocking code there. Such functions are always executed in asyncio executors (read: thread pools).

If the handlers are async functions (async def fn()), then they are executed in the main event loop of Kopf, and the blocking operations will block the whole operator. It is the developer's responsibility to make async handlers cooperative. This is documented here: https://kopf.readthedocs.io/en/stable/async/
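A minimal sketch contrasting the two styles (the resource group/version/plural is the placeholder example from the Kopf docs, and the sleeps stand in for real blocking or awaitable work):

import asyncio
import time

import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def sync_fn(**kwargs):
    # Runs in a thread-pool executor: blocking calls are acceptable here.
    time.sleep(30)

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
async def async_fn(**kwargs):
    # Runs in Kopf's own event loop: always await, never block.
    await asyncio.sleep(30)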


In either case, Kopf multiplexes the events at the resource level: multiple resources (different uids) are handled in parallel, but all events of one individual resource (same uid) are handled sequentially.

If a handler blocks the execution with some long operation/query, be that sync or async, that individual resource will be blocked until the handler exits — i.e. all its events will be queued and waiting for the running handler to finish.

(PS: Also, only the last event in the queue will be handled, as per K8s's eventual consistency principle; see #42 + #43.)

@nolar
Contributor

nolar commented Jun 2, 2019

@trondhindenes

Regarding the loop-event.

I'm not sure what exactly you want to achieve, but in our projects we use this trick to wait for some long-running state:

import kopf

@kopf.on.create(..., timeout=8*60*60)
def create_fn(**kwargs):
    if never_will_be_ready():
        # Stop immediately; no further retries will be attempted.
        raise kopf.HandlerFatalError("Never will be ready.")
    if not is_ready():
        # Postpone and retry this handler in 10 minutes.
        raise kopf.HandlerRetryError("Not ready yet.", delay=10*60)
    doit()

Any arbitrary exception in a handler causes a retry, until the handler eventually succeeds. For arbitrary exceptions, this behaviour is the current default but not guaranteed; for HandlerRetryError, it is the guaranteed behaviour.

This special exception postpones the handler for 10 minutes (if no delay is specified, the default is 1 minute). If it keeps retrying for 8 hours, it fails by timeout.

Additionally, if you know that the state will never be reached, you can raise a "fatal" error, which stops any retries too. (Technically, the timeout is implemented via this fatal error as well.)

Otherwise, with a precision of 0–10 minutes, it will catch the "is_ready" state and "do it".

You can also use the retry, started & runtime kwargs to know when the handler first started, and how many retries there have been (0 on the first attempt).
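For example, a sketch only (the resource spec is the same docs placeholder as above, and is_ready() is the hypothetical check from the snippet above):

import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples', timeout=8*60*60)
def create_fn(retry, started, runtime, logger, **kwargs):
    # retry == 0 on the 1st attempt; started & runtime tell when it began and for how long.
    logger.info(f"Attempt {retry}, started at {started}, running for {runtime}.")
    if not is_ready():
        raise kopf.HandlerRetryError("Not ready yet.", delay=10*60)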

The handler itself has a near-zero duration. However, with this approach, the wait can span hours or days and survive operator restarts (the next operator process will pick up the handler progress stored on the resource).

The state of the handlers will be stored temporarily in status.kopf.progress of the resource (seen by kubectl get -o yaml -f obj.yaml). It is deleted after the handling cycle finishes.

PS: Also see #16 for less annoying logs in case of these special retry/fatal exceptions.

@nolar
Contributor

nolar commented Jun 2, 2019

@trondhindenes And thanks for your feedback! Let me know if the answers helped, or ask more if anything is unclear.

@trondhindenes
Contributor Author

Awesome, this helps a lot!! Thanks!

@nolar
Contributor

nolar commented Jun 12, 2019

I suppose the question is answered. If not, feel free to reopen.
