Kubernetes operator for Celery #24
Initial attempt at implementing this (still very much a proof-of-concept): https://github.com/jmdacruz/celery-k8s-operator
I honestly haven't been following the development of 5.X that closely; where can I get a glimpse of the biggest changes? Worst case scenario, if there are breaking changes, the operator supports versioning.
Actually, scratch that... you had already mentioned this above :-). I'll take a look at that.
Trying to stick to the KISS principle, I think a good option would be to keep a single CRD (the `CeleryApplication`). Another approach would be to treat the … In general, a Kubernetes operator needs to be very simple to deploy and maintain, since it is by definition a critical piece of infrastructure (it becomes part of the Kubernetes cluster by extending its functionality, making sure other pieces work as expected). Take a look at what the folks at Lyft do for the Apache Flink operator: https://github.com/lyft/flinkk8soperator
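For concreteness, here is a minimal sketch of what the single-CRD approach might look like from the user's side, using the Python kubernetes client. The `celery.example.com` API group and every field under `spec` are hypothetical illustrations, not an agreed schema:

```python
# Sketch of creating a single CeleryApplication custom resource.
# The "celery.example.com/v1" group and all spec fields are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

celery_app = {
    "apiVersion": "celery.example.com/v1",
    "kind": "CeleryApplication",
    "metadata": {"name": "orders"},
    "spec": {
        "image": "registry.example.com/orders:1.2.3",  # user code + Celery
        "appModule": "orders.tasks",                   # passed to `celery -A`
        "replicas": 3,
        "broker": "redis://redis.default.svc:6379/0",  # could be shared
        "resultBackend": "redis://redis.default.svc:6379/1",
        "flower": {"enabled": True},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="celery.example.com",
    version="v1",
    namespace="default",
    plural="celeryapplications",
    body=celery_app,
)
```

Everything cluster-specific (broker topology, flower, resource limits) lives in one object, which is what keeps the operator itself simple.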
Is this thread still active? Inspired by the need for this at my own work, I also tried building a POC/MVP of a Celery operator to learn the whole thing: https://github.com/brainbreaker/Celery-Kubernetes-Operator. I also presented it at EuroPython 2020 last Friday as part of my talk on automating the management of Kubernetes infra while staying in the Python ecosystem (slides: https://bit.ly/europython20-ppt). I'm willing to commit a certain number of hours every week to build a production-ready version of the Celery operator.
In the Apache Airflow project, we use KEDA to provide autoscaling from 0 to n.
I quite liked the approach KEDA takes when I first saw it.
Yes, KEDA is probably the best way to go for the scaling use-case. It keeps us close to native solutions like HPA and only introduces the metrics server and controller. For my application, I was personally more focused on the learning experience, so I chose to implement a really basic scaling algorithm without using anything external.
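For context, a naive in-operator scaler of that kind can be tiny. Here is a hedged sketch that polls the default Celery queue on a Redis broker and resizes a worker Deployment; the Deployment name, thresholds, and single-queue assumption are all illustrative, and it ignores prefetched/in-flight tasks (which KEDA/HPA handle more robustly):

```python
# Naive queue-length autoscaler, without KEDA/HPA.
# Assumes a Redis broker, where Celery's default queue is the "celery" list;
# deployment/namespace names and thresholds are illustrative only.
import time

import redis
from kubernetes import client, config

TASKS_PER_WORKER = 10          # arbitrary target backlog per worker
MIN_REPLICAS, MAX_REPLICAS = 1, 20

config.load_incluster_config()
apps = client.AppsV1Api()
broker = redis.Redis(host="redis.default.svc", port=6379, db=0)

while True:
    backlog = broker.llen("celery")  # pending messages on the default queue
    # Ceiling division, clamped to [MIN_REPLICAS, MAX_REPLICAS].
    desired = max(MIN_REPLICAS,
                  min(MAX_REPLICAS, -(-backlog // TASKS_PER_WORKER)))
    apps.patch_namespaced_deployment_scale(
        name="orders-worker",
        namespace="default",
        body={"spec": {"replicas": desired}},
    )
    time.sleep(30)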
Let me give my inputs from what I know about running Celery in production. I've yet to read and understand the proposal and architecture for 5.x you've shared; I'll come back with more inputs. For now I'm focusing on the problem of all the manual work/configuration that needs to be done while setting up Celery on K8s: …
There might be more things that come up while managing the lifecycle of a Celery application (I'm not a Celery expert right now, but willing to explore/learn). I guess solving these manual steps would be a good starting point. What do you suggest?
You can start with Celery 4.4.x as well.
Let's continue the operator conversation from celery/celery#4213 here, in one place.
I'm still going through the arch doc and thinking about everything the operator/controller will need to do. I definitely see some major changes in this CEP from the way 4.X needed to be deployed on a K8s cluster. I'll come back with some questions/comments by this weekend. Sorry for the delay; my availability is limited.
Okay, so I reviewed the architecture for 5, and it looks really promising. I have some comments which I'll add to PR#27. For the operator, we could support both 4.X and 5; I feel we should start with 4.4.X, as per the suggestion of @auvipy, and introduce versioning in the operator as we go along. Correct me if I'm wrong, but for 5 to reach a stable version and be adopted by the community as a breaking release will take time; I'm guessing more than a year. Until then, and even beyond that, people will still be using 4.4.X if migrating is too much effort and they don't need the new use-cases 5 is going to support. A 4.4.X operator will be somewhat simpler to implement and a good way to start, because it has fewer moving parts/components. For a controller implementation of 5, we need to have a detailed discussion around the lifecycle of the components (data sources, sinks, router, execution platform, and so on). We also need to discuss what would lie in the scope of the controller and what wouldn't; for example, managing different message brokers, data sources, and sinks might go beyond the work of the Celery operator. Ideally, I'd want to try running Celery 5 in production to see the pain points and manual steps before writing an operator to fix them, and I think there's still some way to go for that. What do you suggest @thedrow? If you guys agree to go ahead with 4.4.X as a start, I'll chalk out a design document for the operator and share it with you as soon as I can.
Your observation seems practically logical to me. I would suggest starting with 4.x first. One goose step at a time O:)
I have recently started to create the Celery operator with an operator framework.
@auvipy @thedrow @jmdacruz Would like to have inputs/suggestions from you guys.
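For reference, if the framework in question is (or resembles) the Python kopf framework, the wiring for such handlers is quite small. A minimal skeleton is sketched below; the API group, version, and plural are placeholder assumptions, not settled names:

```python
# Skeleton of kopf-based handlers for a hypothetical CeleryApplication
# resource; "celery.example.com"/"v1"/"celeryapplications" are placeholders.
import kopf


@kopf.on.create("celery.example.com", "v1", "celeryapplications")
def on_create(spec, name, namespace, logger, **kwargs):
    # A real handler would create the worker Deployment, the flower
    # Deployment, and the Service here (see the fuller sketch further down).
    logger.info("CeleryApplication %s/%s created with %s replicas",
                namespace, name, spec.get("replicas", 1))


@kopf.on.update("celery.example.com", "v1", "celeryapplications")
def on_update(spec, name, namespace, logger, **kwargs):
    logger.info("CeleryApplication %s/%s changed; reconciling", namespace, name)
```

Such a module would be started with `kopf run operator.py --namespace default`, assuming the CRD is already installed in the cluster.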
Will look into this next week.
Awesome, thanks.
Yes, for sure. I'd be happy to submit a CEP.
Sounds good. I have written the document with Celery 4.4.X in mind right now, not 5. But yeah, we should think about making it extensible enough to handle 5 as well.
Great. I'm looking forward to your inputs.
Opened #29. It'll probably be better for you guys to review it as a CEP. I couldn't preview the rendered RST output; however, I've tried my best to avoid any random formatting issues using online tools.
This proposal is about having a Kubernetes operator (see here and here). The scope of the operator would be the following:
- A CRD called `CeleryApplication`. This resource would contain the configuration for the cluster (e.g., container resource requests/limits, number of replicas), Celery configuration (e.g., broker and result backend configuration), and the Docker image with the code and launch parameters (e.g., location of the code inside the container, virtualenv).
- A controller that watches instances of the `CeleryApplication` CRD, and would spawn a Kubernetes `Deployment` for the cluster, and also a `Deployment` for running `flower`. It would also create a Kubernetes `Service` so that we can access the `flower` UI/API.
- `CeleryApplication` resources should be able to use a shared broker and result backend, but they can pick their own broker configuration too.
This idea is inspired by the Flink Kubernetes Operator developed by Lyft: https://github.com/lyft/flinkk8soperator
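To make the second bullet concrete, here is a hedged sketch of how the controller might translate a `CeleryApplication` spec into its child objects using the Python kubernetes client; every field read off `spec` is an assumption carried over from the earlier examples, not a settled schema:

```python
# Sketch: building child objects for a CeleryApplication.
# All spec field names ("image", "appModule", ...) are assumptions.
from kubernetes import client


def worker_deployment(name: str, spec: dict) -> client.V1Deployment:
    """A Deployment running `celery worker` against the user's image."""
    labels = {"app": name, "component": "worker"}
    container = client.V1Container(
        name="worker",
        image=spec["image"],
        command=["celery", "-A", spec["appModule"], "worker"],
        env=[client.V1EnvVar(name="CELERY_BROKER_URL", value=spec["broker"])],
    )
    return client.V1Deployment(
        metadata=client.V1ObjectMeta(name=f"{name}-worker", labels=labels),
        spec=client.V1DeploymentSpec(
            replicas=spec.get("replicas", 1),
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(containers=[container]),
            ),
        ),
    )


def flower_service(name: str) -> client.V1Service:
    """A Service exposing flower's default port 5555.

    The flower Deployment itself (selected via component=flower) is
    omitted here for brevity; it would be built much like the worker one.
    """
    return client.V1Service(
        metadata=client.V1ObjectMeta(name=f"{name}-flower"),
        spec=client.V1ServiceSpec(
            selector={"app": name, "component": "flower"},
            ports=[client.V1ServicePort(port=5555, target_port=5555)],
        ),
    )
```

The controller would create these with `AppsV1Api.create_namespaced_deployment` and `CoreV1Api.create_namespaced_service` when a `CeleryApplication` appears, and patch them on spec changes.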