-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OEP-45 :: ADR-002: Deploying Open edX on Kubernetes Using Helm #372
Conversation
Thanks for the pull request, @bradenmacdonald! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
@bradenmacdonald Thank you for the contribution, I will be waiting for green light from you. |
Charts are build with the Go language templating engine which is a drawback since it means adding a new language to the ecosystem with a new learning curve. This is considered acceptable due to the large adoption of Helm in the devops community. | ||
|
||
|
||
**Other** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bradenmacdonald do you have anything else in mind for this section?
Building more Kubernetes support into Tutor | ||
=========================================== | ||
|
||
Enhancing the current support for tutor core was not considered since it would go against one of the goals of the project as `explained by the author`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@regisb perhaps you would like to look at this. Am I correctly characterizing your position here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is correct. I gave some additional thoughts in this other comment: overhangio/tutor#677 (comment)
I still think it would be overkill to add so many (32!) new configuration settings to Tutor core to add features such as auto-scaling to Open edX. But I believe that these options would work great in a Tutor plugin.
My position has always been that, when possible, new non-essential/complex features should be added via Tutor plugins, and I encourage you to reconsider this option. By moving to Helm, you are cutting yourself from the Tutor plugin ecosystem. You will have to re-implement all of the Tutor templates, configuration settings and CLI. Not just for Tutor core, but for other plugins as well (ecommerce and discovery, among many others).
Together, we managed to implement an extension mechanism that allows anyone to modify the kustomize manifests and to easily implement the features you need (see this PR overhangio/tutor#686). I believe this is a great way to extend the capabilities of Open edX in Kubernetes and I do not quite understand why you don't use it.
It's quite possible that the current implementation of the Tutor plugin API makes it difficult to run Helm. For instance, I think that the Jinja2/Helm templating languages might be incompatible. But I would be more than happy to improve the plugin API to make it play well with Helm.
At the very least, I think that this OEP should include a detailed section of what cannot be achieved using Tutor plugins. It should also explain how you plan on maintaining compatibility with existing plugins.
I'm pretty sure that we could work around the difficulties without cutting ourselves from the rest of the plugin ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the extended context. The approach you have taken with tutor has been very strict and I think you would not have the ease of install that you have so far without it. This OEP however is aimed more towards repeatability with installations that can not do without a good deal of tunning. This means instances with very large throughput, complex ingress for multi-tenant hosting and other restrictions on the kubernetes side like self hosted clusters that may not have backing of cloud services for the k8s api.
My position has always been that, when possible, new non-essential/complex features should be added via Tutor plugins, and I encourage you to reconsider this option. By moving to Helm, you are cutting yourself from the Tutor plugin ecosystem. You will have to re-implement all of the Tutor templates, configuration settings and CLI. Not just for Tutor core, but for other plugins as well (ecommerce and discovery, among many others).
It is not our goal to remove ourselves from the tutor ecosystem. We are active maintainers of plugins like tutor-xqueue and contrib-codejail and tools like TVM. We are using tutor for all our development and plan to continue doing so.
At the same time, for operations, we want fine grained control over every variable and we also want composability with every Helm subchart. Some of our goals (like multi tenant clusters) are specifically in what you decided to cut from the core as design principles of tutor core. Having Helm charts is a way for us to reach some of those goals while at the same time producing something some members of the community can also benefit from.
Together, we managed to implement an extension mechanism that allows anyone to modify the kustomize manifests and to easily implement the features you need (see this PR overhangio/tutor#686). I believe this is a great way to extend the capabilities of Open edX in Kubernetes and I do not quite understand why you don't use it.
Using layers to override parts of the manifest files was a part of what we did in the drydock plugin and we still had issues with composition of plugins and plugins overriding each other. The k8s-override patches are great and we will be using them in our plugins to better maintain the k8s capabilities.
At the very least, I think that this OEP should include a detailed section of what cannot be achieved using Tutor plugins. It should also explain how you plan on maintaining compatibility with existing plugins.
Given the flexibility that tutor allows into the plugin system, I do not think there is anything that cannot be achieved using tutor plugins.
Still as an operator of arrays of instances I see a lot of value in producing helm charts to manage them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you manage to integrate your Helm charts with existing Tutor plugins, then I think that it would be a net gain. I'm looking forward to seeing the implementation. It is my opinion that plugin compatibility should be added to this OEP, to make it clear that it is an important topic. Your other arguments are also pretty reasonable: maybe also add them to the OEP? It would also help to clarify what is the intended audience for Helm charts: is it anyone running Open edX on k8s or just multi-tenant operators? Will it be possible for existing tutor k8s
users to migrate to tutor helm
?
Lastly: do you really need an OEP for that? I'm not opposed, but creating an OEP seems overkill to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@regisb I think these Helm charts will primarily focus on multi-tenant installations. From what I understand, Tutor is fundamentally designed to manage single instances at a time and regardless of what plugins you write, it's hard to do things like deploy a cluster with a shared database, shared load balancer, and other shared resources using Tutor.
That's why we're exploring using Tutor+plugins for everything up to the point of deploying onto a cluster (i.e. customizing, building images, doing upgrades, other management), but controlling the deployment onto the cluster using Helm.
Or alternately, maybe it would make sense to deploy the cluster (DBs, LB, monitoring, autoscaling) using Helm and let Tutor deploy each individual instance onto that cluster. In the latter case, I don't think a Tutor plugin to deploy the cluster (without instances yet) makes any sense? But certainly it would make sense to ensure that Tutor and its plugins are compatible with the Helm-provisioned multi-tenant clusters. From looking at an example of a Tutor plugin's k8s deployment spec, I don't see anything that would be incompatible.
Perhaps the way that load balancers work would need to be changed. There cannot be a central Caddyfile
to configure a shared load balancer. Rather, each IDA needs to have a service annotation or Custom Resource which the shared, stateless load balancer detects and uses to automatically route traffic and provision HTTPS certs.
I'll need to investigate a bit more, it definitely seems that there's more to consider here.
Lastly: do you really need an OEP for that? I'm not opposed, but creating an OEP seems overkill to me.
I'm not sure either. Maybe it is overkill. It seems like if it's what most providers in the community are doing, we should document it somewhere and encourage consistency and collaboration. And currently other ops-related decisions are documented using OEPs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just chiming in:
[...] what cannot be achieved using Tutor plugins [...]
This reminds me one thing, existing K8s resource definitions cannot be removed by using the strategic merge patching mechanism.
@bradenmacdonald @felipemontoya Thank you again for putting this OEP together! What's needed to get it out of the draft state, and move forward with this? |
@antoviaque @felipemontoya I think we need to do some more technical discovery to better understand how this proposal could be made compatible with Tutor's existing k8s support, so that people can take advantage of Tutor plugins in a multi-tenant k8s installation that's managed by Helm, without needing to modify or rewrite the k8s part of each plugin. |
@bradenmacdonald OK - do we have some room to do that now? I wouldn't want to lose the momentum we were building around collaborating on this, so let me know if we need to prioritize to find the time for it. |
@antoviaque Yeah I think I can start poking away at it next week, but I don't really have a ton of time to spend on it. |
@bradenmacdonald I also have found difficulties to spend a lot of time in this. However the reasoning you wrote in a previous comment
is similar to what we have been doing so far. Tutor + plugins to get the cluster first installed and then maintaining is a different beast.
I think this is an approach that we could explore simultaneously to the compatibility layer between tutor and helm. @MoisesGSalas fyi. |
Update: I got some time this week so have started going some testing and prototyping. Will report back in a week or so. |
8283284
to
8353fc1
Compare
As per our conversation in the last sync meeting, I altered this PR to turn the draft OEP into an ADR belonging to OEP-45. I also included the requirement that the Helm charts must be compatible with single openedx instances managed by tutor to address what was the most pressing concern of this proposal. |
8353fc1
to
9aa3119
Compare
Thanks @felipemontoya, looks good to me! |
Hey @felipemontoya @bradenmacdonald, just checking in to see if this PR should still be in |
@itsjeyd Sure, it's ready for review now. Though it's much more important that people test out and contribute to https://github.com/open-craft/tutor-contrib-multi (see the issues listed on that repo) to understand this project rather than just reading this small ADR. |
@sarina @e0d @itsjeyd As we get into starting the formal review phase of this OEP, are there any steps you would like us to take to follow proper procedure? If you haven't already seen it, besides the contents of this OEP, you'll find the full context on this initiative on the forum threads:
During the meeting from https://discuss.openedx.org/t/deploying-open-edx-on-kubernetes-using-helm/8771 yesterday we discussed opening a longer review period than usual, until the next meeting on January 10th -- due to the holidays, and the need to use the review period to complete concrete technical validation steps. @felipemontoya will be announcing this review period on the forum shortly. To also help making it clear that we intend to do this project as a shared collaborative effort within the framework of the Open edX project, it would also be useful to move the repository containing the code & tasks list to the github openedx organization - would it be possible to do this now? |
Hi @antoviaque - I haven't been involved in this at all so I can't speak to it except for your specific question about moving the repo to the openedx org. For that please file a request ticket of type Systems Request and we will review it/discuss it there. |
@sarina Thanks for the quick reply! Ticket created: openedx/axim-engineering#577 |
@antoviaque From the perspective of OSPR management the main thing to do will be to update the status of the PR to In Review on the contributions board. @mphilbrick211 Could you please take care of that? That should be it, procedure-wise. |
Co-authored-by: Adolfo R. Brandes <adolfo@opencraft.com>
c66e61c
to
f8256b3
Compare
I changed the status to 'provisional'. Now I need an approval to be able to continue the merge process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaving my approval here already.
However since I'm author of this and arbiter of the original of the OEP I think we should have two approvals on top of mine.
@arbrandes I forgot to tag you as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felipemontoya @bradenmacdonald Sorry I'm late to the game and thank you for this very helpful ADR. I have no blocking concerns, just a couple clarifying questions.
|
||
For Open edX operators who wish to deploy Open edX on Kubernetes, community-maintained `Helm Charts`_ will be used to simplify the process of deploying a kubernetes cluster capable of loading one or more Open edX instances. | ||
|
||
Building on top of the `decision 0001`_ those single instances will in turn be managed by Tutor and the Tutor plugin ecosystem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check my understanding, the implication here is that we will not use to Tutor render Helm charts from Jinja2 templates, right? If so, I am glad to hear that. Templates-rendering-templates sounds a bit too much like the edx/configuration for my comfort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea is to use Helm charts (with no involvement of Tutor) to deploy shared resources onto the cluster (load balancing with HTTPS, logging, monitoring, autoscaling, ES, Redis, maybe MySQL etc. too). And then once that cluster is provisioned, another workflow (which may vary per provider and is out of scope of this project for now) will use Tutor to build Open edX images and deploy instances onto the cluster using those images.
Wherever possible we want to use k8s primitives so that instances are "auto-detected" as they get deployed - e.g. each instance deploys K8S Ingress
objects so that the central load balancer detects the deployment and provisions HTTPS certs and starts routing traffic automatically. Where this is not possible, a tutor plugin or init script will be used to do things like provision database users in the shared database as each instance is deployed.
|
||
For most of the history of the Open edX project, production deployments were done primarily using Virtual Machines that had been provisioned using the ansible scripts in the ``edx/configuration`` repository. With the adoption of the `decision 0001`_ from OEP-0045 that selects Tutor as the replacement for ``edx/configuration`` some large instances and operators are encountering use cases that are not supported. Multiple instances of Open edX sharing some infrastructure is the first of such cases. | ||
|
||
A popular way to deploy containerized applications is to use `Kubernetes`_. However, Kubernetes has a lot of complexity so the overhead it brings is usually not worthwhile for small, singe-instance deployments. For that reason (as explained in OEP-45) as well as a general lack of experience with Kubernetes, to date there has been no officially recommended way to deploy Open edX on Kubernetes. Although Tutor does include basic support for Kubernetes, it is focused only on a basic working deployment, and lacks the flexibility and capability that operators require for large, multi-instance deployments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding:
large, multi-instance
Do you think this initiative would also be useful to an organization with large, single-instance deployments? In particular, I am thinking of 2U and edx.org. (@jmbowman)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We hope that there is significant overlap and that we can align on best practices, but there will likely be some differences in the approach due to the unique requirements that edx.org has and the single-instance nature.
But yes, although it's not an explicit design goal, I do hope that for anyone who wants to deploy a single highly-scalable instance onto their k8s cluster, that this Helm chart will still be the best approach for that in general too.
👍🏻 I also approve but since GitHub considers me the opener of this PR I can't do it formally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm no expert in any of this, but the ADR is easy to follow and sounds very well-reasoned, so on that basis I give it a 👍🏻
Thanks to everyone that commented, requested changes and participated in the conversacion.With the current approvals and the 'provisional' status we are ready to merge this. |
@bradenmacdonald 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
As discussed in our meeting last tuesday, Feb 7th, I suggest changing the title of the ADR. The current title suggests that we are trying to install Open edX using helm charts, and it's not the idea. |
This amends OEP-45 to say that we're planning to use Helm to create a standardized way to provision a k8s cluster so that it can host multiple Open edX instances.