OEP-45 Operating and Configuring Open edX #143

bderusha · 2020-04-08T21:12:28Z

This OEP is under review from 2020-04-17 though 2020-05-04.

The goal is to vet the proposed end state before we start talking about how to get there to make sure that the proposal properly outlines and answers major questions about how to operate and configure Open edX in this future state.

We are keeping this review period relatively short because, based on early feedback, we don't expect it to be a particularly controversial goal. Agreeing on this OEP quickly will allow us to move the conversation over to implementation strategies where we expect there to be more diverse needs to take into consideration. If it turns out to be more controversial than anticipated we will extend the review period to accommodate more feedback.

The implementation discussion will happen in this follow-on PR. While comments can be made at any time, I would ask that people refrain from any deep discussions there about how to achieve OEP-45 until we agree on what OEP-45 should be.

oeps/oep-0045-arch-ops-and-config.rst

bradenmacdonald · 2020-04-09T17:43:17Z

@bderusha This sounds great, but will be a big change! Could you perhaps talk more about the rollout strategy for this?

As we've mentioned before, a partial change to this approach such as removing the config files from configuration but without the rest of these pieces (specifically strict required vs. default settings files, but also proper containerization and provisioning stories, and so on) could be very difficult for those of us who run hundreds of Open edX instances, so my hope would be to see either (1) as much of this changed as possible in a single Open edX release cycle (Koa?), or (2) to have a transition phase wherein the configuration repo has ansible scripts that accept all the old ansible variables and spit out the new-style configuration scripts and then deploy the apps using containers.

oeps/oep-0045-arch-ops-and-config.rst

bderusha · 2020-04-16T15:10:53Z

@bradenmacdonald The implementation strategy for this will be extremely important. I would like to have separate conversations about whether we think this is a good direction for the project and how we go about getting there. As such, I have created a follow-up PR where we can have that tactical discussion and eventually merge the results back into this OEP.

#144

coryleeio · 2020-04-17T20:16:38Z

oeps/oep-0045-arch-ops-and-config.rst

+* A Dockerfile from which the container image can be built.
+* A documented settings file with production-ready defaults.
+* A documented settings file with all of the configuration values that must be defined by the operator.
+* An operations manual documenting standard operations required to run and maintain the application.


Would this include / should this include a rolling changelog for changes that will affect operators?

I think a changelog is a good idea. I don't know the value of an operations specific log VS a single log to capture all relevant changes to the codebase. Interested to hear other's thoughts here.

Changelogs are often categorized. Perhaps "Operations" could be a category along with "Breaking changes", "Bug fixes", etc.

The Stanford team found the changelog in the configuration repo to be helpful during upgrades, so +1

The "django way" is to have both a changelog (release notes) and a manual that indicates differences with the previous version.

For instance, in the manual:

Run foobarcocktail --no-dry-run to update the foogarizer. This feature was introduced in release Zebulon

And then in the changelog:

Zebulon release notes:

[Operations][Breaking change] In order to update the foogarizer the --no-dry-run option must be added to the foobarcocktail command.

As with my previous comments, the most important type of changelog I can imagine is the one that deals with required configuration changes. This affects operators directly. If no other changelogs are explicitly named, I'd vote for this one.

oeps/oep-0045-arch-ops-and-config.rst

regisb · 2020-04-20T08:12:20Z

oeps/oep-0045-arch-ops-and-config.rst

+
+``required.py`` variables will all be initialized to ``None`` and the application will not start unless they are set.  This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.
+
+This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc).  The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory.  This gives developers and operators more consistency across environments since the same code paths are being executed with different values.


Are we talking about using the Docker images for local development? I think we should: or at least, we should be able to test the Docker images locally by pointing to localhost (and not S3, for instance).

If we are to use these Docker images for local development, I'm afraid this is an overly optimistic approach for organising the settings. I don't think we will be able to reach this state where we only have defaults.py/required.py without having to resort to huge prod.yml/dev.yml files.

My point is that required.py and defaults.py, by themselves, will not be usable in either a development or production environment without very large prod.yml/dev.yml files. The reason for that is that prod.py/dev.py contain not only configuration values, but also configuration logic.

Let's take a very simple example: in production we need to set DEBUG = False. This is just a fact that is true for all users. Does that mean that all users should now add DEBUG: false to their yaml files? I don't think it should: there are quite literally hundreds of settings that need to be set separately for production and development, but for which we can easily predict sane defaults.

The alternatives are:

Use a single settings file that differentiates between dev and prod environments. This is the approach used by Sentry. See: https://github.com/getsentry/sentry/blob/685427a5dea1e1ad0adb1816256e28f4958f570f/src/sentry/conf/server.py#L68

Use separate settings files: base.py, production.py, development.py. This is advocated by Two Scoops of Django and it's Jacob Kaplan-Moss' "The One True Way".

In the case of edx-platform, I would argue for approach 2.

We would have, similar to what we currently do:

base.py:

# Reasonable settings additional_settings = yaml.load(os.environ["CONFIG_PATH"]) if "CONFIG_PATH" in os.environ else {} DEBUG = additional_settings.get("DEBUG", False) ALLOWED_HOSTS = additional_settings.get("ALLOWED_HOSTS", ["localhost", "localhost:8000"]) ...

development.py:

from .base import * # Development logic DEBUG = True ALLOWED_HOSTS = ["*"] SECRET_KEY = "v3rys3cr3t"

production.py:

from .base import * # Production logic DEBUG = False ALLOWED_HOSTS = [additional_settings["HOST"]] SECRET_KEY = additional_settings["SECRET_KEY"] # fail fast in case this is not defined ...

I understand that, by doing so, we lose the benefit of automatically loading the settings from init.py, but in my experience this is not a problem: settings will be automatically loaded as soon as we set the DJANGO_SETTINGS_MODULE environment variable (which should be set to "prod" by default in the Docker image)

Furthermore, users will still be able to customize production/development settings by creating their own settings files which will import from production.py/development.py (as we do today). This is pretty much the standard way to proceed in the django world.

One of the goals of this OEP is to have an application that executes the same codepaths in development as in production wherever reasonable. I believe that this is a place that could reasonably be one code path. Let's take the dev defaults case, because it's a very valid point. Here is how I would see that working.

required.py

# Variables required to run, but without reasonable production defaults LMS_BASE_URL = None JWT_AUTH = None ...

defaults.py

# Variable with reasonable defaults DEBUG = false DEFAULT_PAGE_SIZE = 25 STATIC_URL = '/static/' ...

init.py

import os from .required import * from .defaults import * def get_env_setting(setting): """ Get the environment setting or return exception """ try: return os.environ[setting] except KeyError: error_msg = u"Set the %s env variable" % setting raise ImproperlyConfigured(error_msg) CONFIG_FILE = get_env_setting('APP_CFG') with open(CONFIG_FILE, encoding='utf-8') as f: config_from_yaml = yaml.safe_load(f) vars().update(config_from_yaml)

development.yml (lives in the repo either like this or as starting yaml to copy development.yml.example)

# required LMS_BASE_URL: localhost:18000 JWT_AUTH: { ... auth ... } # default overrides DEBUG: true DEFAULT_PAGE_SIZE: 1000

production.yml (managed securely elsewhere)

# required LMS_BASE_URL: www.my-prod-lms.com JWT_AUTH: { ... prod auth ... }

I am all for giving developers and operators as many of the values defaulted to suit their environments as possible. It is also likely that for testing we will need/want a test.yml to override certain settings. That should also live in the repo.

IMO this method increases the likelihood that new settings end up in the right place, documented, with proper defaults. And that when people are writing and testing their apps they aren't surprised by how it works in production.

iloveagent57

I like it.

oeps/oep-0045-arch-ops-and-config.rst

nedbat · 2020-04-28T18:49:16Z

oeps/oep-0045-arch-ops-and-config.rst

+
+``required.py`` variables must be overridden by operators.  The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.
+
+This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc). The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory. This gives developers and operators more consistency across environments since the same code paths are being executed with different values.


Developers have found "private.py" helpful. That will also go away?

Looking at edx-platform/lms/envs/test.py, there is a little bit of executable code in it, for example to randomize keys. How will that be handled in a test.yml world?

Yes. However we should talk about how to replicate it's usefulness in a yml config world.

FWIW, I took a look back at how we handled this generally (custom logic in settings .py files) on the Stanford fork...

We used to have some crazier things (heck, we shadowed the entire {lms/cms}/envs/ directory structure), but here's what we were actually using in practice, at the end:

Contextually removing some settings (INSTALLED_APPS, MIDDLEWARE_CLASSES) [1]

expanding wildcard paths [2]

overriding logging config [3]

Re: private.py specifically:
We definitely used this and I'd worry that folks won't be able to replicate all of the capabilities of custom Python logic w/in a .yml file. Is there a specific reason we're looking to remove this feature, eg, "private.py considered harmful"?
Or is this more "getting caught up in the storm" of cleaning out the settings directory?
Since private.py is explicitly git-ignored, it shouldn't increase the number of files checked in to that directory, with minimal maintenance overhead on the implementation [4].

I'm open to alternatives, as long as we don't have to reinvent the wheel.

[1] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/test.py#L8-L11

[2] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/common.py#L67-L69

[3] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/lms/envs/devstack.py#L9-L20

[4] https://github.com/Stanford-Online/edx-platform/blob/master/lms/envs/devstack.py#L273-L275

@nedbat I just looked through that file and I didn't see any executable code that couldn't just be a preset value. I don't think we will be able to get away with just a straight import and override from the yaml, but I do think that we can keep any executable code required to set up the app in one code path.

If people have valid cases for executable code in the settings files that only apply to a particular environment I would love to see those cases. I wasn't able to find any myself.

@stvstnfrd thanks for all those great references! I will copy a line written in previous comment

One of the goals of this OEP is to have an application that executes the same code paths in development as in production wherever reasonable. I believe that this is a place that could reasonably be one code path.

I would love for that to be our starting assumption and see if there is a way to achieve the needs of operators and developers without compromising current functionality (though we may need to change current processes/mindsets). Your examples bring up a great and very common configuration challenge. INSTALLED_APPS and MIDDLEWARE_CLASSES

Would it be reasonable for all configuration files to be able to define:

INSTALLED_APPS: - 'app1' - 'app2' APPEND_INSTALLED_APPS: - 'app3' REMOVE_INSTALLED_APPS: - 'app2'

INSTALLED_APPS would completely overwrite the value. APPEND and REMOVE would take the corresponding list of value and perform that action. Allowing the code path to resolve these common cases the same way across the board.

@bderusha Cool, yeah, I probably should have split apart the two lines of thought in that comment (private.py vs general config), as I am implying the former could be be kept around w/ negligible overhead/maintenance, but not implying as much with the latter.

For the general config stuff, I didn't mean to imply "this can't be done!"; just wanted to flag some examples that may not yet be accounted for :)

@bderusha Your INSTALLED_APPS example does a good job of highlighting how handling some Django settings will be more complex than just "fully override it with the value in the YAML". I think edx-platform OPTIONAL_APPS also illustrates this.

I think it would be worth eventually specifying our "special" Django settings (APPEND_INSTALLED_APPS, REMOVE_MIDDLEWARE, etc.) somewhere visible, with a reference implementation of how to handle them in __init__.py. That being said, I think it's out of the scope of this PR, as it would require some digging and discovery to find out what those special settings should be and how we would handle them. Maybe that's something that would end up in your follow-up PR.

nedbat · 2020-04-28T18:54:48Z

oeps/oep-0045-arch-ops-and-config.rst

+
+**Config file generation & management**
+
+Due to the varied needs and processes of different operators, how the config files are created, managed, or otherwise end up on the server is up to the operator and will depend greatly on their deployment strategy.


Will we be generating test.yml?

While the deployment processes will often vary, does (or should) the creation vary?

If we're forcing certain values to be set by operators (and enforcing via early failure), it seems like we ought to provide a canonical way to create the files, or at least provide a blessed reference implementation.

The actual implementation would vary, but I'm imaging at least a helper of sorts, something in the spirit of:

https://github.com/edx/configuration/blob/eec2b4ae2fdf88fd17aabfc11a308951b243f9e4/util/install/generate-passwords.sh

@nedbat I think that applications should define a test.yml that can be used, I would not expect to need to generate them.

@stvstnfrd I don't have a strong opinion here. My thought was that it is so simple that no reference or helper would be required. Value defined in required.py or default.py maps 1-to-1 to the value you stick in your yml file to override it. Interested to hear more about what the helper would actually do

@bderusha I guess I don't know either :) I'll probably have a better idea upon seeing potential implementations.
And no, I'm not expecting that at this point, just slightly pushing back on the idea of deferring the responsibility permanently :)

My ears perk up whenever I hear, "operators can figure this out themselves", as those were often the areas that bit us most frequently :P

It's probably less of an issue for us than it will be for the community.

I assume we'll generally make the change to {required,default}.py immediately in tandem with the change to the ${env}.yml (whether that's in a private repo or what not).

On the other hand, community operators are going to "inherit" the changes to {required,default}.py weeks/months/years later, and then will need to reconcile the two files, line by line (?? since it won't be a simple diff, since we're comparing py and yml files) or while auditing the CHANGELOG.

Of course, it won't be the "worst thing ever" ™️ , but it seems like an extra tedium to incur upon operators, that we won't feel ourselves.

Off the top of my head (again, I'm sure seeing sample implementations will help my mental model), I could imagine a script that auto-merges/reconciles the py files with the yml. Something like:

./upgrade-config --input ironwood.yml --output juniper.yml # defaults.py and required.py are implied default inputs

So assuming input files like:

## required.py ADMIN_EMAIL = YAML_CONFIG['ADMIN_EMAIL'] NEW_TOGGLE_ADDED_IN_JUNIPER = YAML_CONFIG['NEW_TOGGLE_ADDED_IN_JUNIPER']

## defaults.py LMS_BASE = YAML_CONFIG['LMS_BASE'] ANOTHER_TOGGLE_ADDED_IN_JUNIPER = YAML_CONFIG['ANOTHER_TOGGLE_ADDED_IN_JUNIPER']

## ironwood.yml ADMIN_EMAIL: 'admin@example.com' LMS_BASE: 'localhost:18000' OLD_TOGGLE_REMOVED_IN_JUNIPER: true

Might yield an output file like:

## juniper.yml ## This file was automatically generated by: `upgrade-config` ## Deprecated settings ## The following are now unsupported in this release: # OLD_TOGGLE_REMOVED_IN_JUNIPER: true ## Required settings ## These must all be set: ADMIN_EMAIL: 'admin@example.com' # NEW_TOGGLE_ADDED_IN_JUNIPER: 'recommended default value' ## Default overrides ## These are optional overrides: LMS_BASE: 'localhost:18000' # ANOTHER_TOGGLE_ADDED_IN_JUNIPER: 'default to the value in `defaults.py`'

An operator would still need to tweak the output file (to update new required values), but I can imagine some process like this making it easier and more obvious to automatically see:

which new required settings have been added

which new optional overrides are available

which settings have been deprecated and are no longer supported

I'm sure there would be deployments where this isn't as useful, but I'd expect it to remove busy work and mistakes for many operators.

Agree with @stvstnfrd last point sounds very reasonable.

+1 for the automated way to see which settings have been added, removed, and/or deprecated.

As for "letting the operator decide", my ears also perk up. Further above, I suggest the possibility of implementing exactly such a "blessed" deployment strategy, in the spirit of OpenStack Kolla.

oeps/oep-0045-arch-ops-and-config.rst

stvstnfrd

Here are some thoughts from the ol' Stanford perspective :)

stvstnfrd · 2020-04-28T20:18:18Z

oeps/oep-0045-arch-ops-and-config.rst

+* A Dockerfile from which the container image can be built.
+* A documented settings file with production-ready defaults.
+* A documented settings file with all of the configuration values that must be defined by the operator.
+* An operations manual documenting standard operations required to run and maintain the application.


The Stanford team found the changelog in the configuration repo to be helpful during upgrades, so +1

stvstnfrd · 2020-04-28T20:24:52Z

oeps/oep-0045-arch-ops-and-config.rst

+
+**Docker Images**
+
+edX will provide Docker images for applications that captures the latest code on the master branch as well as images representing named releases. edX will not provide these images for named releases prior to the acceptance and implementation of this OEP which is Juniper at time of writing.


Not sure if it's covered elsewhere, but just to call out:
While default docker images will cover the majority of cases, we should be sure it's easy (enough) to both build custom images and then implement them. The Stanford team was slowed down getting our custom containers up and running in the docker-based devstack.

stvstnfrd · 2020-04-28T21:07:26Z

oeps/oep-0045-arch-ops-and-config.rst

+
+``required.py`` variables must be overridden by operators.  The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.
+
+This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc). The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory. This gives developers and operators more consistency across environments since the same code paths are being executed with different values.


FWIW, I took a look back at how we handled this generally (custom logic in settings .py files) on the Stanford fork...

We used to have some crazier things (heck, we shadowed the entire {lms/cms}/envs/ directory structure), but here's what we were actually using in practice, at the end:

Contextually removing some settings (INSTALLED_APPS, MIDDLEWARE_CLASSES) [1]

expanding wildcard paths [2]

overriding logging config [3]

Re: private.py specifically:
We definitely used this and I'd worry that folks won't be able to replicate all of the capabilities of custom Python logic w/in a .yml file. Is there a specific reason we're looking to remove this feature, eg, "private.py considered harmful"?
Or is this more "getting caught up in the storm" of cleaning out the settings directory?
Since private.py is explicitly git-ignored, it shouldn't increase the number of files checked in to that directory, with minimal maintenance overhead on the implementation [4].

I'm open to alternatives, as long as we don't have to reinvent the wheel.

[1] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/test.py#L8-L11

[2] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/common.py#L67-L69

[3] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/lms/envs/devstack.py#L9-L20

[4] https://github.com/Stanford-Online/edx-platform/blob/master/lms/envs/devstack.py#L273-L275

stvstnfrd · 2020-04-28T21:14:36Z

oeps/oep-0045-arch-ops-and-config.rst

+
+**Config file generation & management**
+
+Due to the varied needs and processes of different operators, how the config files are created, managed, or otherwise end up on the server is up to the operator and will depend greatly on their deployment strategy.


While the deployment processes will often vary, does (or should) the creation vary?

If we're forcing certain values to be set by operators (and enforcing via early failure), it seems like we ought to provide a canonical way to create the files, or at least provide a blessed reference implementation.

The actual implementation would vary, but I'm imaging at least a helper of sorts, something in the spirit of:

https://github.com/edx/configuration/blob/eec2b4ae2fdf88fd17aabfc11a308951b243f9e4/util/install/generate-passwords.sh

nasthagiri

Excellent and thorough design proposal! This will do wonders for reducing our complexity. Thank you!

oeps/oep-0045-arch-ops-and-config.rst

e0d · 2020-04-29T10:07:14Z

oeps/oep-0045-arch-ops-and-config.rst

+Containers
+**********
+
+**Why Containers**


Some of this content isn't universally accepted to be true, for example that containers are more secure. I'm not sure it's required to make this case here. Containerization feels like a different OEP to me.

It's true that containers might not improve security across the board, but they certainly help to address some security problems. For instance: it's easier to limit the exposition of service ports to the outside world. Also, applications are better isolated from one another.

I feel like it's important to advocate the use of containers in this OEP, as its the primary alternative for non-edX users.

oeps/oep-0045-arch-ops-and-config.rst

staubina

I am excited about this approach. I look forward to discussing how we make this happen and the plan forward.

feanil · 2020-04-30T15:14:12Z

oeps/oep-0045-arch-ops-and-config.rst

+
+The settings defined in ``required.py`` and ``defaults.py`` files are mutually exclusive, representing all application specific settings as well as installed library settings whose values either must be provided or whose defaults are not considered production-ready.
+
+``required.py`` variables must be overridden by operators.  The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.


Is it too specific to say we'll use the ImproperlyConfigured exception?

Is it worth specifying how how the service will check if all of its required settings have been overriden? Or is that up to the service?

I think the how is up for debate and can be moved to the implementation section of this OEP

feanil · 2020-04-30T17:34:21Z

oeps/oep-0045-arch-ops-and-config.rst

+
+The settings defined in ``required.py`` and ``defaults.py`` files are mutually exclusive, representing all application specific settings as well as installed library settings whose values either must be provided or whose defaults are not considered production-ready.
+
+``required.py`` variables must be overridden by operators.  The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.


If we're we're overriding the defaults from the config file, then I think you won't actually need to do any assignment in the required.py it will only have the validation checks. From what I can tell about sphinx autodoc, it looks like it's setup to pickup assignments, so you may end up in a situation where you have to do an assignment for the docs to pick it up even though it won't be necessary for the code to work correctly.

It's not clear to me that required.py will do any validation, which would mean that it exists entirely for documentation. I'm not sure how I feel about that. Could we instead just put required settings in defaults.py, and used Sphinx annotations to indicate that they're required?

If I'm wrong, and required.py will in fact do validation, then I recommend renaming required.py to validate_required.py, and clarifying that it will be imported after the YAML is loaded.

Because the implementation details of this part matter quite a bit to the final design, I think I would like to move this conversation over to the "how we get there" part. We should be open to amending this section as we coalesce around exactly what operating pattern needs to exist to realistically land somewhere where we have 1 codepath for configuration and clear delineations for operators around what they must provide to run the app.

giovannicimolin

This proposal, looks great! And I'm looking forward to see these ideas implemented.

I've left a few comments in the document.

giovannicimolin · 2020-04-30T20:30:26Z

oeps/oep-0045-arch-ops-and-config.rst

+
+.. _Sphinx autodoc: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoattribute
+
+Operations Manuals


The edx/configuration repo also handles setting up requirements between LMS/Studio and other IDAs, such as creating OAuth2 credentials/accouts (oauth-client-setup).

Since this OEP's goal is to replace configuration, should we add instructions in the operations manuals about what needs to be done in order to configure each IDA.

To continue with the "blessed reference deployment project" idea, this would be exactly one of its mandates: configuration of the interaction between the different services.

giovannicimolin · 2020-04-30T20:44:24Z

oeps/oep-0045-arch-ops-and-config.rst

+
+    Installing and running an Open edX instance is not simple. We strongly recommend that you use a service provider to run the software for you. 
+
+The proposal below outlines how we can create a cleaner, more intuitive interface for operating Open edX and in doing so help both edX.org and the Open edX community at large achieve better outcome at a faster pace. In brief it is:


This OEP does a really good job describing how configuration should be done on each app and sets great standards for building images, but it doesn't cover the "how to put things together" like the current configuration repo implicitly does.

For example, when running openedx_native.yml, the playbooks set up, on a single instance, all the services needed to run a full Open edX deployment:

LMS

Studio

Workers

IDAs

Other support services

Handles setting up LMS/Studio and wiring between them and the IDAs

I know that this OEP doesn't cover deployment because there are a lot of different ways to do that, but the requirements for a Open edX deployment are not described anywhere (except implicitly in the configuration repo).
So the configuration repo isn't the best reference, but it's currently all we have for deploying from scratch.

Can we add a section on this OEP (or create a follow up OEP), to better document and map how the platform is supposed to work? What is each piece, and what's required for a "basic" installation if starting from scratch using the docker images?

This is a very good point. Perhaps Open edX can take inspiration from OpenStack Kolla. In spirit, it is akin to what are referred to as applications in this OEP (a "project", in OpenStack parlance), is officially supported by the parent organization, and deals exactly with the question outlined by @giovannicimolin: a reference production deployment. If people end up using it or not is less relevant than the fact it is an officially supported way to get a working production deployment, even if basic at first.

As far as this OEP is concerned, I believe it would be a good idea to refer to such a project as an official goal.

giovannicimolin · 2020-04-30T20:51:38Z

oeps/oep-0045-arch-ops-and-config.rst

+edX will provide Docker images for applications that captures the latest code on the master branch as well as images representing named releases. edX will not provide these images for named releases prior to the acceptance and implementation of this OEP (Aspen through and including Juniper at time of writing).
+
+
+Configuration


How does this new configuration strategy handles differentiating between a container running only workers to a container actually serving the LMS/Studio pages (gunicorn).

Will each repo have a separate dockerfile for each container "type" (celery worker, gunicorn, etc) or this is going to be managed through configuration?

I'm asking this because the workers and the LMS run off the same codebase with the same requirements, with just different entrypoints.

Is it worth adding a section here about this specific point?

arbrandes

Overall, I think this is an excellent direction for Open edX to take. To distill my inline comments, I only feel like two things are missing:

An explicit intention to document, either programatically or in text, how configuration requirements change from one release to the next, or even from one commit to the next.
An explicit intention to build a reference deployment application in its own repository, maintained in step with all the other Open edX applications.

arbrandes · 2020-05-01T23:20:01Z

oeps/oep-0045-arch-ops-and-config.rst

+Wherever possible, operating concerns will live within the codebase they are concerned with. Thus it should be expected that all applications will provide the following:
+
+* A Dockerfile from which the container image can be built.
+* A documented settings file with production-ready defaults.


I would like to see an explicit commitment that these defaults will be maintained in step with the rest of the application, preferably with an associated Changelog. Would that be possible?

arbrandes · 2020-05-01T23:22:15Z

oeps/oep-0045-arch-ops-and-config.rst

+
+* A Dockerfile from which the container image can be built.
+* A documented settings file with production-ready defaults.
+* A documented settings file with all of the configuration values that must be defined by the operator.


As with the defaults, I would expect this file to be maintained in step with the application, with changes being documented in a Changelog. One of the pains with current edx-configuration is the difficulty in figuring out what variables need to be reworded from one release to the next.

arbrandes · 2020-05-01T23:25:05Z

oeps/oep-0045-arch-ops-and-config.rst

+* A Dockerfile from which the container image can be built.
+* A documented settings file with production-ready defaults.
+* A documented settings file with all of the configuration values that must be defined by the operator.
+* An operations manual documenting standard operations required to run and maintain the application.


As with my previous comments, the most important type of changelog I can imagine is the one that deals with required configuration changes. This affects operators directly. If no other changelogs are explicitly named, I'd vote for this one.

arbrandes · 2020-05-01T23:32:45Z

oeps/oep-0045-arch-ops-and-config.rst

+
+    Installing and running an Open edX instance is not simple. We strongly recommend that you use a service provider to run the software for you. 
+
+The proposal below outlines how we can create a cleaner, more intuitive interface for operating Open edX and in doing so help both edX.org and the Open edX community at large achieve better outcome at a faster pace. In brief it is:


This is a very good point. Perhaps Open edX can take inspiration from OpenStack Kolla. In spirit, it is akin to what are referred to as applications in this OEP (a "project", in OpenStack parlance), is officially supported by the parent organization, and deals exactly with the question outlined by @giovannicimolin: a reference production deployment. If people end up using it or not is less relevant than the fact it is an officially supported way to get a working production deployment, even if basic at first.

As far as this OEP is concerned, I believe it would be a good idea to refer to such a project as an official goal.

arbrandes · 2020-05-01T23:49:31Z

oeps/oep-0045-arch-ops-and-config.rst

+
+**Config file generation & management**
+
+Due to the varied needs and processes of different operators, how the config files are created, managed, or otherwise end up on the server is up to the operator and will depend greatly on their deployment strategy.


+1 for the automated way to see which settings have been added, removed, and/or deprecated.

As for "letting the operator decide", my ears also perk up. Further above, I suggest the possibility of implementing exactly such a "blessed" deployment strategy, in the spirit of OpenStack Kolla.

arbrandes · 2020-05-01T23:51:41Z

oeps/oep-0045-arch-ops-and-config.rst

+
+.. _Sphinx autodoc: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoattribute
+
+Operations Manuals


To continue with the "blessed reference deployment project" idea, this would be exactly one of its mandates: configuration of the interaction between the different services.

arbrandes · 2020-05-01T23:58:24Z

oeps/oep-0045-arch-ops-and-config.rst

+
+A clear manual of operations will exist in the form of RST files in an ``operations`` directory within the ``documentation`` directory for that application. See `this commit`_ for an example provided by the Open edX Build-Test-Release working group. The operations docs will cover common operations such as how to run the application for web traffic or as an async worker and how to manage the application's underlying database schema. It will also include a list of potential maintenance tasks operators may want to leverage such as clearing sessions or applying security patches. Finally it will include the list of ad-hoc management commands operators can use to help handle edge case or one-time operations.
+
+In the same vein as not dictating how operators create and manage their application config files, operators will also be expected to manage how they execute the operations documented in the manual.


I agree completely with avoiding hard-coded prescriptions. And documenting suggestions for operation is fabulous. But to get buy-in from the community, I once again suggest coming up with a section in this OEP devoted to the creation of a "blessed" reference deployment project with its own repository. It doesn't have to be turn-key, necessarily, but it can at least afford to be more opinionated than the Dockerfiles and settings files that live in the corresponding applications. This would be very valuable.

arbrandes · 2020-05-02T00:04:43Z

oeps/oep-0045-arch-ops-and-config.rst

+Jumping to Kubernetes
+*********************
+
+Kubernetes is an open source container orchestration platform pioneered by Google. While it often occupies the same conversation space as containers because it is a powerful way to manage them, it is a huge increase in complexity and expertise required to operate. For most installations Kubernetes is currently too much overhead/learning curve for the value. The edX organization may opt to explore deploying Docker containers this way in the future and would love to collaborate with operators who also decide to use Kubernetes to compare notes.


To hammer home the Blessed Deployment Project idea, the latter is where the Kubernetes discussion should happen. While I believe the intention to build such a project should live in this OEP, the specifics can be handled in a separate one. (Spoiler alert: Kubernetes would be an excellent contender for orchestrating the Open edX applications. :)

PS: @bderusha, we'll be discussing the reference deployment project idea during the next contributor's meetup, on Thursday, if you care to join us. :)

kdmccormick

Really excited for this. Other than my questions around required.py, I'm ready to approve.

OmarIthawi

Thank you. That's a welcomed move!

oeps/oep-0045-arch-ops-and-config.rst

bderusha · 2020-05-19T22:39:06Z

Thank you, everyone for the great discussion. I think we are ready to start the conversation around how we can tactically move from where we are today to this bold new world. I would like to note that there were a few important implementation details we should plan on addressing during the next phase of discussion:

required.py and/or how we can ensure clear communications to operators about what is required and fail fast if not provided.
guidance/paths/helpers for migrating between releases
whether private.py would be useful for local dev or whether more complicated overrides would be better served as supported in the main settings path.

Based on other discussions we have deemed including a reference implementation out of scope for this OEP. That does not preclude it from existing or being maintained by edX or the community. Just simply not something we felt was necessary when managing edx IDAs as containers.

kdmccormick

🎸

felipemontoya

@bderusha proposes we move this to provisional and merge.
Considering that this PR got a lot of attention, significant discussion and that most feedback is positive and doubts lean towards clarity on the implementation. I believe moving to provisional and merging is a good next step.

arbrandes · 2020-05-28T10:24:11Z

@bderusha, @felipemontoya: for the record, at OpenCraft we're still concerned about the absence of wording in this OEP regarding a reference deployment implementation, and would welcome further discussion on it before the merge. I created a thread for the purpose in the architecture forum.

pablodz · 2020-06-01T16:53:45Z

Yes, please we need new strategy of deployment.

…and update to provisional

arbrandes · 2020-06-05T18:38:58Z

oeps/oep-0045-arch-ops-and-config.rst

@@ -44,6 +44,11 @@ Wherever possible, operating concerns will live within the codebase they are con
 * An operations manual documenting standard operations required to run and maintain the IDA.
 * A changelog to document differences in settings, operating concerns, and application functionality from release to release.

+As code is moved to this new paradigm it is important to note that we strive to maintain or improve upon the state of operations without sacrificing features or visibility.  If operations code is removed from one place it must either have a home in some other public place better aligned with this OEP (EG package installation in Dockerfiles) or no longer be needed at all in this model (EG `conditional ansible commands`_ for installing packages on particular operating
+systems).


Sounds good to me! Thanks for this! 👍

nasthagiri · 2020-07-09T16:33:54Z

@felipemontoya @bderusha As I was looking to reference this OEP in another communication thread, I noticed that this OEP is yet not available on Read-the-Docs since it hasn't been merged. Are we now at a place where we can merge this particular PR in its provisional status - knowing fully well that another implementation-specific PR is pending?

felipemontoya · 2020-07-09T22:25:46Z

I think we are ready to merge this and continue with the implementation specific discussion.

OEP-45 Operating and Configuring Open edX

013008f

timmc-edx reviewed Apr 9, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

timmc-edx reviewed Apr 9, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

morenol reviewed Apr 10, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Show resolved Hide resolved

felipemontoya reviewed Apr 13, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

regisb reviewed Apr 14, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Show resolved Hide resolved

address initial feedback

f443ac1

felipemontoya approved these changes Apr 16, 2020

View reviewed changes

transition OEP-45 to Under Review

8da9d38

bderusha force-pushed the bderusha/OEP-45-operations branch from 2cb8629 to 8da9d38 Compare April 17, 2020 18:41

bderusha mentioned this pull request Apr 17, 2020

Follow-on Implementation Strategy for OEP-45 #144

Closed

coryleeio reviewed Apr 17, 2020

View reviewed changes

regisb reviewed Apr 17, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

regisb reviewed Apr 20, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Show resolved Hide resolved

regisb reviewed Apr 20, 2020

View reviewed changes

bderusha added 4 commits April 27, 2020 17:33

Clean up formatting

40f93cf

Add container tooling sections and dockerfile requirements

2fa7b53

reword required.py section

1dc68bf

fix typo

f8f93ed

iloveagent57 reviewed Apr 28, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

adamstankiewicz reviewed Apr 28, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

Fix NITs

54cd957

nedbat reviewed Apr 28, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

nedbat reviewed Apr 28, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

nedbat reviewed Apr 28, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

nedbat reviewed Apr 28, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

stvstnfrd reviewed Apr 28, 2020

View reviewed changes

nedbat mentioned this pull request Apr 28, 2020

An initial draft of a decision around how we manage settings files. openedx/edx-platform#23829

Merged

Fix formatting

c195459

nasthagiri reviewed Apr 28, 2020

View reviewed changes

e0d reviewed Apr 29, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Show resolved Hide resolved

staubina approved these changes Apr 29, 2020

View reviewed changes

feanil reviewed Apr 30, 2020

View reviewed changes

giovannicimolin reviewed Apr 30, 2020

View reviewed changes

Updates based on feedback

095a03c

arbrandes reviewed May 2, 2020

View reviewed changes

kdmccormick reviewed May 4, 2020

View reviewed changes

OmarIthawi approved these changes May 4, 2020

View reviewed changes

davidjoy reviewed May 8, 2020

View reviewed changes

oeps/oep-0045-arch-ops-and-config.rst Outdated Show resolved Hide resolved

kdmccormick approved these changes May 20, 2020

View reviewed changes

felipemontoya approved these changes May 28, 2020

View reviewed changes

Add sentence ensuring we dont lose functionality with future changes …

5a8045c

…and update to provisional

arbrandes approved these changes Jun 5, 2020

View reviewed changes

nasthagiri approved these changes Jul 9, 2020

View reviewed changes

jinder1s mentioned this pull request Jul 17, 2020

Create function to set a Django setting openedx/edx-platform#23762

Closed

bderusha merged commit 4a75ead into master Jul 23, 2020

bderusha deleted the bderusha/OEP-45-operations branch July 23, 2020 15:31

kdmccormick mentioned this pull request May 17, 2024

Let's talk about OEP-45 (Configuring & Operating Open edX) #587

Open


		``required.py`` variables will all be initialized to ``None`` and the application will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.

		This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc). The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory. This gives developers and operators more consistency across environments since the same code paths are being executed with different values.


		``required.py`` variables must be overridden by operators. The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.

		This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc). The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory. This gives developers and operators more consistency across environments since the same code paths are being executed with different values.


		Config file generation & management

		Due to the varied needs and processes of different operators, how the config files are created, managed, or otherwise end up on the server is up to the operator and will depend greatly on their deployment strategy.


		Docker Images

		edX will provide Docker images for applications that captures the latest code on the master branch as well as images representing named releases. edX will not provide these images for named releases prior to the acceptance and implementation of this OEP which is Juniper at time of writing.


		The settings defined in ``required.py`` and ``defaults.py`` files are mutually exclusive, representing all application specific settings as well as installed library settings whose values either must be provided or whose defaults are not considered production-ready.

		``required.py`` variables must be overridden by operators. The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum.


		.. _Sphinx autodoc: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoattribute

		Operations Manuals


		Installing and running an Open edX instance is not simple. We strongly recommend that you use a service provider to run the software for you.

		The proposal below outlines how we can create a cleaner, more intuitive interface for operating Open edX and in doing so help both edX.org and the Open edX community at large achieve better outcome at a faster pace. In brief it is:

		edX will provide Docker images for applications that captures the latest code on the master branch as well as images representing named releases. edX will not provide these images for named releases prior to the acceptance and implementation of this OEP (Aspen through and including Juniper at time of writing).


		Configuration


		A clear manual of operations will exist in the form of RST files in an ``operations`` directory within the ``documentation`` directory for that application. See `this commit`_ for an example provided by the Open edX Build-Test-Release working group. The operations docs will cover common operations such as how to run the application for web traffic or as an async worker and how to manage the application's underlying database schema. It will also include a list of potential maintenance tasks operators may want to leverage such as clearing sessions or applying security patches. Finally it will include the list of ad-hoc management commands operators can use to help handle edge case or one-time operations.

		In the same vein as not dictating how operators create and manage their application config files, operators will also be expected to manage how they execute the operations documented in the manual.

OEP-45 Operating and Configuring Open edX #143

OEP-45 Operating and Configuring Open edX #143

Conversation

bderusha commented Apr 8, 2020 • edited Loading

bradenmacdonald commented Apr 9, 2020

bderusha commented Apr 16, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Zebulon release notes:

Choose a reason for hiding this comment

regisb Apr 20, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iloveagent57 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bderusha Apr 28, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stvstnfrd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nasthagiri left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

staubina left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kdmccormick May 4, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

giovannicimolin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arbrandes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kdmccormick left a comment

Choose a reason for hiding this comment

OmarIthawi left a comment

Choose a reason for hiding this comment

bderusha commented May 19, 2020

kdmccormick left a comment

Choose a reason for hiding this comment

felipemontoya left a comment

Choose a reason for hiding this comment

bderusha commented Apr 8, 2020 •

edited

Loading

regisb Apr 20, 2020 •

edited

Loading

bderusha Apr 28, 2020 •

edited

Loading

kdmccormick May 4, 2020 •

edited

Loading

arbrandes commented May 28, 2020 •

edited

Loading