-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OEP-45 Operating and Configuring Open edX #143
Conversation
@bderusha This sounds great, but will be a big change! Could you perhaps talk more about the rollout strategy for this? As we've mentioned before, a partial change to this approach such as removing the config files from |
@bradenmacdonald The implementation strategy for this will be extremely important. I would like to have separate conversations about whether we think this is a good direction for the project and how we go about getting there. As such, I have created a follow-up PR where we can have that tactical discussion and eventually merge the results back into this OEP. |
2cb8629
to
8da9d38
Compare
* A Dockerfile from which the container image can be built. | ||
* A documented settings file with production-ready defaults. | ||
* A documented settings file with all of the configuration values that must be defined by the operator. | ||
* An operations manual documenting standard operations required to run and maintain the application. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this include / should this include a rolling changelog for changes that will affect operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a changelog is a good idea. I don't know the value of an operations specific log VS a single log to capture all relevant changes to the codebase. Interested to hear other's thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changelogs are often categorized. Perhaps "Operations" could be a category along with "Breaking changes", "Bug fixes", etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Stanford team found the changelog in the configuration
repo to be helpful during upgrades, so +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "django way" is to have both a changelog (release notes) and a manual that indicates differences with the previous version.
For instance, in the manual:
Run
foobarcocktail --no-dry-run
to update the foogarizer. This feature was introduced in release Zebulon
And then in the changelog:
Zebulon release notes:
[Operations][Breaking change] In order to update the foogarizer the
--no-dry-run
option must be added to thefoobarcocktail
command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with my previous comments, the most important type of changelog I can imagine is the one that deals with required configuration changes. This affects operators directly. If no other changelogs are explicitly named, I'd vote for this one.
|
||
``required.py`` variables will all be initialized to ``None`` and the application will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum. | ||
|
||
This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc). The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory. This gives developers and operators more consistency across environments since the same code paths are being executed with different values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we talking about using the Docker images for local development? I think we should: or at least, we should be able to test the Docker images locally by pointing to localhost (and not S3, for instance).
If we are to use these Docker images for local development, I'm afraid this is an overly optimistic approach for organising the settings. I don't think we will be able to reach this state where we only have defaults.py/required.py without having to resort to huge prod.yml/dev.yml files.
My point is that required.py
and defaults.py
, by themselves, will not be usable in either a development or production environment without very large prod.yml/dev.yml files. The reason for that is that prod.py/dev.py contain not only configuration values, but also configuration logic.
Let's take a very simple example: in production we need to set DEBUG = False
. This is just a fact that is true for all users. Does that mean that all users should now add DEBUG: false
to their yaml files? I don't think it should: there are quite literally hundreds of settings that need to be set separately for production and development, but for which we can easily predict sane defaults.
The alternatives are:
- Use a single settings file that differentiates between dev and prod environments. This is the approach used by Sentry. See: https://github.com/getsentry/sentry/blob/685427a5dea1e1ad0adb1816256e28f4958f570f/src/sentry/conf/server.py#L68
- Use separate settings files: base.py, production.py, development.py. This is advocated by Two Scoops of Django and it's Jacob Kaplan-Moss' "The One True Way".
In the case of edx-platform, I would argue for approach 2.
We would have, similar to what we currently do:
base.py:
# Reasonable settings
additional_settings = yaml.load(os.environ["CONFIG_PATH"]) if "CONFIG_PATH" in os.environ else {}
DEBUG = additional_settings.get("DEBUG", False)
ALLOWED_HOSTS = additional_settings.get("ALLOWED_HOSTS", ["localhost", "localhost:8000"])
...
development.py:
from .base import *
# Development logic
DEBUG = True
ALLOWED_HOSTS = ["*"]
SECRET_KEY = "v3rys3cr3t"
production.py:
from .base import *
# Production logic
DEBUG = False
ALLOWED_HOSTS = [additional_settings["HOST"]]
SECRET_KEY = additional_settings["SECRET_KEY"] # fail fast in case this is not defined
...
I understand that, by doing so, we lose the benefit of automatically loading the settings from init.py, but in my experience this is not a problem: settings will be automatically loaded as soon as we set the DJANGO_SETTINGS_MODULE environment variable (which should be set to "prod" by default in the Docker image)
Furthermore, users will still be able to customize production/development settings by creating their own settings files which will import from production.py/development.py (as we do today). This is pretty much the standard way to proceed in the django world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the goals of this OEP is to have an application that executes the same codepaths in development as in production wherever reasonable. I believe that this is a place that could reasonably be one code path. Let's take the dev defaults case, because it's a very valid point. Here is how I would see that working.
required.py
# Variables required to run, but without reasonable production defaults
LMS_BASE_URL = None
JWT_AUTH = None
...
defaults.py
# Variable with reasonable defaults
DEBUG = false
DEFAULT_PAGE_SIZE = 25
STATIC_URL = '/static/'
...
init.py
import os
from .required import *
from .defaults import *
def get_env_setting(setting):
""" Get the environment setting or return exception """
try:
return os.environ[setting]
except KeyError:
error_msg = u"Set the %s env variable" % setting
raise ImproperlyConfigured(error_msg)
CONFIG_FILE = get_env_setting('APP_CFG')
with open(CONFIG_FILE, encoding='utf-8') as f:
config_from_yaml = yaml.safe_load(f)
vars().update(config_from_yaml)
development.yml (lives in the repo either like this or as starting yaml to copy development.yml.example
)
# required
LMS_BASE_URL: localhost:18000
JWT_AUTH: { ... auth ... }
# default overrides
DEBUG: true
DEFAULT_PAGE_SIZE: 1000
production.yml (managed securely elsewhere)
# required
LMS_BASE_URL: www.my-prod-lms.com
JWT_AUTH: { ... prod auth ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am all for giving developers and operators as many of the values defaulted to suit their environments as possible. It is also likely that for testing we will need/want a test.yml
to override certain settings. That should also live in the repo.
IMO this method increases the likelihood that new settings end up in the right place, documented, with proper defaults. And that when people are writing and testing their apps they aren't surprised by how it works in production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it.
|
||
``required.py`` variables must be overridden by operators. The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum. | ||
|
||
This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc). The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory. This gives developers and operators more consistency across environments since the same code paths are being executed with different values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Developers have found "private.py" helpful. That will also go away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at edx-platform/lms/envs/test.py, there is a little bit of executable code in it, for example to randomize keys. How will that be handled in a test.yml world?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. However we should talk about how to replicate it's usefulness in a yml
config world.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I took a look back at how we handled this generally (custom logic in settings .py
files) on the Stanford fork...
We used to have some crazier things (heck, we shadowed the entire {lms/cms}/envs/
directory structure), but here's what we were actually using in practice, at the end:
- Contextually removing some settings (
INSTALLED_APPS
,MIDDLEWARE_CLASSES
) [1] - expanding wildcard paths [2]
- overriding logging config [3]
Re: private.py
specifically:
We definitely used this and I'd worry that folks won't be able to replicate all of the capabilities of custom Python logic w/in a .yml
file. Is there a specific reason we're looking to remove this feature, eg, "private.py
considered harmful"?
Or is this more "getting caught up in the storm" of cleaning out the settings directory?
Since private.py
is explicitly git-ignored
, it shouldn't increase the number of files checked in to that directory, with minimal maintenance overhead on the implementation [4].
I'm open to alternatives, as long as we don't have to reinvent the wheel.
- [1] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/test.py#L8-L11
- [2] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/common.py#L67-L69
- [3] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/lms/envs/devstack.py#L9-L20
- [4] https://github.com/Stanford-Online/edx-platform/blob/master/lms/envs/devstack.py#L273-L275
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nedbat I just looked through that file and I didn't see any executable code that couldn't just be a preset value. I don't think we will be able to get away with just a straight import and override from the yaml, but I do think that we can keep any executable code required to set up the app in one code path.
If people have valid cases for executable code in the settings files that only apply to a particular environment I would love to see those cases. I wasn't able to find any myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stvstnfrd thanks for all those great references! I will copy a line written in previous comment
One of the goals of this OEP is to have an application that executes the same code paths in development as in production wherever reasonable. I believe that this is a place that could reasonably be one code path.
I would love for that to be our starting assumption and see if there is a way to achieve the needs of operators and developers without compromising current functionality (though we may need to change current processes/mindsets). Your examples bring up a great and very common configuration challenge. INSTALLED_APPS
and MIDDLEWARE_CLASSES
Would it be reasonable for all configuration files to be able to define:
INSTALLED_APPS:
- 'app1'
- 'app2'
APPEND_INSTALLED_APPS:
- 'app3'
REMOVE_INSTALLED_APPS:
- 'app2'
INSTALLED_APPS
would completely overwrite the value. APPEND
and REMOVE
would take the corresponding list of value and perform that action. Allowing the code path to resolve these common cases the same way across the board.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bderusha Cool, yeah, I probably should have split apart the two lines of thought in that comment (private.py
vs general config), as I am implying the former could be be kept around w/ negligible overhead/maintenance, but not implying as much with the latter.
For the general config stuff, I didn't mean to imply "this can't be done!"; just wanted to flag some examples that may not yet be accounted for :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bderusha Your INSTALLED_APPS
example does a good job of highlighting how handling some Django settings will be more complex than just "fully override it with the value in the YAML". I think edx-platform OPTIONAL_APPS also illustrates this.
I think it would be worth eventually specifying our "special" Django settings (APPEND_INSTALLED_APPS
, REMOVE_MIDDLEWARE
, etc.) somewhere visible, with a reference implementation of how to handle them in __init__.py
. That being said, I think it's out of the scope of this PR, as it would require some digging and discovery to find out what those special settings should be and how we would handle them. Maybe that's something that would end up in your follow-up PR.
|
||
**Config file generation & management** | ||
|
||
Due to the varied needs and processes of different operators, how the config files are created, managed, or otherwise end up on the server is up to the operator and will depend greatly on their deployment strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we be generating test.yml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the deployment processes will often vary, does (or should) the creation vary?
If we're forcing certain values to be set by operators (and enforcing via early failure), it seems like we ought to provide a canonical way to create the files, or at least provide a blessed reference implementation.
The actual implementation would vary, but I'm imaging at least a helper of sorts, something in the spirit of:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nedbat I think that applications should define a test.yml
that can be used, I would not expect to need to generate them.
@stvstnfrd I don't have a strong opinion here. My thought was that it is so simple that no reference or helper would be required. Value defined in required.py
or default.py
maps 1-to-1 to the value you stick in your yml file to override it. Interested to hear more about what the helper would actually do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bderusha I guess I don't know either :) I'll probably have a better idea upon seeing potential implementations.
And no, I'm not expecting that at this point, just slightly pushing back on the idea of deferring the responsibility permanently :)
My ears perk up whenever I hear, "operators can figure this out themselves", as those were often the areas that bit us most frequently :P
It's probably less of an issue for us than it will be for the community.
- I assume we'll generally make the change to
{required,default}.py
immediately in tandem with the change to the${env}.yml
(whether that's in a private repo or what not). - On the other hand, community operators are going to "inherit" the changes to
{required,default}.py
weeks/months/years later, and then will need to reconcile the two files, line by line (?? since it won't be a simplediff
, since we're comparingpy
andyml
files) or while auditing theCHANGELOG
.- Of course, it won't be the "worst thing ever" ™️ , but it seems like an extra tedium to incur upon operators, that we won't feel ourselves.
Off the top of my head (again, I'm sure seeing sample implementations will help my mental model), I could imagine a script that auto-merges/reconciles the py
files with the yml
. Something like:
./upgrade-config --input ironwood.yml --output juniper.yml
# defaults.py and required.py are implied default inputs
So assuming input files like:
## required.py
ADMIN_EMAIL = YAML_CONFIG['ADMIN_EMAIL']
NEW_TOGGLE_ADDED_IN_JUNIPER = YAML_CONFIG['NEW_TOGGLE_ADDED_IN_JUNIPER']
## defaults.py
LMS_BASE = YAML_CONFIG['LMS_BASE']
ANOTHER_TOGGLE_ADDED_IN_JUNIPER = YAML_CONFIG['ANOTHER_TOGGLE_ADDED_IN_JUNIPER']
## ironwood.yml
ADMIN_EMAIL: 'admin@example.com'
LMS_BASE: 'localhost:18000'
OLD_TOGGLE_REMOVED_IN_JUNIPER: true
Might yield an output file like:
## juniper.yml
## This file was automatically generated by: `upgrade-config`
## Deprecated settings
## The following are now unsupported in this release:
# OLD_TOGGLE_REMOVED_IN_JUNIPER: true
## Required settings
## These must all be set:
ADMIN_EMAIL: 'admin@example.com'
# NEW_TOGGLE_ADDED_IN_JUNIPER: 'recommended default value'
## Default overrides
## These are optional overrides:
LMS_BASE: 'localhost:18000'
# ANOTHER_TOGGLE_ADDED_IN_JUNIPER: 'default to the value in `defaults.py`'
An operator would still need to tweak the output file (to update new required values), but I can imagine some process like this making it easier and more obvious to automatically see:
- which new required settings have been added
- which new optional overrides are available
- which settings have been deprecated and are no longer supported
I'm sure there would be deployments where this isn't as useful, but I'd expect it to remove busy work and mistakes for many operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @stvstnfrd last point sounds very reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the automated way to see which settings have been added, removed, and/or deprecated.
As for "letting the operator decide", my ears also perk up. Further above, I suggest the possibility of implementing exactly such a "blessed" deployment strategy, in the spirit of OpenStack Kolla.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some thoughts from the ol' Stanford perspective :)
* A Dockerfile from which the container image can be built. | ||
* A documented settings file with production-ready defaults. | ||
* A documented settings file with all of the configuration values that must be defined by the operator. | ||
* An operations manual documenting standard operations required to run and maintain the application. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Stanford team found the changelog in the configuration
repo to be helpful during upgrades, so +1
|
||
**Docker Images** | ||
|
||
edX will provide Docker images for applications that captures the latest code on the master branch as well as images representing named releases. edX will not provide these images for named releases prior to the acceptance and implementation of this OEP which is Juniper at time of writing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's covered elsewhere, but just to call out:
While default docker images will cover the majority of cases, we should be sure it's easy (enough) to both build custom images and then implement them. The Stanford team was slowed down getting our custom containers up and running in the docker-based devstack.
|
||
``required.py`` variables must be overridden by operators. The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum. | ||
|
||
This new settings structure obviates the need for any other python files in the settings directory (such as ``devstack.py``, ``test.py``, etc). The values currently set in those files should be moved to a corresponding ``devstack.yml``, ``test.yml``, etc in the same settings directory. This gives developers and operators more consistency across environments since the same code paths are being executed with different values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I took a look back at how we handled this generally (custom logic in settings .py
files) on the Stanford fork...
We used to have some crazier things (heck, we shadowed the entire {lms/cms}/envs/
directory structure), but here's what we were actually using in practice, at the end:
- Contextually removing some settings (
INSTALLED_APPS
,MIDDLEWARE_CLASSES
) [1] - expanding wildcard paths [2]
- overriding logging config [3]
Re: private.py
specifically:
We definitely used this and I'd worry that folks won't be able to replicate all of the capabilities of custom Python logic w/in a .yml
file. Is there a specific reason we're looking to remove this feature, eg, "private.py
considered harmful"?
Or is this more "getting caught up in the storm" of cleaning out the settings directory?
Since private.py
is explicitly git-ignored
, it shouldn't increase the number of files checked in to that directory, with minimal maintenance overhead on the implementation [4].
I'm open to alternatives, as long as we don't have to reinvent the wheel.
- [1] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/test.py#L8-L11
- [2] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/cms/envs/common.py#L67-L69
- [3] https://github.com/Stanford-Online/edx-platform/blob/master/openedx/stanford/lms/envs/devstack.py#L9-L20
- [4] https://github.com/Stanford-Online/edx-platform/blob/master/lms/envs/devstack.py#L273-L275
|
||
**Config file generation & management** | ||
|
||
Due to the varied needs and processes of different operators, how the config files are created, managed, or otherwise end up on the server is up to the operator and will depend greatly on their deployment strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the deployment processes will often vary, does (or should) the creation vary?
If we're forcing certain values to be set by operators (and enforcing via early failure), it seems like we ought to provide a canonical way to create the files, or at least provide a blessed reference implementation.
The actual implementation would vary, but I'm imaging at least a helper of sorts, something in the spirit of:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent and thorough design proposal! This will do wonders for reducing our complexity. Thank you!
Containers | ||
********** | ||
|
||
**Why Containers** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of this content isn't universally accepted to be true, for example that containers are more secure. I'm not sure it's required to make this case here. Containerization feels like a different OEP to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that containers might not improve security across the board, but they certainly help to address some security problems. For instance: it's easier to limit the exposition of service ports to the outside world. Also, applications are better isolated from one another.
I feel like it's important to advocate the use of containers in this OEP, as its the primary alternative for non-edX users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am excited about this approach. I look forward to discussing how we make this happen and the plan forward.
|
||
The settings defined in ``required.py`` and ``defaults.py`` files are mutually exclusive, representing all application specific settings as well as installed library settings whose values either must be provided or whose defaults are not considered production-ready. | ||
|
||
``required.py`` variables must be overridden by operators. The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it too specific to say we'll use the ImproperlyConfigured exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth specifying how how the service will check if all of its required settings have been overriden? Or is that up to the service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the how is up for debate and can be moved to the implementation section of this OEP
|
||
The settings defined in ``required.py`` and ``defaults.py`` files are mutually exclusive, representing all application specific settings as well as installed library settings whose values either must be provided or whose defaults are not considered production-ready. | ||
|
||
``required.py`` variables must be overridden by operators. The application will check that operators provided these values, and will not start unless they are set. This allows operators to fail fast rather than finding out about an unset value when users exercise those breaking codepaths. Application developers are encouraged to keep the list of required settings to a minimum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're we're overriding the defaults from the config file, then I think you won't actually need to do any assignment in the required.py
it will only have the validation checks. From what I can tell about sphinx autodoc, it looks like it's setup to pickup assignments, so you may end up in a situation where you have to do an assignment for the docs to pick it up even though it won't be necessary for the code to work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me that required.py
will do any validation, which would mean that it exists entirely for documentation. I'm not sure how I feel about that. Could we instead just put required settings in defaults.py
, and used Sphinx annotations to indicate that they're required?
If I'm wrong, and required.py
will in fact do validation, then I recommend renaming required.py
to validate_required.py
, and clarifying that it will be imported after the YAML is loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the implementation details of this part matter quite a bit to the final design, I think I would like to move this conversation over to the "how we get there" part. We should be open to amending this section as we coalesce around exactly what operating pattern needs to exist to realistically land somewhere where we have 1 codepath for configuration and clear delineations for operators around what they must provide to run the app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This proposal, looks great! And I'm looking forward to see these ideas implemented.
I've left a few comments in the document.
|
||
.. _Sphinx autodoc: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoattribute | ||
|
||
Operations Manuals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The edx/configuration
repo also handles setting up requirements between LMS/Studio and other IDAs, such as creating OAuth2 credentials/accouts (oauth-client-setup).
Since this OEP's goal is to replace configuration
, should we add instructions in the operations manuals about what needs to be done in order to configure each IDA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To continue with the "blessed reference deployment project" idea, this would be exactly one of its mandates: configuration of the interaction between the different services.
|
||
Installing and running an Open edX instance is not simple. We strongly recommend that you use a service provider to run the software for you. | ||
|
||
The proposal below outlines how we can create a cleaner, more intuitive interface for operating Open edX and in doing so help both edX.org and the Open edX community at large achieve better outcome at a faster pace. In brief it is: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This OEP does a really good job describing how configuration should be done on each app and sets great standards for building images, but it doesn't cover the "how to put things together" like the current configuration
repo implicitly does.
For example, when running openedx_native.yml
, the playbooks set up, on a single instance, all the services needed to run a full Open edX deployment:
- LMS
- Studio
- Workers
- IDAs
- Other support services
- Handles setting up LMS/Studio and wiring between them and the IDAs
I know that this OEP doesn't cover deployment
because there are a lot of different ways to do that, but the requirements for a Open edX deployment are not described anywhere (except implicitly in the configuration
repo).
So the configuration
repo isn't the best reference, but it's currently all we have for deploying from scratch.
Can we add a section on this OEP (or create a follow up OEP), to better document and map how the platform is supposed to work? What is each piece, and what's required for a "basic" installation if starting from scratch using the docker images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good point. Perhaps Open edX can take inspiration from OpenStack Kolla. In spirit, it is akin to what are referred to as applications in this OEP (a "project", in OpenStack parlance), is officially supported by the parent organization, and deals exactly with the question outlined by @giovannicimolin: a reference production deployment. If people end up using it or not is less relevant than the fact it is an officially supported way to get a working production deployment, even if basic at first.
As far as this OEP is concerned, I believe it would be a good idea to refer to such a project as an official goal.
edX will provide Docker images for applications that captures the latest code on the master branch as well as images representing named releases. edX will not provide these images for named releases prior to the acceptance and implementation of this OEP (Aspen through and including Juniper at time of writing). | ||
|
||
|
||
Configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this new configuration strategy handles differentiating between a container running only workers to a container actually serving the LMS/Studio pages (gunicorn).
Will each repo have a separate dockerfile for each container "type" (celery worker, gunicorn, etc) or this is going to be managed through configuration?
I'm asking this because the workers and the LMS run off the same codebase with the same requirements, with just different entrypoints.
Is it worth adding a section here about this specific point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is an excellent direction for Open edX to take. To distill my inline comments, I only feel like two things are missing:
- An explicit intention to document, either programatically or in text, how configuration requirements change from one release to the next, or even from one commit to the next.
- An explicit intention to build a reference deployment application in its own repository, maintained in step with all the other Open edX applications.
Wherever possible, operating concerns will live within the codebase they are concerned with. Thus it should be expected that all applications will provide the following: | ||
|
||
* A Dockerfile from which the container image can be built. | ||
* A documented settings file with production-ready defaults. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see an explicit commitment that these defaults will be maintained in step with the rest of the application, preferably with an associated Changelog. Would that be possible?
|
||
* A Dockerfile from which the container image can be built. | ||
* A documented settings file with production-ready defaults. | ||
* A documented settings file with all of the configuration values that must be defined by the operator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the defaults, I would expect this file to be maintained in step with the application, with changes being documented in a Changelog. One of the pains with current edx-configuration
is the difficulty in figuring out what variables need to be reworded from one release to the next.
* A Dockerfile from which the container image can be built. | ||
* A documented settings file with production-ready defaults. | ||
* A documented settings file with all of the configuration values that must be defined by the operator. | ||
* An operations manual documenting standard operations required to run and maintain the application. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with my previous comments, the most important type of changelog I can imagine is the one that deals with required configuration changes. This affects operators directly. If no other changelogs are explicitly named, I'd vote for this one.
|
||
Installing and running an Open edX instance is not simple. We strongly recommend that you use a service provider to run the software for you. | ||
|
||
The proposal below outlines how we can create a cleaner, more intuitive interface for operating Open edX and in doing so help both edX.org and the Open edX community at large achieve better outcome at a faster pace. In brief it is: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good point. Perhaps Open edX can take inspiration from OpenStack Kolla. In spirit, it is akin to what are referred to as applications in this OEP (a "project", in OpenStack parlance), is officially supported by the parent organization, and deals exactly with the question outlined by @giovannicimolin: a reference production deployment. If people end up using it or not is less relevant than the fact it is an officially supported way to get a working production deployment, even if basic at first.
As far as this OEP is concerned, I believe it would be a good idea to refer to such a project as an official goal.
|
||
**Config file generation & management** | ||
|
||
Due to the varied needs and processes of different operators, how the config files are created, managed, or otherwise end up on the server is up to the operator and will depend greatly on their deployment strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the automated way to see which settings have been added, removed, and/or deprecated.
As for "letting the operator decide", my ears also perk up. Further above, I suggest the possibility of implementing exactly such a "blessed" deployment strategy, in the spirit of OpenStack Kolla.
|
||
.. _Sphinx autodoc: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoattribute | ||
|
||
Operations Manuals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To continue with the "blessed reference deployment project" idea, this would be exactly one of its mandates: configuration of the interaction between the different services.
|
||
A clear manual of operations will exist in the form of RST files in an ``operations`` directory within the ``documentation`` directory for that application. See `this commit`_ for an example provided by the Open edX Build-Test-Release working group. The operations docs will cover common operations such as how to run the application for web traffic or as an async worker and how to manage the application's underlying database schema. It will also include a list of potential maintenance tasks operators may want to leverage such as clearing sessions or applying security patches. Finally it will include the list of ad-hoc management commands operators can use to help handle edge case or one-time operations. | ||
|
||
In the same vein as not dictating how operators create and manage their application config files, operators will also be expected to manage how they execute the operations documented in the manual. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree completely with avoiding hard-coded prescriptions. And documenting suggestions for operation is fabulous. But to get buy-in from the community, I once again suggest coming up with a section in this OEP devoted to the creation of a "blessed" reference deployment project with its own repository. It doesn't have to be turn-key, necessarily, but it can at least afford to be more opinionated than the Dockerfiles and settings files that live in the corresponding applications. This would be very valuable.
Jumping to Kubernetes | ||
********************* | ||
|
||
Kubernetes is an open source container orchestration platform pioneered by Google. While it often occupies the same conversation space as containers because it is a powerful way to manage them, it is a huge increase in complexity and expertise required to operate. For most installations Kubernetes is currently too much overhead/learning curve for the value. The edX organization may opt to explore deploying Docker containers this way in the future and would love to collaborate with operators who also decide to use Kubernetes to compare notes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To hammer home the Blessed Deployment Project idea, the latter is where the Kubernetes discussion should happen. While I believe the intention to build such a project should live in this OEP, the specifics can be handled in a separate one. (Spoiler alert: Kubernetes would be an excellent contender for orchestrating the Open edX applications. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: @bderusha, we'll be discussing the reference deployment project idea during the next contributor's meetup, on Thursday, if you care to join us. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really excited for this. Other than my questions around required.py
, I'm ready to approve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. That's a welcomed move!
Thank you, everyone for the great discussion. I think we are ready to start the conversation around how we can tactically move from where we are today to this bold new world. I would like to note that there were a few important implementation details we should plan on addressing during the next phase of discussion:
Based on other discussions we have deemed including a reference implementation out of scope for this OEP. That does not preclude it from existing or being maintained by edX or the community. Just simply not something we felt was necessary when managing edx IDAs as containers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎸
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bderusha proposes we move this to provisional
and merge.
Considering that this PR got a lot of attention, significant discussion and that most feedback is positive and doubts lean towards clarity on the implementation. I believe moving to provisional and merging is a good next step.
@bderusha, @felipemontoya: for the record, at OpenCraft we're still concerned about the absence of wording in this OEP regarding a reference deployment implementation, and would welcome further discussion on it before the merge. I created a thread for the purpose in the architecture forum. |
Yes, please we need new strategy of deployment. |
…and update to provisional
@@ -44,6 +44,11 @@ Wherever possible, operating concerns will live within the codebase they are con | |||
* An operations manual documenting standard operations required to run and maintain the IDA. | |||
* A changelog to document differences in settings, operating concerns, and application functionality from release to release. | |||
|
|||
As code is moved to this new paradigm it is important to note that we strive to maintain or improve upon the state of operations without sacrificing features or visibility. If operations code is removed from one place it must either have a home in some other public place better aligned with this OEP (EG package installation in Dockerfiles) or no longer be needed at all in this model (EG `conditional ansible commands`_ for installing packages on particular operating | |||
systems). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me! Thanks for this! 👍
@felipemontoya @bderusha As I was looking to reference this OEP in another communication thread, I noticed that this OEP is yet not available on Read-the-Docs since it hasn't been merged. Are we now at a place where we can merge this particular PR in its provisional status - knowing fully well that another implementation-specific PR is pending? |
I think we are ready to merge this and continue with the implementation specific discussion. |
This OEP is under review from 2020-04-17 though 2020-05-04.
The goal is to vet the proposed end state before we start talking about how to get there to make sure that the proposal properly outlines and answers major questions about how to operate and configure Open edX in this future state.
We are keeping this review period relatively short because, based on early feedback, we don't expect it to be a particularly controversial goal. Agreeing on this OEP quickly will allow us to move the conversation over to implementation strategies where we expect there to be more diverse needs to take into consideration. If it turns out to be more controversial than anticipated we will extend the review period to accommodate more feedback.
The implementation discussion will happen in this follow-on PR. While comments can be made at any time, I would ask that people refrain from any deep discussions there about how to achieve OEP-45 until we agree on what OEP-45 should be.