-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: ADR for an edx-platform representation of Catalog Courses. #34156
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
Course Models | ||
############# | ||
|
||
Status | ||
****** | ||
|
||
Accepted | ||
feanil marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Implementation pending | ||
|
||
Decision | ||
******** | ||
|
||
edx-platform will have a first-class model to represent the idea of a *catalog course*. | ||
If we think of multiple runs of the same course to all derive from one | ||
conceptual "course", the catalog course is what will represent that conceptual course. | ||
|
||
Context | ||
******* | ||
|
||
The term "course" in Open edX is ambiguous, referring to one of two things: | ||
|
||
1. An individual *run* of a course which is authored in Studio and published to LMS. Here, we call this the **CourseRun**. In edx-platform, it is modeled by the **CourseOverview**. | ||
2. A logical course *catalog* entry that may be marketed, sold, awarded as a credential, and bundled into programs. Here, we call this the **CatalogCourse**. | ||
|
||
Every one CatalogCourse maps to N CourseRuns. | ||
|
||
Historically, CatalogCourses have been inferred from the names of courses with the | ||
runs removed (eg. `course-v1:AximX/Eng101` could be a catalog course with multiple runs | ||
(`course-v1:AximX/Eng101:2024H1`, `course-v1:AximX/Eng1012024H2`, etc)). Having the catalog course explicitly defined as a new model has multiple benefits. | ||
runs removed (eg. "AximX/Eng101" could be a catalog course with multiple runs | ||
(2024H1, 2024H2, etc)). Having this explicitly defined as a new model has | ||
multiple benefits. | ||
|
||
* Relationships between course runs can be defined explicitly instead of implicitly. | ||
|
||
* Currently this information is either implied (via course key) or requested from | ||
course-discovery. In the case of edx-platform, it loads the entire course-discovery | ||
database into a cache. | ||
|
||
* A list of all courses can be easily queried both internally and via the rest API. | ||
|
||
* Catalog courses can be explicitly mapped to organizations rather than implicitly. | ||
|
||
* The door is opened to also persist program or course bundle data in the future. | ||
This will stop `programs-related features from breaking whenever memcached is reset <https://github.com/openedx/edx-platform/blob/master/openedx/core/djangoapps/catalog/docs/decisions/001-programs-cache.rst>`_. | ||
|
||
* Org → Course → Course Runs mappings don't have to follow implicit naming | ||
Conventions and associations can be changed more easily. | ||
|
||
Rejected Alternatives | ||
********************* | ||
|
||
Rely on course-discovery for CatalogCourse | ||
========================================== | ||
|
||
The course-discovery service already has some models to represent this and we | ||
could rely on it directly but it also contains a lot of representations that | ||
are specific to the edx.org process and it's not valuable to spin up the | ||
course-discovery service just to get this information for most operators. | ||
Comment on lines
+57
to
+60
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @robrap Can you elaborate on 4? I know of the the programs cache, which this will definitely help. My search just now also introduced me to the learner_skills_levels api, which this will not directly help since it relies on course-discovery's taxonomy API. In the long-term, though, taking course-discovery-querying utils out of edx-platform will have the benefit of discouraging new instances of the edx-platform-queries-course-discovery anti-pattern. Do you know of other instances? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Regarding 4, I haven't actually looked into the problem specifically. I've just seen its affect when we've had temporary issues on discovery that back up into the LMS. I probably need to look more closely when we have an actual incident. Thanks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FYI, I'm mostly not going to address this until we've arrived at a general agreement on what idea it would be useful to implement which we're still kind of working through. Just making a note that I haven't missed it but seems premature. |
||
|
||
As long as any local data can be updated from course-discovery if | ||
course-discovery is setup in an environment, we should not have any conflicts | ||
between the two locations. | ||
Comment on lines
+62
to
+64
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we specify what will be the source of truth for both cases? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @DawoudSheraz my thought would be that this would be up to the operator. If the operator wants to use an external service as the source of truth, they would hook into the API and push changes into models in the LMS and could setup permissions so that data can't change from the LMS side. If the operator wants the LMS to be the source of truth, they can setup events and listen for changes to push data into their other systems. You could also do both to allow editing on either side and syncing in both directions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
A similar type of model currently exists between Discovery-Studio. It has caused sync issues on edX side in the past. As for the source of truth, if it is up to the operator, we should be mentioning that explicitly in ADR. |
||
|
||
Essentially, if an operator has course-discovery enabled, the models here would | ||
be a persisted cache of the relevant information. However, if | ||
course-discovery is not enabled, the models can stand-alone and be populated | ||
either directly via APIs or automatically based on course naming conventions so | ||
that any features that rely on them can work in either system. | ||
|
||
|
||
Implementation Details | ||
********************** | ||
|
||
* Create a new CatalogCourse model | ||
|
||
* A 1:N mapping between 1 Catalog Course and N Course Run objects. | ||
|
||
* Open question: do we re-use CourseOverview to represent Course Run, or do we make a new CourseRun run model? | ||
|
||
* A 1:N mapping between 1 Organization and N Catalog Course objects. | ||
|
||
* A django admin that allows you to View and Manipulate Catalog Course objects and their associations. | ||
|
||
* A data migration to make all current implicit associations explicit. | ||
|
||
* Eventing hooks to auto-associate new runs to their implicit Catalog Course | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought you had mentioned a toggle around whether Discovery is being relied upon? In that case, is the expectation that events would instead be sent from Discovery to keep this data up-to-date? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not entirely sure. Course discovery is not currently required or enabled by default in Tutor. My expectation would be that this would remain true in the future but that the new models would get auto populated when course runs are created. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's a good question. I don't have a strong opinion on whether course-discovery should push, pull, or both. Mainly, I want to make sure that we treat course-discovery like any other external catalog system. So:
So, either way, we'll want both the API and the event hooks. |
||
|
||
* A REST api to CRUD Course Overview. Catalog Course, and Org data, along with their associations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee @feanil @robrap I've been mulling this ADR over for a while and I find myself wondering about the base assumption that edx-platform should have a Catalog Course model or that the LMS should care about Catalog Courses at all.
Hypothetically, if all LMS knew about were Course Runs, and we forbade it from ever looking how different runs could be grouped into Courses and Programs, what features would we lose? Would they be Core Product features (like credentialing and roles) or would they be things we've been discussing jettisoning out of the Core anyway (like masters degrees and upsell messaging)?
I know we want to enable seamless integration between Open edX and student-information/ecommerce/enterprise/marketing systems. But does having CatalogCourse and Program models in LMS help that, or might it actually make it more complex, since those systems might disagree about what a CatalogCourse means? Could it be better to let the external systems decide what "Course" and "Program" mean to them, and then use APIs to push metadata/enrollments/roles/whatever into edx-platform Course Runs as they need?
Getting more concrete: I'm not against having a CatalogCourse model, but before approving this ADR, I'd like to see a list of specific LMS/CMS features that would actually use such a model, and a bit in the Rejected Alternatives on why it's not sufficient for LMS/CMS to just think in Course Runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick note to say I like your line of thought and questioning. The rest of this is very high-level thoughts.
I feel like the transition from edX to 2U has aided my transition from a worldview of the LMS being the center of the solar system, to its role as a planet orbiting with other planets and making up a different solar system for each operating organization.
At 2U, our catalog is much larger than the set of LMS course runs. What does that mean at the level of an LMS, and not necessarily the LMS, and how does this relate to this ADR? I know this may be more of a 2U issue, but it might inform how we think about the different pieces.
That said, if the Open edX platform always tries to provide an all-inclusive experience with as few parts as possible to make it simple for operators, we may always end up with very fuzzy boundaries. Or, if someone is able to dig into Kyle's request for specifics, maybe we can get to a cleaner solution that works for all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @pdpinch hit the nail on the head with:
Course runs are already connected to each other in a bunch of ways. The Studio course listing page is eventually going to have the runs grouped by course. They often re-use 90+% of the same content, which is part of why I'd like to eventually see multiple runs of the same course stored in the same LearningPackage. That relationship is real, even if it's implicit today because it's only really held together by the CourseKey.
I think it's also valuable to be able to group Courses together into larger collections. There, we have use cases both for role/permissions (e.g. "this staff has broad admin or analytics reporting access to this set of courses"), as well as for catalog customization.
The global source of truth for this data doesn't need to be in edx-platform. For a large company, that source of truth might be Salesforce or any number of other things that eventually push to these primitives. But I think it's still valuable to have the base primitives defining these relationships in the platform.
I feel like the balancing act will be seeing just how thin we can make these core concepts, and how disciplined we can be about keeping other kinds of data separated (e.g. pricing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To echo what Dave said, I think we have had some implicit version of this for a long time. We're gonna start using it more in the presentation. Right now this concept lives in course-discovery but that comes with a lot of baggage that we don't want. Being able to house a core concept in edx-platform and then push a lot of the rest into things that either plugin or otherwise interface with the core bits here seems like it would make the platform more generically usable.
Another thought, would we be better served by having a concept of a
CourseRun
that is independent of course content? Like have a course run model that we can populate without having to create any course content. Then anyone could make plugins that allow end-users to group the runs however they want with whatever metadata they want for advertising/marketing/selling. Can we achieve this by inverting the relationship between courseoverviews and modulestore?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think CourseOveviews is necessarily the right mechanism to do this, but I do broadly support the idea that provisioning the course/run and roles associated with them is upstream of content creation. I sketched out a primitive proposal for that here a while back.
The tricky part in that scenario would be figuring out the migration path and what happens to import/export of those fields that already exist in Modulestore. I think it's totally doable, but there are a lot of details to work out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding defining a new CourseRun vs using the existing CourseOverview model...
Pros of using CourseOverview:
Cons of using CourseOverview:
.id
field (the PK) is the course key string, which is weird and probably not performant (?)modulestore.get_course()
anyway.If we make a new CourseRun field, it could be valuable to backfill
course_run = OneToOneField(CourseRun)
onto CourseOverview.After listing that all out, I think I've come around to the side of defining a new CourseRun model, assuming we are very protective about what goes into it.