Several enhancements and refactors to CourseGraph#29156
Conversation
e278844 to
2cd60ba
Compare
|
Your PR has finished running tests. The following contexts failed:
|
62e4680 to
ae6aeaf
Compare
9812f58 to
8b1a03a
Compare
8b1a03a to
021d93d
Compare
021d93d to
7cafb15
Compare
7cafb15 to
1bb074b
Compare
1a1eda4 to
15b3c47
Compare
435d7a1 to
6fd3467
Compare
| from cms.djangoapps.content.block_structure.signals import update_block_structure_on_course_publish | ||
| from openedx.core.djangoapps.content.block_structure.signals import update_block_structure_on_course_publish |
There was a problem hiding this comment.
Note: Fixes a mistake from the previous commit.
| self.setup_mock_graph( | ||
| mock_matcher_class, mock_graph_class | ||
| ) |
There was a problem hiding this comment.
Note: This enhances a test introduced in previous commit. With the changes to get_course_last_published in this commit, more real code paths are hit, so we need to set up a mock graph.
6fd3467 to
59991b4
Compare
|
@kdmccormick Change of plans: I'm going to merge this PR tomorrow (Tue 3/28). I just made a logging update to coursegraph and I want to see the results of that change in tonight's run before merging your PR. I will update you as I proceed. |
|
I just added a single additional commit, approved separately in this PR, that updates some CourseGraph documentation (no code changes). |
This code was originally located at: ./openedx/core/djangoapps/coursegraph However, code makes more sense within the ./cms tree, because: * it is responsible for publishing course content to an external system, with is within the responsibilities of CMS, and * is uses modulestore, which is discouraged for use in LMS (see 0011-limit-modulestore-use-in-lms.rst). So, we move the code to: ./cms/djangoapps/coursegraph and uninstall coursegraph from LMS. We do not expect this refactor to have any breaking downstream effects.
Move most docs out of docstring and into programatically- displayable argument help text. Also, the 'Example Usage' was out of date. This commit updates it to: * use `./manage.py cms ...' instead of `./manage.py lms ...', and * use `--port` instead of `--https_port`.
Introduce a new CMS settings COURSEGRAPH_CONNECTION, which allows operators to specify default connection paramters for a Neo4j instance. This has three purposes: * The `./manage.py cms dump_to_neo4j` management command will be much easier for developers and operators to type out because connection arguments can now be omitted. Note that connection arguments, if supplied, will override the arguments specified in CMS settings. * The automatic push-to-coursegraph-on-publish-signal introduced in subsequent commits can use these connection settings. * The CourseGraph Django admin actions introduced in subsequent commits can use these connection settings.
Previously, CourseGraph needed to be kept up-to-date by running `./manage.py dump_to_neo4j ...` manually or on a cron timer. This introduces a CMS new setting: COURSEGRAPH_DUMP_COURSE_ON_PUBLISH. When enabled, the CMS course_published signal handler will asynchronously dump each individual course to CourseGraph when it is published. This follows a pattern established by other subsystems like learning_sequences and special exam registration, both of which fire off asynchronous post-processing tasks from the course- publish handler.
The `get_course_last_published` function is used by CourseGraph to determine whether or not a course should be dumped to Neo4j. If the course hasn't been published since it was last dumped to Neo4j, then it can be skipped (unless the override_cache option is enabled). The function was previously built using the BlockStructure data model. While this worked fine in Production instances that enable `block_structure.storage_backing_for_cache`, this implementation did NOT work in development environments, which do not use the BlockStrcture model. Instead, we switch to using CourseOverview.modified to approximate when a course was last published. This is method has fewer moving parts and is universally available across instances.
This introduces two admin actions: * Dump to CourseGraph (respect cache), and * Dump to CourseGraph (override cache) which allow admins to select a collection of courses from Django admin and dump them to the Neo4j instance specified by settings.COURSEGRAPH_CONNECTION, with or without respecting the cache (that is: whether the course has already been dumped since its last publishing).
c866e1b to
a34305d
Compare
Update the README of the CMS's CourseGraph support app: * Point to the newly-developed CourseGraph plugin for Tutor, and remove some prose that's now redundant with the Tutor plugin's README. * Add a link to the now-public CourseGraph Queries wiki page. * Capitalize the G in CourseGraph. * Fix a couple misc. formatting things.
|
EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production. |
|
EdX Release Notice: This PR may have caused e2e tests to fail on Stage. If you're a member of the edX org, please visit #e2e-troubleshooting on Slack to help diagnose the cause of these failures. Otherwise, it is the reviewer's responsibility. E2E tests have failed. https://gocd.tools.edx.org/go/tab/pipeline/history/deploy_to_stage |
|
EdX Release Notice: This PR has been deployed to the production environment. |
|
Hey @doctoryes , did you hit any trouble running a CourseGraph refresh? |
|
@kdmccormick I'm unable to trigger a refresh myself due to Jenkins permissions. I considered writing an SRE ticket - but I'm on-call next week when the next run will occur. So my current plan is to fix forward any problems at that point. One question: I assume that an app-permissions PR will need to give the |
|
@doctoryes Sounds good! And yup, that's correct. |
* build: update pylint-checks ci workflow * fix: fix quality failures with new pylint version * chore: remove pylint constraint * chore: Updating Python Requirements (#30196) Co-authored-by: edX requirements bot <49161187+edx-requirements-bot@users.noreply.github.com>
Description
In order to learn Tutor, I began working on a CourseGraph plugin for Tutor. The work-in-progress plugin is here. In doing so, a couple CourseGraph pitfalls became apparent:
./manage.py cms dump_to_neo4j) was very clunky: it had several arguments that needed to be specified every time the command was called (instead of falling back to reasonable defaults). Furthermore, it did not provide documentation for those arguments to command-line users; one would have to go diving in the code to find details on each argument.This PR fixes all three of those issues across six commits, which I recommend reviewing individually. Each commit includes more technical details in its body.
This PR depends on openedx-unsupported/devstack#899.
Supporting information
New
dump_to_neo4jcommand-line helptext:The new admin interface, showing a CourseOverviews-backed table, from which sets of courses can be dumped to CourseGraph:

Messaging when successful:

Messaging when there's nothing to do:

Messaging when there was an error (eg, bad Neo4j connection parameters):

Testing instructions
Here are some test suggestions, although I encourage you to mess around with the new features.
Setup
Check out this PR's edx-platform branch.
Check out this devstack branch: openedx-unsupported/devstack#899
From devstack, run
make dev.provision.coursegraph.Visit http://localhost:7474, and log into Neo4j with username
neo4jand passwordedx.In Studio, find a course you can make some arbitrary edits to.
Test management command
./manage.py cms dump_to_neo4j.MATCH (b) RETURN count(b)../manage.py cms dump_to_neo4jagain.Test admin action
make dev.down.coursegraph.make dev.up.coursegraph.Test auto push
cms/envs/private.py, creating it if it doesn't exist.COURSEGRAPH_DUMP_COURSE_ON_PUBLISH = True. Reboot CMS:make studio-restart-devserver.Other information
Blocker: devstack PR
This small devstack PR needs to be reviewed & merged first: openedx-unsupported/devstack#899
edX operational simplifications
This change would allow 2U-OCM/edX to make some operational simplifications if they wanted to (although their current setup should keep working fine as-is):
COURSEGRAPH_CONNECTIONin prod-edx-edxapp-cms, the new admin interface could be used to backfill courses into CourseGraph whenever necessary.COURSEGRAPH_DUMP_ON_COURSE_PUBLISHin prod-edx-edxapp-cms, courses would be automatically & continuously pushed to CourseGraph upon publish.Together, these changes would allow edX to decommission their CourseGraph Jenkins jobs.
The README
The CourseGraph readme will still be accurate after this change, but it'll be missing the new capabilities that this PR adds. I plan on overhauling it all in one go once the CourseGraph Tutor plugin is ready to go.