Several enhancements and refactors to CourseGraph #29156

Merged
doctoryes merged 7 commits into master from kdmccormick/coursegraph-push-on-update
Mar 29, 2022

@kdmccormick kdmccormick commented Oct 28, 2021

Description

In order to learn Tutor, I began working on a CourseGraph plugin for Tutor. The work-in-progress plugin is here. In doing so, a couple CourseGraph pitfalls became apparent:

  • The management command for refreshing CourseGraph from CMS (./manage.py cms dump_to_neo4j) was very clunky: it had several arguments that needed to be specified every time the command was called (instead of falling back to reasonable defaults). Furthermore, it did not provide documentation for those arguments to command-line users; one would have to go diving in the code to find details on each argument.
  • The management command was advertised as available from both LMS and CMS context, but only worked reliably across environments in a CMS context.
  • Needing to run a management command to push courses to CourseGraph at all seemed like an anti-pattern. It would be much easier for operators if CourseGraph could be refreshed from Django admin, or better yet, completely automatically whenever a course is published.

This PR fixes all three of those issues across six commits, which I recommend reviewing individually. Each commit includes more technical details in its body.

This PR depends on openedx-unsupported/devstack#899.

Supporting information

New dump_to_neo4j command-line helptext:

app@2a67edc990cc:~/edx-platform$ ./manage.py cms dump_to_neo4j --help

(.... CMS startup noise ...)

usage: manage.py dump_to_neo4j [-h] [--host HOST] [--port PORT] [--secure] [--user USER] [--password PASSWORD] [--courses [KEY [KEY ...]]] [--skip [KEY [KEY ...]]] [--override] [--version] [-v {0,1,2,3}]
                               [--settings SETTINGS] [--pythonpath PYTHONPATH] [--traceback] [--no-color] [--force-color] [--skip-checks]

Dump recently-published course(s) over to a CourseGraph (Neo4j) instance.

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           the hostname of the Neo4j server
  --port PORT           the port on the Neo4j server that accepts Bolt requests
  --secure              connect to server over Bolt/TLS instead of plain unencrypted Bolt
  --user USER           the username of the Neo4j user
  --password PASSWORD   the password of the Neo4j user
  --courses [KEY [KEY ...]]
                        keys of courses to serialize; if omitted all courses in system are serialized
  --skip [KEY [KEY ...]]
                        keys of courses to NOT to serialize
  --override            dump all courses regardless of when they were last published
  --version             show program's version number and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output, 2=verbose output, 3=very verbose output
  --settings SETTINGS   The Python path to a settings module, e.g. "myproject.settings.main". If this isn't provided, the DJANGO_SETTINGS_MODULE environment variable will be used.
  --pythonpath PYTHONPATH
                        A directory to add to the Python path, e.g. "/home/djangoprojects/myproject".
  --traceback           Raise on CommandError exceptions
  --no-color            Don't colorize the command output.
  --force-color         Force colorization of the command output.
  --skip-checks         Skip system checks.
app@2a67edc990cc:~/edx-platform$ 
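
Django management commands declare their options through argparse, which is where help text like the above comes from. The following is a minimal standalone sketch of a few of those options using plain argparse, not the PR's actual command class. The `build_parser` name and the `None` defaults (meaning "fall back to configured values") are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser resembling a subset of the dump_to_neo4j options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="dump_to_neo4j",
        description="Dump recently-published course(s) over to a CourseGraph (Neo4j) instance.",
    )
    # Connection options are optional; None here means "not supplied on the command line".
    parser.add_argument("--host", default=None, help="the hostname of the Neo4j server")
    parser.add_argument(
        "--port", type=int, default=None,
        help="the port on the Neo4j server that accepts Bolt requests",
    )
    # nargs="*" accepts zero or more course keys after the flag.
    parser.add_argument(
        "--courses", metavar="KEY", nargs="*", default=None,
        help="keys of courses to serialize; if omitted all courses in system are serialized",
    )
    parser.add_argument(
        "--override", action="store_true",
        help="dump all courses regardless of when they were last published",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because every option carries a `help` string, `--help` documents itself and nobody has to read the source to learn what an argument does.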

The new admin interface, showing a CourseOverviews-backed table, from which sets of courses can be dumped to CourseGraph:
image

Messaging when successful:
image

Messaging when there's nothing to do:
image

Messaging when there was an error (eg, bad Neo4j connection parameters):
image

Testing instructions

Here are some test suggestions, although I encourage you to mess around with the new features.

Setup

Check out this PR's edx-platform branch.
Check out this devstack branch: openedx-unsupported/devstack#899

From devstack, run make dev.provision.coursegraph.
Visit http://localhost:7474, and log into Neo4j with username neo4j and password edx.

In Studio, find a course you can make some arbitrary edits to.

Test management command

  • Run ./manage.py cms dump_to_neo4j.
  • From Neo4j, run a query. For example, this one just counts all blocks in all courses: MATCH (b) RETURN count(b).
  • Add a block to your course and publish.
  • Run your query again. The change shouldn't be manifested in CourseGraph yet.
  • Run ./manage.py cms dump_to_neo4j again.
  • Run your query again. The change should be manifested in CourseGraph.

Test admin action

  • Go to Studio admin and find "Course graph course dumps".
  • Select your course and dump it (respecting cache). You should be notified that your course was skipped.
  • Add a block to your course and publish.
  • Dump your course again. You should be notified that a dump was enqueued for your course.
  • Dump your course again (overriding cache). You should be notified that a dump was enqueued for your course.
  • Bring down coursegraph: make dev.down.coursegraph.
  • Dump your course again. You should be notified that a connection error occurred.
  • Bring coursegraph back up: make dev.up.coursegraph.

Test auto push

  • Edit the file cms/envs/private.py, creating it if it doesn't exist.
  • Add COURSEGRAPH_DUMP_COURSE_ON_PUBLISH = True. Reboot CMS: make studio-restart-devserver.
  • Add a block to your course and publish.
  • Run your CourseGraph query again. The change should be manifested in CourseGraph without having to use the management command or the admin interface!

Other information

Blocker: devstack PR

This small devstack PR needs to be reviewed & merged first: openedx-unsupported/devstack#899

edX operational simplifications

This change would allow 2U-OCM/edX to make some operational simplifications if they wanted to (although their current setup should keep working fine as-is):

  • By configuring COURSEGRAPH_CONNECTION in prod-edx-edxapp-cms, the new admin interface could be used to backfill courses into CourseGraph whenever necessary.
  • By further enabling COURSEGRAPH_DUMP_COURSE_ON_PUBLISH in prod-edx-edxapp-cms, courses would be automatically & continuously pushed to CourseGraph upon publish.

Together, these changes would allow edX to decommission their CourseGraph Jenkins jobs.

The README

The CourseGraph readme will still be accurate after this change, but it'll be missing the new capabilities that this PR adds. I plan on overhauling it all in one go once the CourseGraph Tutor plugin is ready to go.

@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch from e278844 to 2cd60ba Compare October 28, 2021 17:20
@edx-status-bot

Your PR has finished running tests. The following contexts failed:

  • jenkins/django-3.2/quality
  • jenkins/python
  • jenkins/django-3.2/python

@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch 3 times, most recently from 62e4680 to ae6aeaf Compare January 20, 2022 23:47
@kdmccormick kdmccormick changed the title from "feat: dump to coursegraph on publish or via django admin" to "CourseGraph improvements: move to CMS, streamline CLI, push-on-publish" Jan 21, 2022
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch 4 times, most recently from 9812f58 to 8b1a03a Compare January 21, 2022 18:46
@kdmccormick kdmccormick changed the title from "CourseGraph improvements: move to CMS, streamline CLI, push-on-publish" to "CourseGraph: move to CMS, streamline CLI, push-on-publish" Jan 21, 2022
@kdmccormick kdmccormick changed the title from "CourseGraph: move to CMS, streamline CLI, push-on-publish" to "CourseGraph: move to CMS, doc, streamline CLI, push-on-publish" Jan 21, 2022
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch from 8b1a03a to 021d93d Compare January 21, 2022 19:29
@kdmccormick kdmccormick changed the title from "CourseGraph: move to CMS, doc, streamline CLI, push-on-publish" to "CourseGraph: move to CMS, doc, streamline CLI, & push-on-publish" Jan 21, 2022
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch from 021d93d to 7cafb15 Compare January 24, 2022 17:11
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch from 7cafb15 to 1bb074b Compare January 24, 2022 22:53
@kdmccormick kdmccormick self-assigned this Jan 27, 2022
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch 2 times, most recently from 1a1eda4 to 15b3c47 Compare February 1, 2022 14:42
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch 3 times, most recently from 435d7a1 to 6fd3467 Compare February 9, 2022 01:19
@kdmccormick kdmccormick changed the title from "CourseGraph: move to CMS, doc, streamline CLI, & push-on-publish" to "Several enhancements and refactors to CourseGraph" Feb 9, 2022
Comment on lines 16 to 17
- from cms.djangoapps.content.block_structure.signals import update_block_structure_on_course_publish
+ from openedx.core.djangoapps.content.block_structure.signals import update_block_structure_on_course_publish

Note: Fixes a mistake from the previous commit.

Comment on lines +241 to +243
self.setup_mock_graph(
    mock_matcher_class, mock_graph_class
)

Note: This enhances a test introduced in previous commit. With the changes to get_course_last_published in this commit, more real code paths are hit, so we need to set up a mock graph.

@kdmccormick kdmccormick marked this pull request as ready for review February 9, 2022 02:15
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch from 6fd3467 to 59991b4 Compare February 9, 2022 14:00
@doctoryes

@kdmccormick Change of plans: I'm going to merge this PR tomorrow (Tue 3/28). I just made a logging update to coursegraph and I want to see the results of that change in tonight's run before merging your PR. I will update you as I proceed.

@kdmccormick

I just added a single additional commit, approved separately in this PR, that updates some CourseGraph documentation (no code changes).

This code was originally located at:
  ./openedx/core/djangoapps/coursegraph

However, the code makes more sense within the ./cms tree, because:
* it is responsible for publishing course content to an
  external system, which is within the responsibilities of CMS, and
* it uses modulestore, which is discouraged for use in LMS
  (see 0011-limit-modulestore-use-in-lms.rst).

So, we move the code to:
  ./cms/djangoapps/coursegraph
and uninstall coursegraph from LMS.

We do not expect this refactor to have any breaking downstream effects.

Move most docs out of the docstring and into programmatically
displayable argument help text.

Also, the 'Example Usage' was out of date. This commit updates it to:
 * use `./manage.py cms ...` instead of `./manage.py lms ...`, and
 * use `--port` instead of `--https_port`.

Introduce a new CMS setting, COURSEGRAPH_CONNECTION,
which allows operators to specify default connection parameters
for a Neo4j instance.

This has three purposes:
* The `./manage.py cms dump_to_neo4j` management command will be
  much easier for developers and operators to type out because connection
  arguments can now be omitted. Note that connection arguments, if
  supplied, will override the arguments specified in CMS settings.
* The automatic push-to-coursegraph-on-publish-signal introduced in
  subsequent commits can use these connection settings.
* The CourseGraph Django admin actions introduced in subsequent
  commits can use these connection settings.
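
The precedence described in the first bullet (explicit command-line arguments override CMS settings) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation. The dictionary keys mirror the command's connection flags, the values are the devstack defaults mentioned in the testing instructions plus the standard Bolt port, and `resolve_connection` is a hypothetical helper name.

```python
from typing import Any, Dict

# Assumed shape of the setting; the real structure may differ.
COURSEGRAPH_CONNECTION: Dict[str, Any] = {
    "host": "localhost",
    "port": 7687,  # standard Bolt port
    "secure": False,
    "user": "neo4j",
    "password": "edx",
}


def resolve_connection(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return connection parameters, letting non-None overrides win over defaults."""
    resolved = dict(defaults)  # copy so the settings dict is never mutated
    for key, value in overrides.items():
        # None means "argument omitted", so the configured default is kept.
        if value is not None:
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    print(resolve_connection(COURSEGRAPH_CONNECTION, host="neo4j.example.com", port=None))
```

Treating `None` as "omitted" requires that the command-line parser default each connection argument to `None` rather than to a concrete value.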

Previously, CourseGraph needed to be kept up-to-date by
running `./manage.py dump_to_neo4j ...` manually or on a cron timer.

This introduces a new CMS setting: COURSEGRAPH_DUMP_COURSE_ON_PUBLISH.
When enabled, the CMS course_published signal handler will
asynchronously dump each individual course to CourseGraph when it
is published.
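
As a rough sketch of that gating logic, written in plain Python rather than with Django signals: the handler checks the setting and, if it is enabled, hands the course key to an asynchronous enqueue function. The names `handle_course_published` and `enqueue_dump` are hypothetical. In a real deployment the enqueue callable would be something like a background task's dispatch method.

```python
from types import SimpleNamespace
from typing import Any, Callable


def handle_course_published(
    course_key: str,
    settings: Any,
    enqueue_dump: Callable[[str], None],
) -> bool:
    """Enqueue a CourseGraph dump for one course if the feature is enabled.

    Returns True when a dump was enqueued, False when the setting is off.
    """
    if not getattr(settings, "COURSEGRAPH_DUMP_COURSE_ON_PUBLISH", False):
        return False
    # The handler only schedules the work; the dump itself runs elsewhere.
    enqueue_dump(course_key)
    return True


if __name__ == "__main__":
    queued = []  # stands in for a task queue
    settings = SimpleNamespace(COURSEGRAPH_DUMP_COURSE_ON_PUBLISH=True)
    handle_course_published("course-v1:edX+DemoX+Demo_Course", settings, queued.append)
    print(queued)
```

Keeping the handler limited to a cheap settings check and an enqueue call means publishing is not blocked on the Neo4j round trip.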

This follows a pattern established by other subsystems like
learning_sequences and special exam registration, both of which
fire off asynchronous post-processing tasks from the
course-publish handler.

The `get_course_last_published` function is used by CourseGraph to
determine whether or not a course should be dumped to Neo4j.
If the course hasn't been published since it was last dumped to
Neo4j, then it can be skipped (unless the override_cache option
is enabled).
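
The skip decision described above can be sketched as a small pure function. This is an assumption-laden illustration, not the actual CourseGraph code: `should_dump_course` and its parameters are hypothetical names, and treating a missing timestamp as "dump anyway" is a choice made here for safety.

```python
from datetime import datetime
from typing import Optional


def should_dump_course(
    last_published: Optional[datetime],
    last_dumped: Optional[datetime],
    override_cache: bool = False,
) -> bool:
    """Decide whether a course needs to be dumped to Neo4j."""
    if override_cache:
        # The operator explicitly asked to ignore the cache.
        return True
    if last_published is None or last_dumped is None:
        # Without both timestamps we cannot prove the graph is current.
        return True
    # Dump only if the course was published after the most recent dump.
    return last_published > last_dumped


if __name__ == "__main__":
    dumped = datetime(2022, 3, 1)
    print(should_dump_course(datetime(2022, 3, 2), dumped))
    print(should_dump_course(datetime(2022, 2, 1), dumped))
```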

The function was previously built using the BlockStructure
data model. While this worked fine in Production instances that
enable `block_structure.storage_backing_for_cache`, this
implementation did NOT work in development environments,
which do not use the BlockStructure model.

Instead, we switch to using CourseOverview.modified to
approximate when a course was last published. This method
has fewer moving parts and is universally available across
instances.

This introduces two admin actions:
* Dump to CourseGraph (respect cache), and
* Dump to CourseGraph (override cache)

which allow admins to select a collection of courses from Django
admin and dump them to the Neo4j instance specified by
settings.COURSEGRAPH_CONNECTION, with or without respecting
the cache (that is: whether the course has already been dumped
since its last publishing).
@kdmccormick kdmccormick force-pushed the kdmccormick/coursegraph-push-on-update branch from c866e1b to a34305d Compare March 28, 2022 22:49
Update the README of the CMS's CourseGraph support app:
* Point to the newly-developed CourseGraph plugin for Tutor,
  and remove some prose that's now redundant with the Tutor
  plugin's README.
* Add a link to the now-public CourseGraph Queries wiki page.
* Capitalize the G in CourseGraph.
* Fix a couple misc. formatting things.
@doctoryes doctoryes merged commit 42fcfc8 into master Mar 29, 2022
@doctoryes doctoryes deleted the kdmccormick/coursegraph-push-on-update branch March 29, 2022 15:21
@edx-pipeline-bot

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

@edx-pipeline-bot

EdX Release Notice: This PR may have caused e2e tests to fail on Stage. If you're a member of the edX org, please visit #e2e-troubleshooting on Slack to help diagnose the cause of these failures. Otherwise, it is the reviewer's responsibility. E2E tests have failed. https://gocd.tools.edx.org/go/tab/pipeline/history/deploy_to_stage

@edx-pipeline-bot

EdX Release Notice: This PR has been deployed to the production environment.

@kdmccormick

Hey @doctoryes , did you hit any trouble running a CourseGraph refresh?

@doctoryes

@kdmccormick I'm unable to trigger a refresh myself due to Jenkins permissions. I considered writing an SRE ticket - but I'm on-call next week when the next run will occur. So my current plan is to fix forward any problems at that point.

One question: I assume that an app-permissions PR will need to give the coursegraph:coursegraphcoursedump:change_coursegraphcoursedump permission to folks who want to access the admin interface?

@kdmccormick

@doctoryes Sounds good! And yup, that's correct.

timmc-edx referenced this pull request Apr 18, 2022
* build: update pylint-checks ci workflow
* fix: fix quality failures with new pylint version
* chore: remove pylint constraint
* chore: Updating Python Requirements (#30196)
Co-authored-by: edX requirements bot <49161187+edx-requirements-bot@users.noreply.github.com>
timmc-edx added a commit that referenced this pull request Apr 18, 2022
Coursegraph was moved from openedx to cms in commit 92552e5/PR #29156;
module init file was reintroduced in commit 80f9f1d/PR #30197, I think
by accident.
timmc-edx added a commit that referenced this pull request Apr 21, 2022
…t) (#30273)

Coursegraph was moved from openedx to cms in commit 92552e5/PR #29156;
module init file was reintroduced in commit 80f9f1d/PR #30197, I think
by accident.
timmc-edx added a commit that referenced this pull request Apr 21, 2022
…t) (#30273)

Coursegraph was moved from openedx to cms in commit 92552e5/PR #29156;
module init file was reintroduced in commit 80f9f1d/PR #30197, I think
by accident.

Cherry-picked from 8bcec1a
timmc-edx added a commit that referenced this pull request Apr 21, 2022
…t) (#30273) (#30296)

Coursegraph was moved from openedx to cms in commit 92552e5/PR #29156;
module init file was reintroduced in commit 80f9f1d/PR #30197, I think
by accident.

Cherry-picked from 8bcec1a
jawad-khan pushed a commit that referenced this pull request Jun 14, 2022
…t) (#30273)

Coursegraph was moved from openedx to cms in commit 92552e5/PR #29156;
module init file was reintroduced in commit 80f9f1d/PR #30197, I think
by accident.