Merged
3 changes: 1 addition & 2 deletions .github/workflows/unit-test-shards.json
@@ -99,7 +99,6 @@
       "openedx/core/djangoapps/course_apps/",
       "openedx/core/djangoapps/course_date_signals/",
       "openedx/core/djangoapps/course_groups/",
-      "openedx/core/djangoapps/coursegraph/",
       "openedx/core/djangoapps/courseware_api/",
       "openedx/core/djangoapps/crawlers/",
       "openedx/core/djangoapps/credentials/",
@@ -181,7 +180,6 @@
       "openedx/core/djangoapps/course_apps/",
       "openedx/core/djangoapps/course_date_signals/",
       "openedx/core/djangoapps/course_groups/",
-      "openedx/core/djangoapps/coursegraph/",
       "openedx/core/djangoapps/courseware_api/",
       "openedx/core/djangoapps/crawlers/",
       "openedx/core/djangoapps/credentials/",
@@ -240,6 +238,7 @@
     "paths": [
       "cms/djangoapps/api/",
       "cms/djangoapps/cms_user_tasks/",
+      "cms/djangoapps/coursegraph/",
       "cms/djangoapps/course_creators/",
       "cms/djangoapps/export_course_metadata/",
       "cms/djangoapps/maintenance/",
8 changes: 8 additions & 0 deletions cms/djangoapps/contentstore/signals/handlers.py
@@ -5,6 +5,7 @@
 from datetime import datetime
 from functools import wraps

+from django.conf import settings
 from django.core.cache import cache
 from django.dispatch import receiver
 from pytz import UTC
@@ -55,6 +56,9 @@ def listen_for_course_publish(sender, course_key, **kwargs): # pylint: disable=
         update_search_index,
         update_special_exams_and_publish
     )
+    from cms.djangoapps.coursegraph.tasks import (
+        dump_course_to_neo4j
+    )

     # register special exams asynchronously
     course_key_str = str(course_key)
@@ -64,6 +68,10 @@ def listen_for_course_publish(sender, course_key, **kwargs): # pylint: disable=
     # Push the course outline to learning_sequences asynchronously.
     update_outline_from_modulestore_task.delay(course_key_str)

+    if settings.COURSEGRAPH_DUMP_COURSE_ON_PUBLISH:
+        # Push the course out to CourseGraph asynchronously.
+        dump_course_to_neo4j.delay(course_key_str)
+
     # Finally, call into the course search subsystem
     # to kick off an indexing action
     if CoursewareSearchIndexer.indexing_is_enabled() and CourseAboutSearchIndexer.indexing_is_enabled():
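
The publish handler in this file enqueues the CourseGraph dump only when ``settings.COURSEGRAPH_DUMP_COURSE_ON_PUBLISH`` is truthy. The following is a minimal, self-contained sketch of that gating pattern. ``FakeTask``, the ``SimpleNamespace`` settings object, and ``on_course_publish`` are hypothetical stand-ins for the Celery task, ``django.conf.settings``, and the real signal receiver; they are not part of this PR.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class FakeTask:
    """Stand-in for a Celery task: records what ``.delay()`` was called with."""
    calls: list = field(default_factory=list)

    def delay(self, *args):
        self.calls.append(args)


# Hypothetical stand-ins for django.conf.settings and the real Celery task.
settings = SimpleNamespace(COURSEGRAPH_DUMP_COURSE_ON_PUBLISH=False)
dump_course_to_neo4j = FakeTask()


def on_course_publish(course_key):
    """Enqueue a CourseGraph dump only when the setting is enabled."""
    course_key_str = str(course_key)
    if settings.COURSEGRAPH_DUMP_COURSE_ON_PUBLISH:
        dump_course_to_neo4j.delay(course_key_str)


on_course_publish("course-v1:edX+DemoX+Demo_Course")  # flag off: nothing enqueued
settings.COURSEGRAPH_DUMP_COURSE_ON_PUBLISH = True
on_course_publish("course-v1:edX+DemoX+Demo_Course")  # flag on: one task enqueued
```

Because the handler only calls ``.delay()``, the dump itself runs on a worker and does not block the publish.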
120 changes: 120 additions & 0 deletions cms/djangoapps/coursegraph/README.rst
@@ -0,0 +1,120 @@

CourseGraph Support
-------------------

This app exists to write data to "CourseGraph", a tool enabling Open edX developers and support specialists to inspect their platform instance's learning content. CourseGraph itself is simply an instance of `Neo4j`_, which is an open-source graph database with a Web interface.

.. _Neo4j: https://neo4j.com

Deploying CourseGraph
=====================

There are two ways to deploy CourseGraph:

* For operators using Tutor, there is a `CourseGraph plugin for Tutor`_ that is currently released as "Beta". Nutmeg is the earliest Open edX release that the plugin will work alongside.

* For operators still using the old Ansible installation pathway, there exists a `neo4j Ansible playbook`_. Be warned that this method is neither well-documented nor officially supported.

In order for CourseGraph to have queryable, up-to-date data, learning content from CMS must be written to CourseGraph regularly. That is where this Django app comes into play. For details on the various ways to write CMS data to CourseGraph, visit the `operations section of the CourseGraph Tutor plugin docs`_.

**Please note**: Access to a populated CourseGraph instance confers access to all the learning content in the associated Open edX CMS (Studio). The basic authentication provided by Neo4j may or may not be sufficient for your security needs. Consider taking additional security measures, such as restricting CourseGraph access to only users on a private VPN.

.. _neo4j Ansible playbook: https://github.com/edx/configuration/blob/master/playbooks/neo4j.yml

.. _CourseGraph plugin for Tutor: https://github.com/openedx/tutor-contrib-coursegraph/

.. _operations section of the CourseGraph Tutor plugin docs: https://github.com/openedx/tutor-contrib-coursegraph/#managing-data

Running CourseGraph locally
===========================

In some circumstances, you may want to run CourseGraph locally, connected to a development-mode Open edX instance. You can do this in both Tutor and Devstack.

Tutor
*****

The `CourseGraph plugin for Tutor`_ makes it easy to install, configure, and run CourseGraph for local development.

Devstack
********

CourseGraph is included as an "extra" component in the `Open edX Devstack`_. That is, it is not run or provisioned by default, but can be enabled on-demand.

To provision Devstack CourseGraph with data from Devstack LMS, run::

    make dev.provision.coursegraph

CourseGraph should now be accessible at http://localhost:7474 with the username ``neo4j`` and the password ``edx``.

Under the hood, the provisioning command just invokes ``dump_to_neo4j`` on your LMS, pointed at your CourseGraph. The provisioning command can be run again at any point in the future to refresh CourseGraph with new LMS data. The data in CourseGraph will persist unless you explicitly destroy it (as noted below).

Other Devstack CourseGraph commands include::

    make dev.up.coursegraph       # Bring up the container (without re-provisioning).
    make dev.down.coursegraph     # Stop and remove the container.
    make dev.shell.coursegraph    # Start a shell session in the container.
    make dev.attach.coursegraph   # Attach to the container.
    make dev.destroy.coursegraph  # Stop the container and destroy its database.

The above commands should be run in your ``devstack`` folder, and they assume that LMS is already properly provisioned. See the `Devstack interface`_ for more details.

.. _Open edX Devstack: https://github.com/edx/devstack/
.. _Devstack interface: https://edx.readthedocs.io/projects/open-edx-devstack/en/latest/devstack_interface.html


Querying CourseGraph
====================

CourseGraph is queryable using the `Cypher`_ query language. Open edX learning content is represented in Neo4j using a straightforward scheme:

* A node is an XBlock usage.

* Nodes are tagged with their ``block_type``, such as:

  * ``course``
  * ``chapter``
  * ``sequential``
  * ``vertical``
  * ``problem``
  * ``html``
  * etc.

* Every node is also tagged with ``item``.

* Parent-child relationships in the course hierarchy are reflected in the ``PARENT_OF`` relationship.

* Ordered sibling relationships in the course hierarchy are reflected in the ``PRECEDES`` relationship.

* Fields on each XBlock usage (``.display_name``, ``.data``, etc.) are available on the corresponding node.

.. _Cypher: https://neo4j.com/developer/cypher/
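
As a rough illustration of the scheme above, the following Python sketch models it in memory. The ``Node`` and ``Graph`` classes and the sample locations are hypothetical and are not part of this app or of Neo4j. Only the labels, the ``PARENT_OF`` and ``PRECEDES`` relationships, and the ``block_type`` and ``location`` properties come from the description and queries in this README.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One XBlock usage: labels plus the block's fields as properties."""
    labels: set
    properties: dict


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)      # location -> Node
    parent_of: list = field(default_factory=list)  # (parent, child) pairs
    precedes: list = field(default_factory=list)   # (earlier, later) sibling pairs

    def add_block(self, location, block_type, **fields):
        # Every node carries its block_type label and the generic "item" label.
        props = {"location": location, "block_type": block_type, **fields}
        self.nodes[location] = Node({block_type, "item"}, props)

    def add_children(self, parent, children):
        for child in children:
            self.parent_of.append((parent, child))
        # Consecutive siblings are linked in course order.
        for earlier, later in zip(children, children[1:]):
            self.precedes.append((earlier, later))


graph = Graph()
graph.add_block("course", "course", display_name="Demo")
graph.add_block("ch1", "chapter", display_name="Week 1")
graph.add_block("ch2", "chapter", display_name="Week 2")
graph.add_children("course", ["ch1", "ch2"])
```

In this toy model, ``graph.parent_of`` corresponds to ``PARENT_OF`` edges and ``graph.precedes`` to ``PRECEDES`` edges.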


Example Queries
***************

How many XBlocks exist in the LMS, by type? ::

    MATCH
        (c:course) -[:PARENT_OF*]-> (n:item)
    RETURN
        distinct(n.block_type) as block_type,
        count(n) as number
    order by
        number DESC


In a given course, which units contain problems with custom Python grading code? ::

    MATCH
        (c:course) -[:PARENT_OF*]-> (u:vertical) -[:PARENT_OF*]-> (p:problem)
    WHERE
        p.data CONTAINS 'loncapa/python'
    AND
        c.course_key = '<course_key>'
    RETURN
        u.location
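
When running a query like the one above from a script, the course key can be passed as a Cypher parameter (``$course_key``) rather than pasted into the query text. The sketch below assumes a ``graph`` object exposing ``run(query, **params)`` that returns an iterable of dict-like records, which is how py2neo's ``Graph.run`` is commonly used. The helper name and the ``FakeGraph`` stand-in are hypothetical and exist only so the example runs without a Neo4j server.

```python
UNITS_WITH_PYTHON_GRADING = """
MATCH
    (c:course) -[:PARENT_OF*]-> (u:vertical) -[:PARENT_OF*]-> (p:problem)
WHERE
    p.data CONTAINS 'loncapa/python'
AND
    c.course_key = $course_key
RETURN
    u.location
"""


def units_with_python_grading(graph, course_key):
    """Run the query against ``graph`` and return the matching unit locations."""
    records = graph.run(UNITS_WITH_PYTHON_GRADING, course_key=str(course_key))
    return [record["u.location"] for record in records]


class FakeGraph:
    """Hypothetical stand-in used only to exercise the helper without Neo4j."""

    def __init__(self, records):
        self.records = records
        self.calls = []

    def run(self, query, **params):
        # Record the call so the parameters can be inspected afterwards.
        self.calls.append((query, params))
        return self.records


fake = FakeGraph([{"u.location": "block-v1:edX+DemoX+Demo_Course+type@vertical+block@unit1"}])
locations = units_with_python_grading(fake, "course-v1:edX+DemoX+Demo_Course")
```

Passing the key as a parameter avoids quoting problems and lets the same query text be reused for every course.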

You can see many more examples of useful CourseGraph queries on the `query archive wiki page`_.

.. _query archive wiki page: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3273228388/Useful+CourseGraph+Queries
123 changes: 123 additions & 0 deletions cms/djangoapps/coursegraph/admin.py
@@ -0,0 +1,123 @@
"""
Admin site bindings for coursegraph
"""
import logging

from django.contrib import admin, messages
from django.utils.translation import gettext as _
from edx_django_utils.admin.mixins import ReadOnlyAdminMixin

from .models import CourseGraphCourseDump
from .tasks import ModuleStoreSerializer

log = logging.getLogger(__name__)


@admin.action(
    permissions=['change'],
    description=_("Dump courses to CourseGraph (respect cache)"),
)
def dump_courses(modeladmin, request, queryset):
    """
    Admin action to enqueue Dump-to-CourseGraph tasks for a set of courses,
    excluding courses that haven't been published since they were last dumped.

    queryset is a QuerySet of CourseGraphCourseDump objects, which are just
    CourseOverview objects under the hood.
    """
    all_course_keys = queryset.values_list('id', flat=True)
    serializer = ModuleStoreSerializer(all_course_keys)
    try:
        submitted, skipped = serializer.dump_courses_to_neo4j()
    # Unfortunately there is no unified base class for the reasonable
    # exceptions we could expect from py2neo (connection unavailable, bolt protocol
    # error, and so on), so we just catch broadly, show a generic error banner,
    # and then log the exception for site operators to look at.
    except Exception as err:  # pylint: disable=broad-except
        log.exception(
            "Failed to enqueue CourseGraph dumps to Neo4j (respecting cache): %s",
            ", ".join(str(course_key) for course_key in all_course_keys),
        )
        modeladmin.message_user(
            request,
            _("Error enqueueing dumps for {} course(s): {}").format(
                len(all_course_keys), str(err)
            ),
            level=messages.ERROR,
        )
        return
    if submitted:
        modeladmin.message_user(
            request,
            _(
                "Enqueued dumps for {} course(s). Skipped {} unchanged course(s)."
            ).format(len(submitted), len(skipped)),
            level=messages.SUCCESS,
        )
    else:
        modeladmin.message_user(
            request,
            _(
                "Skipped all {} course(s), as they were unchanged.",
            ).format(len(skipped)),
            level=messages.WARNING,
        )


@admin.action(
    permissions=['change'],
    description=_("Dump courses to CourseGraph (override cache)")
)
def dump_courses_overriding_cache(modeladmin, request, queryset):
    """
    Admin action to enqueue Dump-to-CourseGraph tasks for a set of courses
    (whether or not they have been published recently).

    queryset is a QuerySet of CourseGraphCourseDump objects, which are just
    CourseOverview objects under the hood.
    """
    all_course_keys = queryset.values_list('id', flat=True)
    serializer = ModuleStoreSerializer(all_course_keys)
    try:
        submitted, _skipped = serializer.dump_courses_to_neo4j(override_cache=True)
    # Unfortunately there is no unified base class for the reasonable
    # exceptions we could expect from py2neo (connection unavailable, bolt protocol
    # error, and so on), so we just catch broadly, show a generic error banner,
    # and then log the exception for site operators to look at.
    except Exception as err:  # pylint: disable=broad-except
        log.exception(
            "Failed to enqueue CourseGraph Neo4j course dumps (overriding cache): %s",
            ", ".join(str(course_key) for course_key in all_course_keys),
        )
        modeladmin.message_user(
            request,
            _("Error enqueueing dumps for {} course(s): {}").format(
                len(all_course_keys), str(err)
            ),
            level=messages.ERROR,
        )
        return
    modeladmin.message_user(
        request,
        _("Enqueued dumps for {} course(s).").format(len(submitted)),
        level=messages.SUCCESS,
    )


@admin.register(CourseGraphCourseDump)
class CourseGraphCourseDumpAdmin(ReadOnlyAdminMixin, admin.ModelAdmin):
    """
    Model admin for "Course graph course dumps".

    Just a read-only table with some useful metadata, allowing admin users to
    select courses to be dumped to CourseGraph.
    """
    list_display = [
        'id',
        'display_name',
        'modified',
        'enrollment_start',
        'enrollment_end',
    ]
    search_fields = ['id', 'display_name']
    actions = [dump_courses, dump_courses_overriding_cache]
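
The two admin actions differ only in whether unchanged courses are skipped. The body of ``ModuleStoreSerializer.dump_courses_to_neo4j`` is not shown in this diff, so the following is only a sketch of the "respect cache" rule as the docstrings describe it: a course is skipped if it has not been published since it was last dumped. The ``should_dump`` and ``partition`` helpers and the timestamp inputs are assumptions made for illustration.

```python
from datetime import datetime, timezone


def should_dump(last_published, last_dumped, override_cache=False):
    """
    Decide whether a course should be (re)dumped.

    Hypothetical helper: both timestamps are timezone-aware datetimes, and
    ``last_dumped`` is None for a course that has never been dumped.
    """
    if override_cache or last_dumped is None:
        return True
    return last_published > last_dumped


def partition(courses, override_cache=False):
    """Split ``{course_key: (last_published, last_dumped)}`` into (submitted, skipped)."""
    submitted, skipped = [], []
    for course_key, (published, dumped) in courses.items():
        target = submitted if should_dump(published, dumped, override_cache) else skipped
        target.append(course_key)
    return submitted, skipped


jan = datetime(2022, 1, 1, tzinfo=timezone.utc)
feb = datetime(2022, 2, 1, tzinfo=timezone.utc)
courses = {
    "course-v1:edX+A+1": (feb, jan),   # published after the last dump
    "course-v1:edX+B+1": (jan, feb),   # unchanged since the last dump
    "course-v1:edX+C+1": (jan, None),  # never dumped
}
submitted, skipped = partition(courses)
```

With ``override_cache=True`` every course lands in ``submitted``, which matches the second action reporting only an enqueued count.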
@@ -12,6 +12,6 @@ class CoursegraphConfig(AppConfig):
     """
     AppConfig for courseware app
     """
-    name = 'openedx.core.djangoapps.coursegraph'
+    name = 'cms.djangoapps.coursegraph'

-    from openedx.core.djangoapps.coursegraph import tasks
+    from cms.djangoapps.coursegraph import tasks