Skip to content

Conversation

@pomegranited
Copy link
Contributor

@pomegranited pomegranited commented Jun 20, 2023

Description

Adds the models, APIs, and permissions/rules for openedx.features.content_tagging, which builds on the oel_tagging app added by openedx/openedx-learning#57.

These changes are the first step towards allow Course Authors to tag content objects (courses, units, libraries) in edx-platform, but none of these changes are user-facing yet.

Supporting information

Closes openedx/modular-learning#63

Testing instructions

This change should be entirely covered by unit tests (still have a few lines left to cover).

But since there's no public-facing APIs yet, if you want to do any further verification, you can test the functionality from the devstack and shell.

  1. Get the master openedx/devstack up and running.
  2. Pull this branch into edx-platform
  3. Install dependencies and update migrations by running paver install_prereqs in the LMS shell (from devstack dir, run make lms-shell) and CMS shell (make studio-shell).
  4. Run the migrations added here by running ./manage.py lms migrate in the LMS shell.
  5. Create file cms/envs/private.py which contains the following. You may need to restart Studio for these changes to take effect: make dev.restart-devserver.studio
    from .devstack import FEATURES
    # Org-based course authoring restrictions
    FEATURES['ENABLE_CREATOR_GROUP'] = True
    FEATURES['STUDIO_REQUEST_EMAIL'] = 'your@email.com'
    
  6. Go to the Studio home page and sign up a new user: http://localhost:18010/
  7. Expand the "Becoming a Course Creator in Studio" box to request course creation access (this is optional -- an "unrequested" entry is created even if you don't request one, and you can approve that below.)
  8. Login to Studio Django Admin in another window as a superuser (http://localhost:18010/admin, edx@example.com / edx)
  9. Create a new Organization, e.g. with Name: OpenCraft, Short Name: OC
  10. Navigate to [Course creators|http://localhost:18010/admin/course_creators/coursecreator/] and mark the request granted, and select the OpenCraft organization as the only org you can create courses for.

From here, you'll have to play around in the shell, because we only have Django Admin capabilities for ContentTaxonomies right now, and you have to be global staff to access Django Admin, which doesn't let us test out our org restrictions.

For example:

# make studio-shell
root@lms:/edx/app/edxapp/edx-platform# ./manage.py cms shell

from django.contrib.auth import get_user_model
# user you created
org_user = get_user_model().objects.get(username='openedx')
# demo users automatically created
learner = get_user_model().objects.get(username='honor')
staff_user = get_user_model().objects.get(username='staff')

from openedx.features.content_tagging import models, rules, api

from organizations.models import Organization
oc = Organization.objects.get(short_name="OC")
org_taxonomy = api.create_taxonomy(name="OpenCraft taxonomy", org_owners=[oc])
org_taxonomy
# <ContentTaxonomy: ContentTaxonomy object (2)>
assert org_user.has_perm('oel_tagging.add_taxonomy', org_taxonomy)
assert org_user.has_perm('oel_tagging.change_taxonomy', org_taxonomy)
assert not learner.has_perm('oel_tagging.add_taxonomy')
assert not learner.has_perm('oel_tagging.change_taxonomy', org_taxonomy)
assert not learner.has_perm('oel_tagging.add_taxonomy', org_taxonomy)

Deadline

None

Other information

Depends on openedx/openedx-learning#57 and openedx/openedx-learning#60

Data migrations added here can be easily rolled back.

Course creator groups added by #26616.

Author to do before merge:

  1. Update dependency: https://github.com/openedx/edx-platform/pull/32518/files#r1243237190
  2. Finish test coverage: https://github.com/openedx/edx-platform/pull/32518/checks?check_run_id=14577165965

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Jun 20, 2023
@openedx-webhooks
Copy link

openedx-webhooks commented Jun 20, 2023

Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@pomegranited pomegranited force-pushed the jill/add-content-tagging branch 3 times, most recently from cf85dbb to 21ab993 Compare June 26, 2023 22:59
@pomegranited pomegranited force-pushed the jill/add-content-tagging branch from 21ab993 to 456979a Compare June 27, 2023 04:04
@pomegranited pomegranited marked this pull request as ready for review June 27, 2023 07:02
@pomegranited pomegranited requested review from a team and bradenmacdonald June 27, 2023 07:02
Comment on lines 82 to 83
# FIXME JV - move to base.in once pypi package created
git+https://github.com/open-craft/openedx-learning.git@jill/taxonomy-api#egg=openedx-learning==0.1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do: deploy a release for openedx-learning to pypi and move this to base.in.

return queryset


class ContentTaxonomy(Taxonomy):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that it makes sense to make this a subclass of Taxonomy. Because it's not a true subclass, right? When ObjectTag calls self.taxonomy.validate_object_tag(self), there's no virtual dispatch involved. Regardless of whether there's a ContentTaxonomy or not, the self.taxonomy property always returns a Taxonomy instance, and only the base class's validate_object_tag() will be called. You would need to use something like django-polymorphic to get this to work (which is not what I am recommending; it's too much magic and we have enough dependencies already).

Copy link
Contributor Author

@pomegranited pomegranited Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh bummer. You're absolutely right. I had fooled myself into thinking this was ok in my tests, because when I created the invalid tags, their taxonomies are ContentTaxonomy instances. But they're not after they're refetched from the database :(

I guess I have to rethink this architecture then..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think we can have the same problem with system-defined taxonomies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to check if a model is an instance of ContentTaxonomy? (with isistance maybe?)

I remember the validate_object_tag function, it was first thought to be inside the API. We could check and depending of the instance, to call one and other function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact same thing has happened to me before... sorry I didn't spot this sooner.

I didn't suggest any solutions because I didn't want to bias you one way or another, but I think there are still some nice ways to achieve a similar architecture without too many changes. One thing to think about for example would be if there is only one database/django model for Taxonomy but it has a field called type and that type determines the runtime behavior, and loads a different class or plugin to handle the implementation details. Basically, put your subclasses and hierarchy into pure python classes and keep the SQL data model simpler or even the same for all of them. But there are other ways to go too, that's just one idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald @pomegranited

It still uses only one query, but that query does an additional JOIN per registered subclass. So whether or not this is acceptable may depend on how many subclasses there are. If it's 1-2, should be fine. But if there are many, the query performance will be slower, even though it's still technically a single query

Adding more context with this in mind. For system-defined taxonomies, we were thinking of creating a subclass for each taxonomy, since each one had its special way of validating (validate_object_tag). With the current approach, in the short term it would already be slow.

But, I've been thinking that at the system taxonomies level, the type field (Language, Author, etc) could be implemented to determine which validation to run. That way we would only have the SystemDefinedTaxonomy subclass and it would not be necessary to create a subclass for each Taxonomy.

So far we will have two subclasses: ContentTaxonomy and SystemDefinedTaxonomy

Copy link
Contributor

@ormsbee ormsbee Jul 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pomegranited that seems fine with me in principle, as long as the code and APIs are carefully constructed to make sure select_subclasses is used. But this is really more of an @ormsbee question so I'd like to hear his thoughts.

I haven't used it before, but as long as the number of subclasses is relatively small, I'm fine with a few joins.

Adding more context with this in mind. For system-defined taxonomies, we were thinking of creating a subclass for each taxonomy, since each one had its special way of validating (validate_object_tag). With the current approach, in the short term it would already be slow.

What are the other types of validation that have to happen on ObjectTag for other taxonomy types? The reason I ask is that I think it's a little odd for the ObjectTag to look to their parent Taxonomy subclass to validate themselves–basically looking upward into the thing containing them. It also seems like unnecessary coupling, since we might want to tag one than one type of thing with the same Taxonomy. In fact we're effectively already doing that here with CourseKey and UsageKey.

Maybe it's ObjectTag that needs the subclassing? It could be made abstract, and have subclasses like CourseTag, BlockTag, etc? And instead of having a completely generic object_id field, we make the object_id return a different real field value in the various subclasses (sort of like what pk does for Django models)? Then we could have the actual field type there–e.g. a CourseKeyField.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the other types of validation that have to happen on ObjectTag for other taxonomy types?

In the case of system-defined taxonomies, the validations that are added are more on the tag side than on the object.

  • Language: Verify if the language is on Django LANGUAGES
  • Organization: Validate that the organization exists and is active.
  • Author: Validate that the User exists and is active.

Later we can add more taxonomies that have their own validation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ormsbee @bradenmacdonald

It still uses only one query, but that query does an additional JOIN per registered subclass. So whether or not this is acceptable may depend on how many subclasses there are. If it's 1-2, should be fine.

Yeah, as @ChrisChV noted, there's going to be a lot of these.. so using InheritanceManager like I've done in openedx/openedx-learning#60 not a good idea.

The reason I ask is that I think it's a little odd for the ObjectTag to look to their parent Taxonomy subclass to validate themselves–basically looking upward into the thing containing them.

I agree that ObjectTag.is_valid is an awkward-looking property -- I added it as a convenience method, because we're going to need the REST API to return an is_valid flag for object tags, so that the frontend can highlight invalid tags for the content authors to correct.

In the design stage, we pushed the ObjectTag validation into the Taxonomy class so we could avoid having to subclass both Taxonomies and ObjectTags, and link them together somehow (see below for a proposed way to do this). This made Taxonomy.validate_object_tag() strange, but it was intended to keep the class complexity down. But that decision was clearly flawed.

Maybe it's ObjectTag that needs the subclassing? It could be made abstract, and have subclasses like CourseTag, BlockTag, etc?

Ok sure -- there weren't any requirements in the spec that made those particular distinctions necessary, however..

We could subclass ObjectTag to encapsulate the "closed taxonomy tags" vs "free text tags" distinction. To do this, we need a way to connect a Taxonomy with its appropriate ObjectTag subclass. To try this out, I drafted open-craft/openedx-learning#2; see Taxonony.object_tag_classes for an example of how this could work.

There, I also allowed for custom ObjectTag classes stored with a Taxonomy that could be used to validate system tags for the different system taxonomy use cases listed above. We could replace the SystemTaxonomy subclasses and their validation with class methods on ObjectTag subclasses instead. Python is bad at polymorphism across class methods, but with care, we can do it, e.g

  • Taxonomy subclass LanguageTaxonomy.get_tags() was going to return a dynamically-determined list of Tags created from the django settings.LANGUAGES.
    LanguageObjectTag could implement a class method to do the same thing.
  • AuthorTaxonomy has too many available tags to fetch them all in a list, so it would implement autocomplete_tags() instead, and just return the first N dynamically-created Tags with the given prefix.
    AuthorObjectTag could implement a class method to do the same thing.

So, what do you think of something like open-craft/openedx-learning#2 instead?

Also, if we do this, it'd be best to stop returning ObjectTag Django Models from the API altogether, and instead use the models as data to populate proper python classes and subclasses, as @bradenmacdonald originally suggested.

If we found that we still need Taxonomy subclasses, we could add a Taxonomy.taxonomy_class field+property like Taxonony.object_tag_classes, and load these Taxonomy subclasses on the fly in the API using a generator, as shown with get_object_tags().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ormsbee @bradenmacdonald @ChrisChV

Ok, this is looking really good. I'm going to close this PR in favor of #32661, so can we continue this conversation there?

to do the low-effort checks first.
* updates requirement to use WIP branch for openedx-learning
* import get_taxonomy and get_taxonomies from the oel_tagging API
* rename content_tagging.api.get_taxonomies to get_taxonomies_for_org
* adds tests to verify that API returns expected Taxonomy subclasses
* adds tests to verify object tag validation uses the correct Taxonomy subclass
@pomegranited
Copy link
Contributor Author

Closed in favor of #32661

@openedx-webhooks
Copy link

@pomegranited Even though your pull request wasn’t merged, please take a moment to answer a two question survey so we can improve your experience in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[Tagging] Implement Taxonomy Data Architecture models + APIs

5 participants