-
Notifications
You must be signed in to change notification settings - Fork 4.2k
feat: adds content tagging app, models, and API #32518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
cf85dbb to
21ab993
Compare
from cms.djangoapps.contentstore.helpers to common.djangoapps.student.auth
21ab993 to
456979a
Compare
requirements/edx/github.in
Outdated
| # FIXME JV - move to base.in once pypi package created | ||
| git+https://github.com/open-craft/openedx-learning.git@jill/taxonomy-api#egg=openedx-learning==0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do: deploy a release for openedx-learning to pypi and move this to base.in.
| return queryset | ||
|
|
||
|
|
||
| class ContentTaxonomy(Taxonomy): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that it makes sense to make this a subclass of Taxonomy. Because it's not a true subclass, right? When ObjectTag calls self.taxonomy.validate_object_tag(self), there's no virtual dispatch involved. Regardless of whether there's a ContentTaxonomy or not, the self.taxonomy property always returns a Taxonomy instance, and only the base class's validate_object_tag() will be called. You would need to use something like django-polymorphic to get this to work (which is not what I am recommending; it's too much magic and we have enough dependencies already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh bummer. You're absolutely right. I had fooled myself into thinking this was ok in my tests, because when I created the invalid tags, their taxonomies are ContentTaxonomy instances. But they're not after they're refetched from the database :(
I guess I have to rethink this architecture then..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think we can have the same problem with system-defined taxonomies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to check if a model is an instance of ContentTaxonomy? (with isistance maybe?)
I remember the validate_object_tag function, it was first thought to be inside the API. We could check and depending of the instance, to call one and other function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact same thing has happened to me before... sorry I didn't spot this sooner.
I didn't suggest any solutions because I didn't want to bias you one way or another, but I think there are still some nice ways to achieve a similar architecture without too many changes. One thing to think about for example would be if there is only one database/django model for Taxonomy but it has a field called type and that type determines the runtime behavior, and loads a different class or plugin to handle the implementation details. Basically, put your subclasses and hierarchy into pure python classes and keep the SQL data model simpler or even the same for all of them. But there are other ways to go too, that's just one idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bradenmacdonald @pomegranited
It still uses only one query, but that query does an additional JOIN per registered subclass. So whether or not this is acceptable may depend on how many subclasses there are. If it's 1-2, should be fine. But if there are many, the query performance will be slower, even though it's still technically a single query
Adding more context with this in mind. For system-defined taxonomies, we were thinking of creating a subclass for each taxonomy, since each one had its special way of validating (validate_object_tag). With the current approach, in the short term it would already be slow.
But, I've been thinking that at the system taxonomies level, the type field (Language, Author, etc) could be implemented to determine which validation to run. That way we would only have the SystemDefinedTaxonomy subclass and it would not be necessary to create a subclass for each Taxonomy.
So far we will have two subclasses: ContentTaxonomy and SystemDefinedTaxonomy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pomegranited that seems fine with me in principle, as long as the code and APIs are carefully constructed to make sure
select_subclassesis used. But this is really more of an @ormsbee question so I'd like to hear his thoughts.
I haven't used it before, but as long as the number of subclasses is relatively small, I'm fine with a few joins.
Adding more context with this in mind. For system-defined taxonomies, we were thinking of creating a subclass for each taxonomy, since each one had its special way of validating (
validate_object_tag). With the current approach, in the short term it would already be slow.
What are the other types of validation that have to happen on ObjectTag for other taxonomy types? The reason I ask is that I think it's a little odd for the ObjectTag to look to their parent Taxonomy subclass to validate themselves–basically looking upward into the thing containing them. It also seems like unnecessary coupling, since we might want to tag one than one type of thing with the same Taxonomy. In fact we're effectively already doing that here with CourseKey and UsageKey.
Maybe it's ObjectTag that needs the subclassing? It could be made abstract, and have subclasses like CourseTag, BlockTag, etc? And instead of having a completely generic object_id field, we make the object_id return a different real field value in the various subclasses (sort of like what pk does for Django models)? Then we could have the actual field type there–e.g. a CourseKeyField.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the other types of validation that have to happen on ObjectTag for other taxonomy types?
In the case of system-defined taxonomies, the validations that are added are more on the tag side than on the object.
- Language: Verify if the language is on Django LANGUAGES
- Organization: Validate that the organization exists and is active.
- Author: Validate that the User exists and is active.
Later we can add more taxonomies that have their own validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still uses only one query, but that query does an additional JOIN per registered subclass. So whether or not this is acceptable may depend on how many subclasses there are. If it's 1-2, should be fine.
Yeah, as @ChrisChV noted, there's going to be a lot of these.. so using InheritanceManager like I've done in openedx/openedx-learning#60 not a good idea.
The reason I ask is that I think it's a little odd for the ObjectTag to look to their parent Taxonomy subclass to validate themselves–basically looking upward into the thing containing them.
I agree that ObjectTag.is_valid is an awkward-looking property -- I added it as a convenience method, because we're going to need the REST API to return an is_valid flag for object tags, so that the frontend can highlight invalid tags for the content authors to correct.
In the design stage, we pushed the ObjectTag validation into the Taxonomy class so we could avoid having to subclass both Taxonomies and ObjectTags, and link them together somehow (see below for a proposed way to do this). This made Taxonomy.validate_object_tag() strange, but it was intended to keep the class complexity down. But that decision was clearly flawed.
Maybe it's ObjectTag that needs the subclassing? It could be made abstract, and have subclasses like CourseTag, BlockTag, etc?
Ok sure -- there weren't any requirements in the spec that made those particular distinctions necessary, however..
We could subclass ObjectTag to encapsulate the "closed taxonomy tags" vs "free text tags" distinction. To do this, we need a way to connect a Taxonomy with its appropriate ObjectTag subclass. To try this out, I drafted open-craft/openedx-learning#2; see Taxonony.object_tag_classes for an example of how this could work.
There, I also allowed for custom ObjectTag classes stored with a Taxonomy that could be used to validate system tags for the different system taxonomy use cases listed above. We could replace the SystemTaxonomy subclasses and their validation with class methods on ObjectTag subclasses instead. Python is bad at polymorphism across class methods, but with care, we can do it, e.g
- Taxonomy subclass
LanguageTaxonomy.get_tags()was going to return a dynamically-determined list of Tags created from the django settings.LANGUAGES.
LanguageObjectTag could implement a class method to do the same thing. AuthorTaxonomyhas too many available tags to fetch them all in a list, so it would implementautocomplete_tags()instead, and just return the first N dynamically-created Tags with the given prefix.
AuthorObjectTag could implement a class method to do the same thing.
So, what do you think of something like open-craft/openedx-learning#2 instead?
Also, if we do this, it'd be best to stop returning ObjectTag Django Models from the API altogether, and instead use the models as data to populate proper python classes and subclasses, as @bradenmacdonald originally suggested.
If we found that we still need Taxonomy subclasses, we could add a Taxonomy.taxonomy_class field+property like Taxonony.object_tag_classes, and load these Taxonomy subclasses on the fly in the API using a generator, as shown with get_object_tags().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee @bradenmacdonald @ChrisChV
Ok, this is looking really good. I'm going to close this PR in favor of #32661, so can we continue this conversation there?
to do the low-effort checks first.
* updates requirement to use WIP branch for openedx-learning * import get_taxonomy and get_taxonomies from the oel_tagging API * rename content_tagging.api.get_taxonomies to get_taxonomies_for_org * adds tests to verify that API returns expected Taxonomy subclasses * adds tests to verify object tag validation uses the correct Taxonomy subclass
|
Closed in favor of #32661 |
|
@pomegranited Even though your pull request wasn’t merged, please take a moment to answer a two question survey so we can improve your experience in the future. |
Description
Adds the models, APIs, and permissions/rules for
openedx.features.content_tagging, which builds on theoel_taggingapp added by openedx/openedx-learning#57.These changes are the first step towards allow Course Authors to tag content objects (courses, units, libraries) in edx-platform, but none of these changes are user-facing yet.
Supporting information
Closes openedx/modular-learning#63
Testing instructions
This change should be entirely covered by unit tests (still have a few lines left to cover).
But since there's no public-facing APIs yet, if you want to do any further verification, you can test the functionality from the devstack and shell.
masteropenedx/devstack up and running.paver install_prereqsin the LMS shell (fromdevstackdir, runmake lms-shell) and CMS shell (make studio-shell)../manage.py lms migratein the LMS shell.cms/envs/private.pywhich contains the following. You may need to restart Studio for these changes to take effect:make dev.restart-devserver.studioedx@example.com/edx)granted, and select the OpenCraft organization as the only org you can create courses for.From here, you'll have to play around in the shell, because we only have Django Admin capabilities for ContentTaxonomies right now, and you have to be global staff to access Django Admin, which doesn't let us test out our org restrictions.
For example:
Deadline
None
Other information
Depends on openedx/openedx-learning#57 and openedx/openedx-learning#60
Data migrations added here can be easily rolled back.
Course creator groups added by #26616.
Author to do before merge: