-
Notifications
You must be signed in to change notification settings - Fork 18
Publishing API with centralized publishing models #41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ormsbee
commented
Apr 22, 2023
- chore: Switch from edx-sphinx-theme to sphinx-book-theme
- docs: ADR for tagging service. (docs: ADR for tagging service. #40)
- feat: add support for static assets
- chore: add DRF, remove codecov, rebuild requirements
- feat: add publishing API
|
Making the barest version of the APIs first (no real error handling). Will flesh out after I have the basic structure of the loading script converted to only make API calls. |
|
@bradenmacdonald, @kdmccormick, @feanil: If you have time, could you please take a look at the The API layer is a mess right now, and probably not worth looking at. I wanted to get it to a point where Thanks folks. |
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the data model and how you've split it between the three apps looks great. All I have is a question about LearningPackage and some docstring nitpicks.
| component_version=component_version, | ||
| raw_content_id=raw_content_id, | ||
| identifier=identifier, | ||
| learner_downloadable=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| learner_downloadable=False, | |
| learner_downloadable=learner_downloadable, |
I know you said not to look at api.py; just wanted to point this out so that it doesn't cause you headache down the road.
| If you are referencing this model from within the same process, use a | ||
| foreign key to the id. If you are referencing this PublishedEntity from an | ||
| external system/service, use the UUID. The identifier is the part that is | ||
| most likely to be human-readable, and may be exported/copied, but try not to | ||
| rely on it, since this value may change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very helpful guidelines 👍🏻
| It would have saved a little space to add this data to the Published model | ||
| (and possibly call the combined model something else). Split Modulestore did | ||
| this with its active_versions table. I keep it separate here to get a better | ||
| separation of lifecycle events: i.e. this table *only* changes when drafts | ||
| are updated, not when publishing happens. The Published model only changes | ||
| when something is published. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this context. I like the separation too.
| class LearningPackage(models.Model): | ||
| """ | ||
| Top level container for a grouping of authored content. | ||
| Each PublishableEntity belongs to exactly one LearningPackage. | ||
| """ | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that entities are all individually publishable, what role does LearningPackage play in the publishing app?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same question. I guess it's about namespacing identifiers?
I have the vague impression that the publishing app is not as generic as it could be when it's tied to the concept of LearningPackage. But maybe I'm just not clear on the role of LearningPackage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing a namespace for identifiers is a big one. The other thing is that I wanted some sort of wrapping data structure that was roughly equivalent to a course or library, since this is how a lot of content will be organized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
| The reason this is built directly into the Learning Core data model is | ||
| because we want to be able to easily access and browse this data even if the | ||
| app-extended models get deleted (e.g. if they are deprecated and removed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was helpful for me 👍🏻
| openedx_learning.core.components | ||
| openedx_learning.core.contents | ||
| openedx_learning.core.publishing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this--it conveys so much about the relationship between the apps in one glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I thought of real critique:
Django has built-in support for multi-table inheritance: https://docs.djangoproject.com/en/3.2/topics/db/models/#multi-table-inheritance
What do you think about using that instead of the the PublishableEntity{Version} / PublishableEntity{Version}Mixin combo? I think the resulting data model would be identical, but it'd be more idiomatic Django.
|
@kdmccormick: That's a good point. I think I was turned off of multi-table inheritance years ago because of how strongly certain books (like Two Scoops) advise against it. My understanding was that it did a lot of surprising left outer joins. But that was years and years ago, I haven't really tested it, and you're right in that it looks like a good fit as long as those joins aren't forced onto the base class (i.e. the |
|
@kdmccormick: Thinking on it a bit more, the reverse relations also become a bit weirder. For instance, in a non-inheritance model, the |
|
@ormsbee If that's the main issue, then we could avoid it by calling the generic reverse relation
That's good to know; I think I've only ever used it once, and it wasn't in a situation where performance was top-of-mind. If you think using Django's inhertance would lead to more surprising queries then I am OK with the hand-rolled version you have now, which at the very least is easy for us to reason about. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense overall, though my vague gut feeling is that the publishing app seems to own too much logic about how things are grouped (LearningPackage) and identified (id, uuid, and identifier are part of PublishableEntity).
For example, does the publishing app even use the UUID field? If not, why isn't it up to each publishable model like Component to decide if it wants a UUID or not? I feel like it would be nice if the Publishable models/mixins could be configured to use any sort of namespacing/identifier scheme that the Publishables themselves want to use. But perhaps that adds too much complexity or abstraction.
Edit: after a night's sleep, I think this does make sense to me. Published things need to have the right identifiers so they can be used throughout the system, and the UUID is serving the role of ISBN in physical book publishing.
| # query by other Component fields within a given LearningPackage, which is | ||
| # going to be a common use case (and we can't make a compound index using | ||
| # columns from different tables). | ||
| learning_package = models.ForeignKey(LearningPackage, on_delete=models.CASCADE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a constraint like models.CheckConstraint(check=Q(publishable_entity__learning_package=F("learning_package")), name='learning_package') to ensure this is always the same at the database level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, nvm, this breaks:
❯ python manage.py migrate
SystemCheckError: System check identified some issues:
ERRORS:
oel_components.Component: (models.E041) 'constraints' refers to the joined field 'publishable_entity__learning_package'.
| ``uuid`` should be treated as immutable. | ||
| PublishableEntityVersions are created once and never updated. So for | ||
| instance, the ``title`` should never be modified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I make 100 small edits (to a draft) without publishing, there will still be 100 PublishableEntityVersion rows created, but just one Draft row which gets updated each time, and no PublishLog[Record]/Published rows. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup!
| class LearningPackage(models.Model): | ||
| """ | ||
| Top level container for a grouping of authored content. | ||
| Each PublishableEntity belongs to exactly one LearningPackage. | ||
| """ | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same question. I guess it's about namespacing identifiers?
I have the vague impression that the publishing app is not as generic as it could be when it's tied to the concept of LearningPackage. But maybe I'm just not clear on the role of LearningPackage.
Right, I realize we can rename things to avoid the collision in this case. I meant the general notion that inheritance will conflate relations between things in the
I feel like the most intuitive thing for developers would be to have something where they can say: component.published_version # This should be a ComponentVersion
component.draft_version # This should be a ComponentVersionSo something where the part of the API that content apps are exposed to runs through publishing's models but spits back their own models, so they're not actually exposed to any of the publishing details. But that means making some little helper method (either inherited or a mixin) either way. Given that, I'm leaning more towards clunky-but-less-coupled (i.e. no inheritance). But I'll experiment with it some. |
|
@ormsbee Thanks for considering it. When it comes to the question:
you've been actually working in this codebase and I haven't, so I think you have a much better intuition for what will feel
I agree, that does look very intuitive. It would be a change to your data model, but I think using generic relations you could achieve something like this? Some pseudo-Django: class VersionRegistry(Model):
"""
Links versionless things to their draft and published versions.
"""
# The pair of concrete types. Component/ComponentVersion, Unit/UnitVersion, etc.
verionless_type = ForeignKey(ContentType)
versioned_type = ForeignKey(ContentType)
# The concrete versionless entity. An instance of a Component, Unit, etc.
versionless_id = PositiveIntegerField()
versionless = GenericForeignKey('versionless_type', 'versionless_id', related_name="versioning")
# The concrete published and draft entity versions. An instance of ComponentVersion, UnitVersion, etc.
published_id = PositiveIntegerField()
published = GenericForeignKey('versioned_type', 'published_id', related_name="versioning")
draft_id = PositiveIntegerField()
draft = GenericForeignKey('versioned_type', 'draft_id', related_name="versioning") For developers using components, it'd look like: component.versioning.published # a ComponentVersion
component.versioning.draft # a ComponentVersion
# or, starting with a draft component:
draft.versioning.published # a ComponentVersion
draft.versioning.versionless # a Component, equivalent to `draft.component`(disclaimer: I have no idea what the queries patterns would look like) (for an Open edX example of this, credentials uses a generic relation in order to tie users to the multiple different types of credentials in one table). |
As an aside, FWIW, this was something I struggled with in the publishing/components split, especially since Component already has its own tuple of I also did want a LearningPackage-wide identifier that was app-chosen and locally unique, so that there was a common way to look things up within a LearningPackage. Honestly, the place where I feel that I'm on the shakiest ground is UUIDs for Versions of things because with the way versions work right now, every version is uniquely identifiable by a combination of PublishableEntity UUID + version_num. I have a possibly irrational dislike of compound-ids, and I wasn't sure about the version_num thing in the beginning, though I'm more committed to it now. I'm thinking of taking out the UUIDs on EntityVersion largely because I'm not sure how we'd handle them properly in import/export. (At least with Entities, we could export a mapping file of UUIDs to identifiers.) Edit: I guess versioning doesn't explicitly make it into the plain OLX import/export either way. Still thinking of killing the UUID on versioned things... |
:-) I somehow missed this part. You said it much more succinctly than I did. I'm going to steal this analogy for the docstring. |
|
I don't know why this only occurs to me now, but a Grading Policy would be a totally reasonable thing to model as a |
|
Which also leads me to weird places. Would we ever have a library of grading policies and taxonomies? 😛 |
| assert component.versioning.draft == component_version | ||
| assert component.versioning.published is None | ||
| publish_all_drafts(self.learning_package.pk, published_at=self.now) | ||
|
|
||
| # Force the re-fetch from the database | ||
| component = get_component_by_pk(component.pk) | ||
| assert component.versioning.published == component_version | ||
|
|
||
| # Grabbing the list of versions for this component | ||
| assert list(component.versioning.versions) == [component_version] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kdmccormick: the versions helper interface that you suggested (though it's implemented by hand through existing model relations instead of via generic foreign keys).
|
@kdmccormick, @bradenmacdonald, @feanil: Made updates to the models. @kdmccormick: I ended up going for the non-inheritance approach, with a helper in the mixin. I tried it with generic relations and it's workable, but I felt like it added fields that weren't really necessary and I worried that it would encourage a pattern where publishing starts to know more about the specific things being published. I'm going to fix up the APIs more, but I think I'd like to just land them in a rough shape. They'll probably need a few followup PRs to refine them, but I want to land the data models in a solid enough shape where we can start putting actual data into them (i.e stop my "blow away the world and remake the migrations" approach), and I think this PR will achieve that. The API modules at this point are mostly just to have a convenient proof-of-concept to dump stuff in and out of those models. |
feanil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked just at the publishing models and they make sense to me. I had some questions/clarification requests for docstrings but nothing blocking.
| # This will FAIL because it's going to use the relation value | ||
| # cached on component instead of going to the database again. | ||
| # You need to re-fetch the component for this to work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you do the re-fetching, by running refresh_from_db or something else? Any chance you can document what the correct thing to do here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it, but basically you have to refetch it from the database. I'm hesitant to build anything smart around that because that's more of a Django ORM thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee Did you consider the collapsing VersioningHelper directly into the PublishableEntityMixin class?
In addition to simplifying the class structure, having component.published instead of component.versioning.published might be more immediately obvious to devs that .published will not automatically be updated without a re-fetch.
If I pretend I don't know this codebase, the .versioning. attribute in the middle kinda hints "magic happens here", even though the it's really just a syntactic helper construct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee Did you consider the collapsing
VersioningHelperdirectly into thePublishableEntityMixinclass?
I did. In fact, my original comment was proposing that. But I liked your suggestion of putting it behind versioning better, because I expect that a bunch of other stuff will make it into this class over time.
In addition to simplifying the class structure, having
component.publishedinstead ofcomponent.versioning.publishedmight be more immediately obvious to devs that.publishedwill not automatically be updated without a re-fetch.If I pretend I don't know this codebase, the
.versioning.attribute in the middle kinda hints "magic happens here", even though the it's really just a syntactic helper construct.
I mean, it is a little magical, right? At least when compared to the simpler properties that just proxy to fields in the accompanying publisher app models (like uuid). But it's relatively small magic.
Maybe I should just change my implementation to always refetch. That's probably the more intuitive thing anyway, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Sorry for the suggestion whiplash. When I suggested the .versioning. interface, I was thinking of it as an alternative to PublishableEntity{Version}Mixin. Since we're sticking with the mixins, I could go either way on .versioning.
I think:
component.publishedimplies no re-fetch.component.versioning.publishedcould go either way.component.get_versioning().publishedwould heavily imply re-fetch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to stick with versioning for now. It is a bit magical, but I think it's more consistent with Django ORM conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
| Note: When we actually implement the ability to change identifiers, we | ||
| should make a history table and a modified attribute on this model. | ||
| Why are Identifiers in this Model? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to have these here but they also feel like ADRs. Maybe it makes sense to extract them eventually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Making a short one around identifiers now. I'll punt others until later.
| 1. No Draft entry for a PublishableEntity: This means a PublishableEntity | ||
| was created, but no PublishableEntityVersion was ever made for it, so | ||
| there was never a Draft version. | ||
| 2. A Draft entry exists and points to a PublishableEntityVersion: This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the state when the PublishableEntityVersion is also the latest published one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily. The Draft entry will always point to the most recently created PublishableEntityVersion. That may or may not be the published version–that depends on the entry in the Published model.
| name = "openedx_learning.core.publishing" | ||
| verbose_name = "Learning Core: Publishing" | ||
| default_auto_field = "django.db.models.BigAutoField" | ||
| label = "oel_publishing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flagging this for reviewers: does this sort of prefix label sound okay? I'm a little worried about namespace collision with app names in the LMS and potentially elsewhere (since the Django app namespace is flat), and I thought having a repo-wide prefix would help. It does mean that it's a little less intuitive when you do things like makemigrations, since the app will now be oel_publishing instead of publishing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me.
You could be super explicit and do openedx_learning_publishing. If that feels too verbose, though, then oel_publishing is good.
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just a couple questions.
FYI, I paid closer attention to the models and model_mixins than I did the apis or load_components.
| There may be a more clever way to introspect this information from the model | ||
| metadata, but this is simple and explicit. | ||
| """ | ||
| return PublishableContentModelRegistry.register( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider doing an isinstance check first to make sure that the arguments are sublclasses of PublishableEntityMixin and PublishableEntityVersionMixin, respectively. I feel like the error would be very hard to track down if, say, the arguments were swapped by mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added issubclass checks. Now it'll fail startup if it's improperly configured, with an error like:
raise ImproperlyConfigured(
django.core.exceptions.ImproperlyConfigured: <class 'openedx_learning.core.components.models.ComponentVersion'> must inherit from PublishableEntityMixin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌🏻
| # This will FAIL because it's going to use the relation value | ||
| # cached on component instead of going to the database again. | ||
| # You need to re-fetch the component for this to work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Sorry for the suggestion whiplash. When I suggested the .versioning. interface, I was thinking of it as an alternative to PublishableEntity{Version}Mixin. Since we're sticking with the mixins, I could go either way on .versioning.
I think:
component.publishedimplies no re-fetch.component.versioning.publishedcould go either way.component.get_versioning().publishedwould heavily imply re-fetch.
| return package | ||
|
|
||
|
|
||
| def create_publishable_entity(learning_package_id, key, created, created_by): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general comment: The class hierarchy in openedx-learning is becoming non-trivial. That's not suprising, since we're tackling a complex problem, but having type annotations on all of these API functions could make it much easier to grok. I know you're still iterating on the APIs, so don't consider this blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, all api.py functions should have annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add those annotations when I add the action that actually checks them in CI. 😞 I need to do some care and feeding for the CI in this next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
| for draft in draft_qset.select_related("entity__published"): | ||
| # There might not be a currently published version for this | ||
| # PublishableEntity yet. | ||
| if hasattr(draft.entity, "published"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee sorry, I'm confused about this too: how does published get assigned on the PublishableEntity instance? Why is it not always available on the instance? I would have expected this check to look like:
if draft.entity.published:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Published model a OneToOneField pointing to PublishableEntity, so you can access the relationship in both directions. But when you're coming through the reverse relation (entity.published), there's a chance that there is no corresponding row in Published. When that happens, Django will throw a ObjectDoesNotExist if you try to access it, and they recommend checking with hasattr:
https://docs.djangoproject.com/en/3.2/topics/db/examples/one_to_one/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought it would be something like that, but I was expecting the ObjectDoesNotExist try-except. When I saw hasattr I assumed it must be something more complicated.
I would prefer the try-except here; Django's hasattr recommendation here strikes me as very non-explicit and unpythonic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
@kdmccormick: I think I've addressed all your comments and suggestions. |
This is a major rethinking of the core data apps: * Parts of the components app were extracted into publishing and a new contents app. * The publishing app models are now entirely responsible for the draft and publish state management. * There is now a startup registration step for content models that want to build off of publishing's PublishableEntity and PublishableEntityVersion (register_content_models). See the ComponentsConfig class for an example of it in use. * Many model and field names were changed to make their intent clearer. (See comments in the models and ADRs in this commit for details.) * Apps now carry an "oel_" prefix in their labels. * An interim API layer was created, though this is still very much a work in progress, and should not be considered stable.
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀🚀🚀
Enabling WAL on dev.db will generate extra dev.db-shm and dev.db-wal files that we want git to ignore..
cdd836f to
c4fd319
Compare