-
Notifications
You must be signed in to change notification settings - Fork 18
Support for grouped changesets of draft modifications #290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @set_draft_version.register(int) | ||
| def _( | ||
| publishable_entity_id: int, | ||
| publishable_entity_version_pk: int | None, | ||
| /, | ||
| set_at: datetime | None = None, | ||
| set_by: int | None = None, # User.id | ||
| create_transaction: bool = True, | ||
| ) -> None: | ||
| """ | ||
| Alias for set_draft_version taking PublishableEntity.id instead of a Draft. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to call out that this is the first time I'm introducing this pattern into openedx-learning (or really anywhere in Open edX code that I'm aware of), where we use functools.singledispatch to give multiple versions of the same function and switch based on the first parameter type. Wanted to get people's thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very very cool. I like this a lot more than taking a big Union of types and then having to handle them with a bunch of if statements at the top of the function. It's also nice because it encourages (a) static typing and (b) having one canonical version of function, and then one or more "aliases" which just lightly wrap the canonical verison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that it encourages static typing, but it seems to make the code longer and slightly harder to follow compared to just having different names for the different versions of each function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, a major knock against it is that it seems to deeply confuse Pylance, making it so that the auto-complete is useless. The more I read about singledispatch and its handling in various tools, the more it seems to fall in the category of "this is a weird thing that has to be special cased everywhere". I'm leaning towards actually making it "a big Union of types and then having to handle them with a bunch of if statements at the top of the function" at this point because of this. It's "ugly" but it's simple and tooling-friendly.
| if set_at is None: | ||
| set_at = datetime.now(tz=timezone.utc) | ||
|
|
||
| tx_context = atomic() if create_transaction else nullcontext() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New convention: optionally creating a transaction, based on passed in parameters. This is to avoid unnecessarily opening transactions when we're being invoked from things that already open transactions.
| with bulk_draft_changes_for(learning_package.id): | ||
| for section in course: | ||
| update_section_drafts(learning_package_id, section) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another new/different thing I'm doing in this PR: Using a context manager to allow parts of the publishing API to access the active DraftChangeLog.
| @@ -1,95 +0,0 @@ | |||
| """ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split the two models in this module into draft_log.py and publish_log.py
| active_change_log = DraftChangeLogContext.get_active_draft_change_log(learning_package_id) | ||
|
|
||
| # If there's an active DraftChangeLog, we're already in a transaction, so | ||
| # there's no need to open a new one. | ||
| if active_change_log: | ||
| tx_context = nullcontext() | ||
| else: | ||
| tx_context = bulk_draft_changes_for( | ||
| learning_package_id, changed_at=reset_at, changed_by=reset_by | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more extreme version of optional transaction creation. Before this, I was adding a lot of redundant logic in reset_drafts_to_published to avoid the overhead of calling set_draft_version a bunch of times.
| verbose_name_plural = "Publish Log Records" | ||
|
|
||
|
|
||
| class Published(models.Model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make any changes to this model beyond moving it.
| from .publishable_entity import PublishableEntity, PublishableEntityVersion | ||
|
|
||
|
|
||
| class Draft(models.Model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't make any changes to this model beyond moving it.
bca7747 to
0ac4531
Compare
|
I have a couple of tests that I still need to write around less common cases (nesting There would be at least two follow-up PRs:
|
kdmccormick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really solid so far. Will continue reviewing tonight.
| We have one unusual convention here, which is that if we have a | ||
| DraftChangeLogRecord where the old_version == new_version, it means that a | ||
| Draft's defined version hasn't changed, but the data associated with the | ||
| Draft has changed because some other entity has changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The current wording almost reads as if it's an accident or an edge case that old_version will sometimes equal new_version. I think that's unfair; it's really a solid model of what's happening and intuitively makes sense once you grok the whole system. Here's a suggested rewording:
| We have one unusual convention here, which is that if we have a | |
| DraftChangeLogRecord where the old_version == new_version, it means that a | |
| Draft's defined version hasn't changed, but the data associated with the | |
| Draft has changed because some other entity has changed. | |
| Changes often take form of a direct change to the content of the entity, which | |
| result in a bump of that entity's version, and thus new_version > old_version. | |
| However, this will not always be the case: if the data associated with the Draft | |
| has changed purely as a side effect of some other entity changing, then this will | |
| be represented here as a change log record where new_version == old_version. |
- Clarifying question: There will be instances where new_version > old_version, and also there's a side-effect record, right? As an example, any time an import includes parent-child relationships, I expect that a child and its parent can both have content changes. In other words:
new_version==old_version implies side-effect, but not the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Clarifying question:
OK, "Scenario 2" in your comment below confirms this very clearly. Cool!
| draft_change_log = models.ForeignKey( | ||
| DraftChangeLog, | ||
| on_delete=models.CASCADE, | ||
| related_name="records", | ||
| ) | ||
| entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT) | ||
| old_version = models.ForeignKey( | ||
| PublishableEntityVersion, | ||
| on_delete=models.RESTRICT, | ||
| null=True, | ||
| blank=True, | ||
| related_name="+", | ||
| ) | ||
| new_version = models.ForeignKey( | ||
| PublishableEntityVersion, on_delete=models.RESTRICT, null=True, blank=True | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading the on_deletes correctly: deleting the changelog row will delete all its records, but deleting any PEs or PEVs which are actively referenced by a changelog record is disallowed. As an implication, it seems that pruning any given PEV becomes contingent upon first pruning the changelog records associated with it. Is that all as intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kdmccormick and I talked about this a bit offline (notes here), but the upshot is that I'm going to keep this for now and re-evaluate how important pruning is going to be down the line. We had a couple of thoughts around hybrid pruning where the publishing models stay but other things are removed, as well as deeper history pruning that could remove old (and less interesting) things from the DraftChangeLog.
| old_version = models.ForeignKey( | ||
| PublishableEntityVersion, | ||
| on_delete=models.RESTRICT, | ||
| null=True, | ||
| blank=True, | ||
| related_name="+", | ||
| ) | ||
| new_version = models.ForeignKey( | ||
| PublishableEntityVersion, on_delete=models.RESTRICT, null=True, blank=True | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming my understanding--let me know if these are wrong. These might be useful as comments on this model.
- It's valid for multiple DraftChangeLogRecords to exist with the same (entity, old_version, new_version), as long as draft_change_log is distinct for each one. For example, if a user is repeatedly editing a container within a unit U @ v1, we will have a series of DraftChangeLogRecords
(U.v1 -> U.v1), (U.v1 -> U.v1), ... , (U.v1 -> U.v1). - We cannot assume new_version >= old_version, because discarding changes will be modelled as setting the Draft pointer to an older version--specifically, the last-published version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of those statements are correct. I'll add comments for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote a lot of the docstring for DraftChangeLogRecord in order to illustrate these and other possible scenarios.
| _create_container_side_effects_for_draft_change(change) | ||
|
|
||
|
|
||
| @set_draft_version.register(int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @set_draft_version.register(int) | |
| @set_draft_version.register |
Looks like you can omit (int) this since the first argument is type-annotated..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why, but it doesn't infer/dispatch properly if I don't include (int). A bunch of the tests break with error messages that show that it called the base version expecting a Draft with an int argument, e.g.:
tx_context = atomic() if create_transaction else nullcontext()
with tx_context:
> old_version_id = draft.version_id
E AttributeError: 'int' object has no attribute 'version_id'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird, gotcha. I might poke at this if I have time, but no need to block on it.
|
Self note: It's a super-edge case, but if someone uses one Possible remedy: Make it so that our callback to generate side-effects first prunes out the entries that look like they would end up as side-effects, but which can't be because we haven't generated them yet. |
|
A lot of our API methods, like I'm wondering if we can auto-set the |
|
|
||
| class DraftSideEffect(models.Model): | ||
| """ | ||
| Model to track when a change in one Draft affects other Drafts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This phrasing is confusing to me (is it accurate? because neither cause nor effect have any relationship to the Draft model that I can see). Could we say something like "Model to track when a draft change to an entity implicitly affects other entities such as parent containers" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the wording. I was thinking about it in the sense that the act of changing what Draft.version points to (by calling set_draft_version, e.g. a component going from version 1 to version 2) affects another thing a different row's Draft.version points to (the container).
Yeah, I have an edx-platform branch where I added
Yeah, that sounds useful. I'll play around with it. |
|
@kdmccormick, @bradenmacdonald: This should be in a fully reviewable state now. Two high level things:
I have a companion edx-platform PR that I need to rebase and update (it mostly just passes a few extra args, like who is resetting something). |
|
The edx-platform counterpart to this PR is: openedx/edx-platform#36513 |
|
Rebasing this now (need to account for the new migration added to the publishing app). |
…l enough for anything else at the moment
| Each publishable entity that is edited in this context will be tied to a | ||
| single DraftChangeLogRecord, representing the cumulative changes made to | ||
| that entity. Upon closing of the context, side effects of these changes will | ||
| be calcuated, which may result in more DraftChangeLogRecords being created | ||
| or updated. The resulting DraftChangeLogRecords and DraftChangeSideEffects | ||
| will be tied together into a single DraftChangeLog, representing the | ||
| collective changes to the learning package that happened in this context. | ||
| All changes will be committed in a single atomic transaction. | ||
| Example:: | ||
| with bulk_draft_changes_for(learning_package.id): | ||
| for section in course: | ||
| update_section_drafts(learning_package.id, section) | ||
| If you make a change to an entity *without* using this context manager, then | ||
| the individual change (and its side effects) will be automatically wrapped | ||
| in a one-off change context. For example, this:: | ||
| update_one_component(component.learning_package, component) | ||
| is identical to this:: | ||
| with bulk_draft_changes_for(component.learning_package.id): | ||
| update_one_component(component.learning_package.id, component) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ormsbee @bradenmacdonald About to merge with this new docstring. Lmk if you have any suggested edits, or we can edit it later if I merge before you get a chance to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
@kdmccormick: Thank you for pushing this PR over the line! |
New Draft change-tracking models
This introduces
DraftChangeLogandDraftChangeLogRecord, which are mostly draft equivalents of thePublishLogandPublishLogRecord. ADraftChangeLogentry is created for every group of changes (e.g. an import or a reset), andDraftChangeLogRecordhas a record for every individual publishable entity that was changed.The motivation for these models:
PublishableEntityVersionsare created.Side-effects
It also introduces a new model
DraftSideEffect, which should also have an equivalentPublishSideEffect. This is to capture the idea that sometimes a change in one publishable entity will affect another one, even we don't explicitly create a new version of the affected entity.For instance, we define containers to have unpinned references to their children. When a Unit is defined this way, the Unit's version is only updated when the Unit's own metadata (e.g. its title) changes, or when it adds, removes, or reorders some of its children. The Unit's version does not increment when a child Component is updated with new edits. However the Unit is still affected, and is still logically part of the change or publish.
So every time a child of a container is modified, its container will be represented in the corresponding
DraftChangeLogorPublishLog. In the case where only the child has been edited, the container'sDraftChangeLogRecordorPublishLogRecordwill show the same version for itsold_versionandnew_versionfields. This brings our backend more in line with user expectations, e.g. the Unit will be "published" whenever one of its Components is "published", even if that publish doesn't change the metadata we store for the definition of the Unit itself.For now, the only planned side-effects are that changes in child elements affect their parent containers. However, the
DraftSideEffectmodel could be used more broadly. For instance, if multipleLTIBlocksrelied on some shared common configuration, we could make it so that changing that configuration caused side-effects to be written out that affect the appropriate LTI blocks.That being said, we're going to want to be extremely thoughtful about when and where else we might apply this. For instance, we could use this to model inheritance, but (a) that would lead to an explosion in writes; and (b) that would not correlate to what users intuitively expect (e.g. you don't expect setting a due date on a subsection to count as an "update" of all the problems inside, because due dates are not something you define at the problem level--the fact that it's read by problems through inheritance is an implementation quirk).