Contributor

@ormsbee ormsbee commented Mar 19, 2025

New Draft change-tracking models

This introduces DraftChangeLog and DraftChangeLogRecord, which are mostly draft equivalents of the PublishLog and PublishLogRecord. A DraftChangeLog entry is created for every group of changes (e.g. an import or a reset), and DraftChangeLogRecord has a record for every individual publishable entity that was changed.

The motivation for these models:

  1. Batch changes into logical groupings, e.g. "discard changes in library" or "import a course's content into this library".
  2. Accurate history reconstruction: We don't currently track reset-to-published operations anywhere, so we can't completely faithfully reconstruct historical draft information based purely off of the timestamps of when PublishableEntityVersions are created.

Side-effects

It also introduces a new model DraftSideEffect, which should also have an equivalent PublishSideEffect. This is to capture the idea that sometimes a change in one publishable entity will affect another one, even if we don't explicitly create a new version of the affected entity.

For instance, we define containers to have unpinned references to their children. When a Unit is defined this way, the Unit's version is only updated when the Unit's own metadata (e.g. its title) changes, or when it adds, removes, or reorders some of its children. The Unit's version does not increment when a child Component is updated with new edits. However the Unit is still affected, and is still logically part of the change or publish.

So every time a child of a container is modified, its container will be represented in the corresponding DraftChangeLog or PublishLog. In the case where only the child has been edited, the container's DraftChangeLogRecord or PublishLogRecord will show the same version for its old_version and new_version fields. This brings our backend more in line with user expectations, e.g. the Unit will be "published" whenever one of its Components is "published", even if that publish doesn't change the metadata we store for the definition of the Unit itself.
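The convention above can be sketched with plain dataclasses instead of the real Django models. `Record`, `record_child_edit`, and the integer version numbers are illustrative assumptions, not names from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Stand-in for a DraftChangeLogRecord (hypothetical, not the Django model)."""
    entity: str
    old_version: int | None
    new_version: int | None


def record_child_edit(child: str, child_old: int, child_new: int,
                      parent: str, parent_version: int) -> list[Record]:
    """Return the records one change log would hold after editing a child."""
    return [
        # The child really got a new version.
        Record(child, child_old, child_new),
        # The parent container is affected but not re-versioned, so old == new.
        Record(parent, parent_version, parent_version),
    ]


records = record_child_edit("component-1", 1, 2, parent="unit-1", parent_version=5)
```

Here the Unit appears in the log even though its own version pointer never moved, which is what lets a reader of the log see that it was part of the change.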

For now, the only planned side-effects are that changes in child elements affect their parent containers. However, the DraftSideEffect model could be used more broadly. For instance, if multiple LTIBlocks relied on some shared common configuration, we could make it so that changing that configuration caused side-effects to be written out that affect the appropriate LTI blocks.

That being said, we're going to want to be extremely thoughtful about when and where else we might apply this. For instance, we could use this to model inheritance, but (a) that would lead to an explosion in writes; and (b) that would not correlate to what users intuitively expect (e.g. you don't expect setting a due date on a subsection to count as an "update" of all the problems inside, because due dates are not something you define at the problem level--the fact that it's read by problems through inheritance is an implementation quirk).

Comment on lines 530 to 549
@set_draft_version.register(int)
def _(
    publishable_entity_id: int,
    publishable_entity_version_pk: int | None,
    /,
    set_at: datetime | None = None,
    set_by: int | None = None,  # User.id
    create_transaction: bool = True,
) -> None:
    """
    Alias for set_draft_version taking PublishableEntity.id instead of a Draft.
    """
Contributor Author

Just wanted to call out that this is the first time I'm introducing this pattern into openedx-learning (or really anywhere in Open edX code that I'm aware of), where we use functools.singledispatch to give multiple versions of the same function and switch based on the first parameter type. Wanted to get people's thoughts.

Member

@kdmccormick kdmccormick Apr 2, 2025

Very very cool. I like this a lot more than taking a big Union of types and then having to handle them with a bunch of if statements at the top of the function. It's also nice because it encourages (a) static typing and (b) having one canonical version of the function, and then one or more "aliases" which just lightly wrap the canonical version.

Contributor

I like that it encourages static typing, but it seems to make the code longer and slightly harder to follow compared to just having different names for the different versions of each function.

Contributor Author

Okay, a major knock against it is that it seems to deeply confuse Pylance, making it so that the auto-complete is useless. The more I read about singledispatch and its handling in various tools, the more it seems to fall in the category of "this is a weird thing that has to be special cased everywhere". I'm leaning towards actually making it "a big Union of types and then having to handle them with a bunch of if statements at the top of the function" at this point because of this. It's "ugly" but it's simple and tooling-friendly.
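For readers unfamiliar with the pattern under discussion, here is a minimal standalone sketch of functools.singledispatch with one canonical implementation and one alias. The `Draft` class and the `DRAFTS` lookup table are hypothetical stand-ins, not the real models or queries.

```python
from __future__ import annotations

from functools import singledispatch


class Draft:
    """Hypothetical stand-in for the Draft model."""
    def __init__(self, entity_id: int, version_id: int | None = None):
        self.entity_id = entity_id
        self.version_id = version_id


# Pretend table of drafts keyed by entity id (an assumption for this sketch).
DRAFTS = {7: Draft(7, version_id=1)}


@singledispatch
def set_draft_version(draft: Draft, version_id: int | None) -> Draft:
    """Canonical version: operates on a Draft object."""
    draft.version_id = version_id
    return draft


@set_draft_version.register(int)
def _(entity_id: int, version_id: int | None) -> Draft:
    """Alias: look up the Draft by entity id, then delegate."""
    return set_draft_version(DRAFTS[entity_id], version_id)
```

Calling `set_draft_version(7, 2)` goes through the int alias, while passing a `Draft` instance falls through to the base implementation. Dispatch happens only on the type of the first positional argument.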

    if set_at is None:
        set_at = datetime.now(tz=timezone.utc)

    tx_context = atomic() if create_transaction else nullcontext()
Contributor Author

New convention: optionally creating a transaction, based on passed in parameters. This is to avoid unnecessarily opening transactions when we're being invoked from things that already open transactions.
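A minimal sketch of this convention, runnable without Django. The `atomic` function below is a fake that only records when it is entered; it stands in for django.db.transaction.atomic purely for illustration, and `set_version` is a hypothetical name.

```python
from contextlib import contextmanager, nullcontext

OPENED = []  # records each time the fake transaction is entered


@contextmanager
def atomic():
    """Stand-in for django.db.transaction.atomic, for illustration only."""
    OPENED.append("begin")
    yield


def set_version(create_transaction: bool = True) -> None:
    # nullcontext() is a no-op context manager, so the body runs either way.
    tx_context = atomic() if create_transaction else nullcontext()
    with tx_context:
        pass  # the actual writes would go here


set_version()                              # opens its own transaction
with atomic():                             # caller already holds a transaction
    set_version(create_transaction=False)  # does not open a second one
```

In Django, a nested atomic() block by default creates a savepoint, which is presumably the overhead being avoided when the caller is already inside a transaction.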

Comment on lines +24 to +35
with bulk_draft_changes_for(learning_package.id):
    for section in course:
        update_section_drafts(learning_package.id, section)
Contributor Author

Another new/different thing I'm doing in this PR: Using a context manager to allow parts of the publishing API to access the active DraftChangeLog.
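One way such a context manager could expose the active log to code deeper in the call stack is sketched below. The module-level registry, `bulk_changes_for`, `get_active_change_log`, and `update_section` are assumptions made for this sketch; the PR's DraftChangeLogContext may work differently, and a plain dict like this is not thread-safe.

```python
from __future__ import annotations

from contextlib import contextmanager

# learning_package_id -> list of changes in the active log (sketch-only registry).
_active_logs: dict[int, list[str]] = {}


def get_active_change_log(learning_package_id: int) -> list[str] | None:
    """Return the in-progress log for this package, or None."""
    return _active_logs.get(learning_package_id)


@contextmanager
def bulk_changes_for(learning_package_id: int):
    """Make a change log visible to API calls made inside the block."""
    log: list[str] = []
    _active_logs[learning_package_id] = log
    try:
        yield log
    finally:
        # Always unregister, even if the block raised.
        del _active_logs[learning_package_id]


def update_section(learning_package_id: int, section: str) -> None:
    # Deep in the API, no log is passed in; it is looked up instead.
    get_active_change_log(learning_package_id).append(f"updated {section}")


with bulk_changes_for(1) as change_log:
    for section in ["s1", "s2"]:
        update_section(1, section)
```

The point of the design is that intermediate functions do not need an extra parameter threaded through them in order to write to the same log.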

@@ -1,95 +0,0 @@
"""
Contributor Author

I split the two models in this module into draft_log.py and publish_log.py

Comment on lines 729 to 779
active_change_log = DraftChangeLogContext.get_active_draft_change_log(learning_package_id)

# If there's an active DraftChangeLog, we're already in a transaction, so
# there's no need to open a new one.
if active_change_log:
    tx_context = nullcontext()
else:
    tx_context = bulk_draft_changes_for(
        learning_package_id, changed_at=reset_at, changed_by=reset_by
    )
Contributor Author

@ormsbee ormsbee Mar 31, 2025

A more extreme version of optional transaction creation. Before this, I was adding a lot of redundant logic in reset_drafts_to_published to avoid the overhead of calling set_draft_version a bunch of times.

verbose_name_plural = "Publish Log Records"


class Published(models.Model):
Contributor Author

@ormsbee ormsbee Mar 31, 2025

I didn't make any changes to this model beyond moving it.

from .publishable_entity import PublishableEntity, PublishableEntityVersion


class Draft(models.Model):
Contributor Author

@ormsbee ormsbee Mar 31, 2025

I didn't make any changes to this model beyond moving it.

@ormsbee ormsbee force-pushed the draft_log2 branch 2 times, most recently from bca7747 to 0ac4531 on March 31, 2025 22:10
@ormsbee ormsbee marked this pull request as ready for review April 1, 2025 16:39
Contributor Author

ormsbee commented Apr 1, 2025

I have a couple of tests that I still need to write around less common cases (nesting bulk_draft_changes_for calls and side-effect calculation when there are multiple layers of containers). But I'm not likely to get to that until tonight, and I'd like to get eyes on other parts of this PR sooner if possible.

There would be at least two follow-up PRs:

  • a small edx-platform one to send the user information to some calls, e.g. ("who did this soft-delete")
  • one in openedx-learning that introduces a PublishSideEffect analog to DraftSideEffect.

@kdmccormick kdmccormick self-requested a review April 1, 2025 19:31
Member

@kdmccormick kdmccormick left a comment

Looking really solid so far. Will continue reviewing tonight.

Comment on lines 92 to 95
We have one unusual convention here, which is that if we have a
DraftChangeLogRecord where the old_version == new_version, it means that a
Draft's defined version hasn't changed, but the data associated with the
Draft has changed because some other entity has changed.
Member

  1. The current wording almost reads as if it's an accident or an edge case that old_version will sometimes equal new_version. I think that's unfair; it's really a solid model of what's happening and intuitively makes sense once you grok the whole system. Here's a suggested rewording:

Suggested change
- We have one unusual convention here, which is that if we have a
- DraftChangeLogRecord where the old_version == new_version, it means that a
- Draft's defined version hasn't changed, but the data associated with the
- Draft has changed because some other entity has changed.
+ Changes often take the form of a direct change to the content of the entity, which
+ results in a bump of that entity's version, and thus new_version > old_version.
+ However, this will not always be the case: if the data associated with the Draft
+ has changed purely as a side effect of some other entity changing, then this will
+ be represented here as a change log record where new_version == old_version.

  2. Clarifying question: There will be instances where new_version > old_version, and also there's a side-effect record, right? As an example, any time an import includes parent-child relationships, I expect that a child and its parent can both have content changes. In other words: new_version==old_version implies side-effect, but not the other way around.

Member

  2. Clarifying question:

OK, "Scenario 2" in your comment below confirms this very clearly. Cool!

Comment on lines +112 to +203
draft_change_log = models.ForeignKey(
    DraftChangeLog,
    on_delete=models.CASCADE,
    related_name="records",
)
entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT)
old_version = models.ForeignKey(
    PublishableEntityVersion,
    on_delete=models.RESTRICT,
    null=True,
    blank=True,
    related_name="+",
)
new_version = models.ForeignKey(
    PublishableEntityVersion, on_delete=models.RESTRICT, null=True, blank=True
)
Member

If I'm reading the on_deletes correctly: deleting the changelog row will delete all its records, but deleting any PEs or PEVs which are actively referenced by a changelog record is disallowed. As an implication, it seems that pruning any given PEV becomes contingent upon first pruning the changelog records associated with it. Is that all as intended?

Contributor Author

@kdmccormick and I talked about this a bit offline (notes here), but the upshot is that I'm going to keep this for now and re-evaluate how important pruning is going to be down the line. We had a couple of thoughts around hybrid pruning where the publishing models stay but other things are removed, as well as deeper history pruning that could remove old (and less interesting) things from the DraftChangeLog.

Comment on lines +118 to +203
old_version = models.ForeignKey(
    PublishableEntityVersion,
    on_delete=models.RESTRICT,
    null=True,
    blank=True,
    related_name="+",
)
new_version = models.ForeignKey(
    PublishableEntityVersion, on_delete=models.RESTRICT, null=True, blank=True
)
Member

Confirming my understanding--let me know if these are wrong. These might be useful as comments on this model.

  1. It's valid for multiple DraftChangeLogRecords to exist with the same (entity, old_version, new_version), as long as draft_change_log is distinct for each one. For example, if a user is repeatedly editing a container within a unit U @ v1, we will have a series of DraftChangeLogRecords (U.v1 -> U.v1), (U.v1 -> U.v1), ... , (U.v1 -> U.v1).
  2. We cannot assume new_version >= old_version, because discarding changes will be modelled as setting the Draft pointer to an older version--specifically, the last-published version.

Contributor Author

Both of those statements are correct. I'll add comments for them.

Contributor Author

I rewrote a lot of the docstring for DraftChangeLogRecord in order to illustrate these and other possible scenarios.
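The scenarios discussed in this thread can be summarized as a small classifier over (old_version, new_version) pairs. This is a sketch with integer version numbers and labels chosen for illustration; the pair alone cannot always distinguish causes, so the labels are hedged accordingly.

```python
from __future__ import annotations


def describe_record(old: int | None, new: int | None) -> str:
    """Label an (old_version, new_version) pair; version numbers are ints here."""
    if old is None and new is None:
        return "no draft version before or after"
    if old is None:
        return "added to draft"
    if new is None:
        return "removed from draft"
    if new == old:
        return "version unchanged (e.g. side effect of another entity's change)"
    if new < old:
        # new_version >= old_version cannot be assumed.
        return "moved to an older version (e.g. discard changes)"
    return "edited"
```

For example, `describe_record(3, 2)` covers the discard-changes case, where the Draft pointer moves back to the last-published version.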

_create_container_side_effects_for_draft_change(change)


@set_draft_version.register(int)
Member

@kdmccormick kdmccormick Apr 2, 2025

Suggested change
- @set_draft_version.register(int)
+ @set_draft_version.register

Looks like you can omit (int) since the first argument is type-annotated.

Contributor Author

I don't understand why, but it doesn't infer/dispatch properly if I don't include (int). A bunch of the tests break with error messages that show that it called the base version expecting a Draft with an int argument, e.g.:

        tx_context = atomic() if create_transaction else nullcontext()
    
        with tx_context:
>           old_version_id = draft.version_id
E           AttributeError: 'int' object has no attribute 'version_id'

Member

Weird, gotcha. I might poke at this if I have time, but no need to block on it.
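For reference, the annotation-based form being suggested does work in a minimal standalone case on Python 3.7+, as sketched below with a hypothetical `describe` function. This does not reproduce or explain the failure reported above; it only shows what the bare `@register` form looks like when it dispatches as intended.

```python
from functools import singledispatch


@singledispatch
def describe(value) -> str:
    return "base implementation"


@describe.register  # no explicit type: inferred from the annotation below
def _(value: int) -> str:
    return "int implementation"
```

When no class is passed to `register`, the standard library reads the type from the annotation on the first parameter of the decorated function.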

Contributor Author

ormsbee commented Apr 2, 2025

Self note: It's a super-edge case, but if someone uses one bulk_draft_changes_for to add and then delete an entity, we'd get an (old_version, new_version) entry of (None, None), which doesn't really mean anything because the Draft state was the same before and after and it was not a side-effect. Another weird edge case is if someone edits an entry to make it go from v1 -> v2 and then sets the draft version back to v1, in which case we'd get (v1, v1), but again with no side-effect.

Possible remedy: Make it so that our callback to generate side-effects first prunes out the entries that look like they would end up as side-effects, but which can't be because we haven't generated them yet.
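One reading of this remedy is sketched below: collapse the changes made inside a bulk context into a single cumulative (old, new) pair per entity, then drop the pairs that ended where they started before any side-effect records are generated. The function names and the dict-based representation are assumptions for illustration only.

```python
from __future__ import annotations


def apply_changes(start: dict[str, int | None],
                  changes: list[tuple[str, int | None]]) -> dict[str, tuple]:
    """Collapse a sequence of draft changes into one (old, new) pair per entity."""
    records: dict[str, tuple[int | None, int | None]] = {}
    for entity, new_version in changes:
        # Keep the first old_version seen; only new_version moves forward.
        old_version = records[entity][0] if entity in records else start.get(entity)
        records[entity] = (old_version, new_version)
    return records


def prune_noops(records: dict[str, tuple]) -> dict[str, tuple]:
    """Drop records whose draft ended where it started (run before side effects)."""
    return {e: (old, new) for e, (old, new) in records.items() if old != new}


records = apply_changes(
    {"a": 1},
    [("b", 1), ("b", None),   # b added, then deleted -> (None, None)
     ("a", 2), ("a", 1),      # a edited, then set back -> (1, 1)
     ("c", 1)],               # c added -> (None, 1)
)
pruned = prune_noops(records)
```

Pruning at this point is unambiguous because genuine side-effect records, which also have old == new, have not been written yet.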

@bradenmacdonald
Contributor

A lot of our API methods, like create_unit_version() as one random example, accept an optional parameter like created_by: int | None = None,. But because it's optional, I've noticed that we haven't been very thorough in populating those fields when using Learning Core APIs within edx-platform. e.g. many of these libraries APIs don't even accept a user parameter so don't pass anything in to Learning Core.

I'm wondering if we can auto-set the created_by and similar fields from the current DraftChangeLogContext changed_by value. We'd still have a problem of making sure edx-platform explicitly creates a DraftChangeLogContext with the current user set, but there'd be a lot less manual passing of user IDs around through each level. Maybe the same for the changed_at values too.


class DraftSideEffect(models.Model):
"""
Model to track when a change in one Draft affects other Drafts.
Contributor

This phrasing is confusing to me (is it accurate? because neither cause nor effect have any relationship to the Draft model that I can see). Could we say something like "Model to track when a draft change to an entity implicitly affects other entities such as parent containers" ?

Contributor Author

I'll change the wording. I was thinking about it in the sense that the act of changing what Draft.version points to (by calling set_draft_version, e.g. a component going from version 1 to version 2) affects the thing that a different row's Draft.version points to (the container).

Contributor Author

ormsbee commented Apr 3, 2025

> A lot of our API methods, like create_unit_version() as one random example, accept an optional parameter like created_by: int | None = None,. But because it's optional, I've noticed that we haven't been very thorough in populating those fields when using Learning Core APIs within edx-platform. e.g. many of these libraries APIs don't even accept a user parameter so don't pass anything in to Learning Core.

Yeah, I have an edx-platform branch where I added created_by in a number of places.

> I'm wondering if we can auto-set the created_by and similar fields from the current DraftChangeLogContext changed_by value. We'd still have a problem of making sure edx-platform explicitly creates a DraftChangeLogContext with the current user set, but there'd be a lot less manual passing of user IDs around through each level. Maybe the same for the changed_at values too.

Yeah, that sounds useful. I'll play around with it.

Contributor Author

ormsbee commented Apr 9, 2025

@kdmccormick, @bradenmacdonald: This should be in a fully reviewable state now. Two high level things:

  1. I'm punting on implicitly setting created_at/created_by. I agree that it'd be a nice use of the context, but this PR is already larger than I'm comfortable with.
  2. I got rid of the singledispatch call and used isinstance for switching between Draft objects and id. It's not as elegant, but it doesn't mess up any of the tooling, and I think it's easier for people to understand.
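The isinstance-based alternative mentioned in item 2 might look roughly like this. The `Draft` class and `DRAFTS` table are hypothetical stand-ins, and the merged code may differ in its details.

```python
from __future__ import annotations


class Draft:
    """Hypothetical stand-in for the Draft model."""
    def __init__(self, entity_id: int, version_id: int | None = None):
        self.entity_id = entity_id
        self.version_id = version_id


DRAFTS = {7: Draft(7, version_id=1)}  # pretend lookup table for this sketch


def set_draft_version(draft_or_id: Draft | int, version_id: int | None) -> Draft:
    """Accept either a Draft or an entity id and normalize up front."""
    if isinstance(draft_or_id, Draft):
        draft = draft_or_id
    else:
        draft = DRAFTS[draft_or_id]
    draft.version_id = version_id
    return draft
```

There is a single function with a single signature, so editors and type checkers see one ordinary definition with a union-typed first parameter.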

I have a companion edx-platform PR that I need to rebase and update (it mostly just passes a few extra args, like who is resetting something).

Contributor Author

ormsbee commented Apr 11, 2025

The edx-platform counterpart to this PR is: openedx/edx-platform#36513

Contributor Author

ormsbee commented Apr 11, 2025

Rebasing this now (need to account for the new migration added to the publishing app).

Comment on lines +1423 to +1447
Each publishable entity that is edited in this context will be tied to a
single DraftChangeLogRecord, representing the cumulative changes made to
that entity. Upon closing of the context, side effects of these changes will
be calculated, which may result in more DraftChangeLogRecords being created
or updated. The resulting DraftChangeLogRecords and DraftSideEffects
will be tied together into a single DraftChangeLog, representing the
collective changes to the learning package that happened in this context.
All changes will be committed in a single atomic transaction.

Example::

    with bulk_draft_changes_for(learning_package.id):
        for section in course:
            update_section_drafts(learning_package.id, section)

If you make a change to an entity *without* using this context manager, then
the individual change (and its side effects) will be automatically wrapped
in a one-off change context. For example, this::

    update_one_component(component.learning_package.id, component)

is identical to this::

    with bulk_draft_changes_for(component.learning_package.id):
        update_one_component(component.learning_package.id, component)
Member

@ormsbee @bradenmacdonald About to merge with this new docstring. Lmk if you have any suggested edits, or we can edit it later if I merge before you get a chance to review.

Contributor

LGTM!

@kdmccormick kdmccormick merged commit 443c3d6 into openedx:main Apr 16, 2025
11 checks passed
@ormsbee ormsbee deleted the draft_log2 branch April 16, 2025 19:56
Contributor Author

ormsbee commented Apr 16, 2025

@kdmccormick: Thank you for pushing this PR over the line!

3 participants