Contributor

@dwong2708 dwong2708 commented Aug 15, 2025

Description

Resolves: #352

This change updates the lp_dump file to serialize only the draft and publishable versions, instead of all available versions.

In addition, TOML component files are now excluded from the top level of the entities directory.

Static files remain out of scope for this change.


Directory Output Example

Given these components:

  • my_published_example
    • Versions: v1, v2
    • Draft version: v2
    • Published version: v1
  • my_draft_example
    • Versions: v1, v2
    • Draft version: v2
    • Published version: None
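
A minimal sketch of the selection rule described above, using plain dataclasses rather than the real models. `EntitySummary` and `versions_to_dump` are hypothetical names introduced only for illustration and are not part of this change:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EntitySummary:
    """Hypothetical stand-in for a publishable entity and its version pointers."""
    key: str
    version_nums: Tuple[int, ...]
    draft_version_num: Optional[int]
    published_version_num: Optional[int]


def versions_to_dump(entity: EntitySummary) -> List[int]:
    """Return only the draft and published version numbers, skipping missing ones."""
    wanted = {entity.draft_version_num, entity.published_version_num}
    wanted.discard(None)  # an entity may have no published (or no draft) version
    return sorted(wanted)


components = [
    EntitySummary("my_published_example", (1, 2), draft_version_num=2, published_version_num=1),
    EntitySummary("my_draft_example", (1, 2), draft_version_num=2, published_version_num=None),
]
selected = {c.key: versions_to_dump(c) for c in components}
```

With the two example components, this selects v1 and v2 for my_published_example and only v2 for my_draft_example.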

Dump File Output

[Image: dump file output example]

In the lp_dump file, only the draft and publishable versions are serialized, rather than all versions. Static files remain out of scope for this change.
@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Aug 15, 2025

openedx-webhooks commented Aug 15, 2025

Thanks for the pull request, @dwong2708!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

Contributor

@ormsbee ormsbee left a comment

Just a couple of requests. I had one other high-level concern, though: one of the issues with export is going to be performance. In particular, we're going to want to pull as much of this data back in one query as possible while iterating across all the publishable entities.

So for instance, grabbing all publishable entities in this case should include doing select related for the draft version and published version, so that we're not doing n+1 queries for those.

I think it's okay to do a separate query for each ComponentVersion for now, though this is something I also think we'll eventually want to further optimize.

Can you please make sure the correct select_relateds are happening to make it so that the for loop through PublishableEntities doesn't make extra queries?

Thank you.
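
To illustrate the n+1 concern outside of Django, here is a rough sketch using the standard-library sqlite3 module. The table layout, `QueryCounter`, and both helper functions are assumptions made for illustration only; they are not the real schema or API. In Django, the joined form would typically be expressed with select_related on the queryset.

```python
import sqlite3


class QueryCounter:
    """Tiny wrapper that counts how many SQL statements we issue."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def run(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: entities, their versions, and a draft pointer.
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE version (id INTEGER PRIMARY KEY, entity_id INTEGER, version_num INTEGER);
    CREATE TABLE draft (entity_id INTEGER PRIMARY KEY, version_id INTEGER);
    INSERT INTO entity VALUES (1, 'my_published_example'), (2, 'my_draft_example');
    INSERT INTO version VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2);
    INSERT INTO draft VALUES (1, 2), (2, 4);
""")


def drafts_n_plus_one(db):
    """One query for the entities, then one more query per entity."""
    result = {}
    for entity_id, key in db.run("SELECT id, key FROM entity ORDER BY id"):
        rows = db.run(
            "SELECT v.version_num FROM draft d "
            "JOIN version v ON v.id = d.version_id WHERE d.entity_id = ?",
            (entity_id,),
        )
        result[key] = rows[0][0] if rows else None
    return result


def drafts_joined(db):
    """A single query that joins the draft pointer and its version."""
    rows = db.run(
        "SELECT e.key, v.version_num FROM entity e "
        "LEFT JOIN draft d ON d.entity_id = e.id "
        "LEFT JOIN version v ON v.id = d.version_id ORDER BY e.id"
    )
    return dict(rows)


slow_db = QueryCounter(conn)
slow = drafts_n_plus_one(slow_db)
n_plus_one_queries = slow_db.count

fast_db = QueryCounter(conn)
fast = drafts_joined(fast_db)
joined_queries = fast_db.count
```

Both functions return the same mapping, but the first issues one query plus one per entity, while the second issues a single query regardless of how many entities exist.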

Comment on lines 111 to 115
current_draft: Optional[Draft] = getattr(entity, "draft", None)
current_published: Optional[Published] = getattr(entity, "published", None)

draft_version: Optional[PublishableEntityVersion] = getattr(current_draft, "version", None)
published_version: Optional[PublishableEntityVersion] = getattr(current_published, "version", None)
Contributor

I created an issue for making it so that this code isn't necessary: #362

But I came to realize that doing that would introduce more problems.

Then I thought to ask you to please replace this with calls to get_published_version and get_draft_version, but I realize that those methods are actually going to be really inefficient for this iteration, because they query the Published and Draft tables directly, instead of using potentially pre-loaded data.

So my request for this PR is:

  1. Please use get_published_version and get_draft_version here, but...
  2. Please modify get_published_version and get_draft_version so that they can take either the integer primary key (as they do today), or the PublishableEntity model object. You can see an example of this sort of pattern in set_draft_version.

If a PublishableEntity model is passed into those functions, the code should operate very much like how you have here, where it checks for whether the attribute exists and returns None if there is no current published/draft version.
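
A rough sketch of the "entity or id" pattern requested above, written with plain Python stand-ins so it runs without Django. The `Entity`, `Draft`, and `Version` classes and the `DRAFTS_BY_ENTITY_ID` dictionary are hypothetical substitutes for the real models and the Draft table lookup; only the function name and the `publishable_entity_or_id` parameter name come from this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Version:
    version_num: int


@dataclass
class Draft:
    version: Optional[Version]


class Entity:
    """Stand-in model: the `draft` attribute is absent when no Draft row exists."""

    def __init__(self, pk: int, draft: Optional[Draft] = None):
        self.pk = pk
        if draft is not None:
            self.draft = draft


# Hypothetical stand-in for querying the Draft table by entity primary key.
DRAFTS_BY_ENTITY_ID: Dict[int, Draft] = {}


def get_draft_version(publishable_entity_or_id: Union[Entity, int]) -> Optional[Version]:
    """Accept either an integer primary key or an already-loaded entity."""
    if isinstance(publishable_entity_or_id, int):
        # Primary-key path: look the draft up (a database query in the real code).
        draft = DRAFTS_BY_ENTITY_ID.get(publishable_entity_or_id)
    else:
        # Model path: reuse whatever is already loaded on the object.
        draft = getattr(publishable_entity_or_id, "draft", None)
    if draft is None:
        return None
    return draft.version


DRAFTS_BY_ENTITY_ID[1] = Draft(Version(2))
with_draft = Entity(pk=1, draft=Draft(Version(2)))
without_draft = Entity(pk=2)
```

Passing the model object lets a caller that has already loaded the related rows avoid a second lookup, while existing callers that pass an integer keep working unchanged.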

@dwong2708
Contributor Author

All suggestions have been addressed. Thanks a lot for the helpful input!

@dwong2708 dwong2708 requested a review from ormsbee August 21, 2025 03:46
Contributor

@ormsbee ormsbee left a comment

Looking good. Just some low-level requests. Thank you!

Comment on lines 277 to 279
"component",
"container",
"component__component_type",
Contributor

Things in the publishing app shouldn't show an awareness of components, because that's a higher-level abstraction. Also, we're eventually going to pull containers out of publishing, so please remove that as well. If we need those for performance reasons, please add those select_related from the calling app (since we can chain select_related calls together).

Contributor Author

@dwong2708 dwong2708 Aug 21, 2025

Understood. Changes applied. Thanks

Comment on lines 447 to 448
# The following code retrieves the draft version for a given PublishableEntity.
# Useful for preloading the draft version in certain contexts.
Contributor

I think the most useful part of this is that it gracefully handles the edge cases when there is no draft version, not really pre-loading.

Contributor Author

Done

# The following code retrieves the draft version for a given PublishableEntity.
# Useful for preloading the draft version in certain contexts.
draft: Optional[Draft] = getattr(publishable_entity_id, "draft", None)
return getattr(draft, "version", None)
Contributor

Please explicitly check to see if draft is None and then return draft.version if it's not. Something like:

    draft: Optional[Draft] = getattr(publishable_entity_or_id, "draft", None)
    if draft is None:
        return None
    return draft.version

I know it's basically equivalent at the moment, but getattr is a dangerous tool that I think we should minimize our use of. We have no choice but to use it when getting the draft because that's just how Django 1:1 relations work. But if for some terrible reason we decided to change the attribute from draft.version to draft.current, we want it to explode in an error and not silently return None.
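
A small sketch of the failure mode described above, using `types.SimpleNamespace` objects in place of real models. The two helper function names are hypothetical, and the `current` attribute only simulates the imagined rename mentioned in the comment.

```python
from types import SimpleNamespace


def version_via_getattr(entity):
    """Chained getattr: any missing attribute silently becomes None."""
    draft = getattr(entity, "draft", None)
    return getattr(draft, "version", None)


def version_explicit(entity):
    """getattr only for the optional relation, plain access for the field."""
    draft = getattr(entity, "draft", None)
    if draft is None:
        return None
    return draft.version


# Simulate a future rename of draft.version to draft.current.
renamed = SimpleNamespace(draft=SimpleNamespace(current="v2"))

silent_result = version_via_getattr(renamed)  # the rename goes unnoticed

try:
    version_explicit(renamed)
    raised = False
except AttributeError:
    raised = True  # the rename surfaces immediately as an error

# An entity with no draft at all is still handled gracefully by both versions.
no_draft = SimpleNamespace()
```

Both helpers return None when the entity has no draft, so the explicit form keeps the graceful edge-case handling while still failing loudly on a renamed field.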

Contributor Author

Nice! I get it now. Thanks.

@dwong2708 dwong2708 requested a review from ormsbee August 21, 2025 17:59
Contributor

@ormsbee ormsbee left a comment

Very minor question/suggestion.

Comment on lines 71 to 77
api.create_component_version(
    cls.published_component.pk,
    version_num=cls.published_component.versioning.draft.version_num + 1,
    title="My published problem draft v2",
    created=cls.now,
    created_by=cls.user.id,
)
Contributor

This use case comes up so often that we made a create_next_component_version() function for it. Please use that instead.

Comment on lines 1154 to 1159
for e in entities:
    draft = getattr(e, 'draft', None)
    published = getattr(e, 'published', None)

    _ = getattr(draft, 'version', None)
    _ = getattr(published, 'version', None)
Contributor

It's not clear whether we expect the draft/published versions to exist or not. Please make these regular access calls to draft.version and published.version and do something like check an attribute's value, e.g. assert draft.version.version_num == 2

@ormsbee ormsbee merged commit bddb781 into openedx:main Aug 21, 2025
11 checks passed
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in Contributions Aug 21, 2025


Development

Successfully merging this pull request may close these issues.

What versions need to be exported?

3 participants