End-to-End Testing and Adjustments for Backup #375

Merged
ormsbee merged 6 commits into openedx:main from WGU-Open-edX:dwong2708/lp_dump_adjustments
Sep 10, 2025

Conversation

dwong2708 (Contributor) commented Sep 5, 2025

Resolves: #374

PR Description

This PR focuses on testing the backup functionality for a learning package that contains all types of libraries. The goal is to validate the end-to-end dump process and apply any necessary adjustments based on the results.


Acceptance Criteria

  1. Test using one learning package that includes all types of libraries.
  2. Upload the resulting test output to this issue.
  3. Apply any required adjustments based on the test results.

Input Learning Package

All Content

[Image: All content]

Collection Content

[Image: Collection content]


Dump File

Test.zip V1

Test.zip V2

Test.zip V3

openedx-webhooks commented Sep 5, 2025

Thanks for the pull request, @dwong2708!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution label (PR author is not from Axim or 2U) Sep 5, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Sep 5, 2025
@dwong2708 dwong2708 marked this pull request as ready for review September 6, 2025 02:25
@dwong2708 dwong2708 requested a review from ormsbee September 6, 2025 02:25
Comment on lines 162 to 170
if hasattr(version, 'containerversion'):
    children_qs = (
        version.containerversion.entity_list.entitylistrow_set
        .order_by("entity__key")
        .values_list("entity__key", flat=True)
        .distinct()
    )
    children = list(children_qs)
    container_table.add("children", children)
Contributor

Please use an api module call here instead of iterating through the children like this.

Contributor Author

Done. Thanks

ormsbee (Contributor) commented Sep 8, 2025

A couple of things I've noticed with the test file:

  1. The folders in the zip file are being created with a modification timestamp of Dec. 31, 1979. Please correct this to be the time of creation for the rest of the archive.
  2. It would actually be really nice if the modified timestamp for all the resources reflected their actual timestamps in the system. So if the entity TOML file matched up with the modification timestamp of the most recently created version, and if each version's data matched up with the timestamp for when that version was created. Likewise, if the modification timestamp for the collection TOML file matched when the collection was last updated.
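
The timestamp request above can be sketched with Python's standard `zipfile` module. This is a minimal illustration, not the code merged in this PR: `to_zip_date_time` and `write_with_timestamp` are hypothetical helper names, and the entry path is made up. It relies on the fact that `zipfile.ZipInfo` defaults `date_time` to `(1980, 1, 1, 0, 0, 0)` unless one is supplied, which is a plausible source of the 1979/1980 folder dates once a viewer applies a timezone offset.

```python
import io
import zipfile
from datetime import datetime, timezone


def to_zip_date_time(moment):
    """Convert an aware datetime into the 6-tuple that ZipInfo expects.

    The ZIP format cannot store years before 1980, so earlier values are clamped.
    """
    utc = moment.astimezone(timezone.utc)
    if utc.year < 1980:
        return (1980, 1, 1, 0, 0, 0)
    return (utc.year, utc.month, utc.day, utc.hour, utc.minute, utc.second)


def write_with_timestamp(zip_file, name, content, modified):
    """Write one archive entry whose modification time is `modified`."""
    info = zipfile.ZipInfo(name, date_time=to_zip_date_time(modified))
    info.compress_type = zipfile.ZIP_DEFLATED
    zip_file.writestr(info, content)


# Hypothetical example: an entity TOML file stamped with the creation time
# of its most recent version.
version_created = datetime(2025, 9, 8, 14, 30, 0, tzinfo=timezone.utc)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    write_with_timestamp(zf, "entities/my_component.toml", "title = 'Example'\n", version_created)

with zipfile.ZipFile(buffer) as zf:
    stored = zf.getinfo("entities/my_component.toml").date_time
```

ZIP timestamps carry no timezone and have two-second resolution, so the sketch normalizes to UTC; whether the real dump should use UTC or local time is a separate decision.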

ormsbee (Contributor) left a comment

Another thing that came up when I was looking at your output file is that the library itself already appends stuff to the key when new things are created. So while I think the hash code is still important to have for edge cases, I don't think it will be necessary most of the time.

How about this?

  1. We still slugify all the identifiers for the purposes of normalizing case and getting rid of weird characters.
  2. We keep track of all the slugs we've written for filenames.
  3. If there's no naming conflict, we just write the slugs without the extra hashing.
  4. If there is a naming conflict, we append the identifier hash like before.
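
The four steps above might look roughly like the following. This is a sketch under assumptions: `slugify` here is a simplified stand-in (not Django's or the project's `slugify_hashed_filename`), `FilenameRegistry` is a hypothetical name, and the 8-character SHA-1 suffix is an arbitrary choice for illustration.

```python
import hashlib
import re


def slugify(value):
    """Lowercase the value and collapse anything outside [a-z0-9_-] into hyphens."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", value.lower()).strip("-")
    return slug or "untitled"


class FilenameRegistry:
    """Tracks slugs already written and only adds a hash on a conflict."""

    def __init__(self):
        self._used = set()

    def filename_for(self, identifier):
        slug = slugify(identifier)
        if slug in self._used:
            # Conflict: fall back to slug + short hash of the original identifier.
            digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()[:8]
            slug = f"{slug}_{digest}"
        self._used.add(slug)
        return slug


registry = FilenameRegistry()
first = registry.filename_for("My Component")   # no conflict, plain slug
second = registry.filename_for("my component")  # same slug, so a hash is appended
third = registry.filename_for("Other")          # unrelated, plain slug
```

The sketch does not handle the same identifier being registered twice (the hashed name would repeat); a real implementation would need to decide whether that case can occur.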

# Generate the slugified hash for the component local key
# Example: if the local key is "my_component", the slugified hash might be "my_component_123456"
# It's a combination of the local key and a hash and should be unique
entity_slugify_hash = slugify_hashed_filename(entity.key)
Contributor

Please don't do this, actually. Most of the key is already represented by the directory structure. Leaving it with the local_key here makes it easier to read.

Contributor Author

Got it. Changes applied

- Assign timestamps to ZIP resources (folders and files):
  - Entity TOML files use the latest version timestamp
  - Other resources use the system timestamp
- Add new logic to define entity filenames:
  - Slugify all identifiers
  - Track all generated slugs
  - Use the slug directly if there is no naming conflict
  - If a conflict exists, fall back to a slugified hash of the version name
@dwong2708 dwong2708 requested a review from ormsbee September 9, 2025 00:52
dwong2708 (Contributor Author) commented

Thank you for your support, @ormsbee. I’ve applied the new logic for timestamp handling in zipfile resources and for filename generation.

ormsbee (Contributor) left a comment

A couple of small requests. Thank you!

    if isinstance(content, str):
        content = content.encode("utf-8")
    zip_file.writestr(file_info, content or b"")
else:  # explicitly an empty folder
Contributor

Please don't assume that paths without suffixes mean it's a directory. It's entirely possible to have a file named README or Makefile or something along those lines. Please keep the folder creation as a separate method.

Also, please make sure this code still works if people make arbitrary subdirectories inside the static assets folder of components, e.g. static/images/diagrams/figure1.png
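
One way to keep folder creation separate from file writing, so that a suffix-less name like README is never mistaken for a directory, is sketched below. The function names `add_file_to_zip` and `add_folder_to_zip` are borrowed from the follow-up commit message in this thread, but the bodies and signatures here are assumptions, not the merged implementation.

```python
import io
import time
import zipfile
from pathlib import PurePosixPath


def add_folder_to_zip(zip_file, folder, date_time):
    """Write an explicit directory entry; ZIP marks directories with a trailing '/'."""
    info = zipfile.ZipInfo(f"{folder}/", date_time=date_time)
    # Unix directory mode in the high bits, MS-DOS directory flag in the low bits.
    info.external_attr = (0o40755 << 16) | 0x10
    zip_file.writestr(info, b"")


def add_file_to_zip(zip_file, path, content, date_time):
    """Write a file entry; `content` may be str or bytes. The suffix is irrelevant."""
    if isinstance(content, str):
        content = content.encode("utf-8")
    zip_file.writestr(zipfile.ZipInfo(str(path), date_time=date_time), content)


stamp = time.localtime()[:6]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    add_folder_to_zip(zf, PurePosixPath("static/empty"), stamp)
    add_file_to_zip(zf, PurePosixPath("README"), "no suffix, still a file", stamp)
    add_file_to_zip(zf, PurePosixPath("static/images/diagrams/figure1.png"), b"\x89PNG", stamp)

with zipfile.ZipFile(buffer) as zf:
    entries = {info.filename: info.is_dir() for info in zf.infolist()}
    figure_bytes = zf.read("static/images/diagrams/figure1.png")
```

Because the caller states explicitly whether an entry is a folder or a file, nested paths such as `static/images/diagrams/figure1.png` need no special handling: the archive stores the full path as the entry name.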

Contributor Author

Done. Thanks

Comment on lines 1479 to 1484
    A list of entity keys for all entities in the container version, ordered by entity key.
    """
    return list(
        container_version.entity_list.entitylistrow_set
        .values_list("entity__key", flat=True)
        .order_by("entity__key")
Contributor

Container children are ordered. This should be ordered by order_num.

Contributor Author

Applied

container_version.entity_list.entitylistrow_set
.values_list("entity__key", flat=True)
.order_by("entity__key")
.distinct()
Contributor

Having the same child multiple times is allowed (it's a little weird, but it's valid), so please remove the distinct().
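
To show why both review points matter (order by `order_num`, keep duplicates), here is a plain-Python stand-in for the ORM query. `EntityListRow` below is a hypothetical dataclass rather than the real Django model, and the keys are invented; only the `order_num` and entity-key concepts come from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityListRow:
    """Simplified stand-in for one row of a container's entity list."""
    order_num: int
    entity_key: str


def child_keys_in_order(rows):
    """Return child keys in container order, preserving repeated children."""
    return [row.entity_key for row in sorted(rows, key=lambda row: row.order_num)]


rows = [
    EntityListRow(order_num=2, entity_key="video-b"),
    EntityListRow(order_num=0, entity_key="video-b"),
    EntityListRow(order_num=1, entity_key="html-a"),
]

children = child_keys_in_order(rows)

# What the earlier approach (sort by key, then de-duplicate) would have produced:
sorted_distinct = sorted({row.entity_key for row in rows})
```

Sorting by key and de-duplicating loses both the author's intended sequence and the second occurrence of `video-b`, so a restore from that dump could not rebuild the original container.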

Contributor Author

Applied

- Introduce `add_file_to_zip` and `add_folder_to_zip` for clarity
- Remove suffix-based directory detection (avoids misclassifying files like README or Makefile)
- Improve support for empty directories and arbitrary subdirectories
@dwong2708 dwong2708 requested a review from ormsbee September 9, 2025 19:18
dwong2708 (Contributor Author) commented

Thank you again, @ormsbee. The new adjustments are available in the Test.zip v3 file.

ormsbee (Contributor) left a comment

One small request that was my fault for not catching it much earlier. Otherwise, I think this is good to merge. Thank you!

Comment on lines 92 to 93
draft_version: Optional[PublishableEntityVersion],
published_version: Optional[PublishableEntityVersion]) -> str:
Contributor

I'll merge this anyway, but a nit here to prefer the newer notation of PublishableEntityVersion | None for annotations generally.

Contributor Author

Got it, I took the opportunity to change the values.

container_table.add("children", children)

unit_table = tomlkit.table()
unit_table.add("graded", True)
Contributor

Sorry, this is my fault that I missed this the first time -- there's actually no additional metadata on Units yet. In the original ticket, I was using this field as a hypothetical example of where we would put extra fields defined on specific container types, but there are actually no other fields defined on Unit yet. Please get rid of this.

Contributor Author

Applied, thanks

@dwong2708 dwong2708 requested a review from ormsbee September 10, 2025 16:43
@ormsbee ormsbee merged commit 706c8bc into openedx:main Sep 10, 2025
11 checks passed
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in Contributions Sep 10, 2025
