Contributor

@dwong2708 dwong2708 commented Aug 21, 2025

Resolves: #358

Context

When mapping identifiers directly to the file system, we run into several blockers:

  • Identifiers are case-sensitive, while the default filesystems on Windows and macOS are case-insensitive, so identifiers that differ only by case can map to the same file.
  • Raw identifiers may contain characters such as "/" and ":" that are invalid or ambiguous in filenames.

Proposed Solution

Adopt a hybrid filename strategy that combines:

  • Slugify → ensures readability and safe characters.
  • Short hash → makes collisions very unlikely, even when two identifiers produce the same slug.

Example Mapping

My:Component → my_component_a12f4c.toml
My/Component → my_component_b91d0e.toml

This approach balances human readability with system safety.
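
A minimal sketch of the hybrid mapping described above, for illustration only. The names `simple_slug` and `hashed_filename`, the choice of SHA-1, and the 6-character suffix are assumptions and not the code merged in this PR, so the hash suffixes it prints will not match the example mapping.

```python
import hashlib
import re
import unicodedata


def simple_slug(identifier: str) -> str:
    """Hypothetical stand-in for a slugify library call."""
    # Fold to ASCII, lowercase, and collapse runs of unsafe characters into "_".
    ascii_text = (
        unicodedata.normalize("NFKD", identifier)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


def hashed_filename(identifier: str, hash_len: int = 6) -> str:
    """Return '<slug>_<short hash>' for a case-sensitive identifier."""
    # Hash the raw identifier so case and punctuation differences survive
    # even though the slug itself discards them.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()[:hash_len]
    return f"{simple_slug(identifier)}_{digest}"


for ident in ("My:Component", "My/Component"):
    print(ident, "->", hashed_filename(ident) + ".toml")
```

Both identifiers share the readable prefix `my_component`; only the hash suffix, computed from the original string, tells them apart.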

Acceptance Criteria

  • Filenames must always be unique, even when identifiers differ only by case.
  • Filenames must be safe for all supported filesystems (no invalid or reserved characters).
  • Filenames must be lowercase for consistency.
  • Filenames must include a short hash suffix to avoid collisions.
  • The slugified portion of the filename should remain human-readable.
  • The mapping function should be deterministic (same input → same output).

@openedx-webhooks openedx-webhooks added the open-source-contribution label (PR author is not from Axim or 2U) Aug 21, 2025

openedx-webhooks commented Aug 21, 2025

Thanks for the pull request, @dwong2708!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Aug 21, 2025
@dwong2708 dwong2708 marked this pull request as ready for review August 22, 2025 16:08
@dwong2708 dwong2708 requested a review from ormsbee August 22, 2025 16:18
Contributor

@ormsbee ormsbee left a comment

Thanks for the quick turnaround on this! I have a few low-level requests. At a higher level, please also include a test showing what happens in a potential identifier name collision, e.g. two identifiers that differ only by case.
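
One way such a collision test could look, as a sketch only. The `hashed_filename` helper below is a simplified, hypothetical stand-in for the function under review (it lowercases instead of slugifying), and SHA-1 with a 6-character suffix is an assumption.

```python
import hashlib
import unittest


def hashed_filename(identifier: str) -> str:
    # Hypothetical stand-in: lowercase the name, then append a short hash
    # computed from the raw, case-sensitive identifier.
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()[:6]
    return f"{identifier.lower()}_{digest}"


class CaseCollisionTest(unittest.TestCase):
    def test_identifiers_differing_only_by_case(self):
        first = hashed_filename("my_example")
        second = hashed_filename("My_Example")
        # The readable part is identical for both identifiers...
        self.assertEqual(first.rsplit("_", 1)[0], second.rsplit("_", 1)[0])
        # ...but the hash suffix keeps the full filenames apart.
        self.assertNotEqual(first, second)
        # The names stay distinct on a case-insensitive filesystem too.
        self.assertNotEqual(first.lower(), second.lower())
```

Comparing the lowercased names is the important assertion: it models what a case-insensitive filesystem would see.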

@dwong2708 dwong2708 requested a review from ormsbee August 22, 2025 19:24
Contributor

@ormsbee ormsbee left a comment

Just a few small requests. Thank you!


def test_slugify_hashed_filename_special_chars(self):
    # Test the slugify_hashed_filename function with special characters
    self.assertEqual(slugify_hashed_filename("my@ex#ample!"), "myexample_3366b5")
Contributor

The "/" and ":" characters are also likely to show up in identifiers at some point (they already do so for components), so please test for those characters in particular.

Contributor Author

Applied. Thanks
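
A rough illustration of the kind of check requested for "/" and ":". The `hashed_filename` helper, the `FORBIDDEN` set, and the sample identifiers are assumptions made for this sketch, not the tests added in the PR.

```python
import hashlib
import re

# Characters rejected by at least one common filesystem (assumed list).
FORBIDDEN = set('/\\:*?"<>|')


def hashed_filename(identifier: str) -> str:
    # Hypothetical stand-in: replace anything that is not a word character
    # or "-" with "_", then append a short hash of the raw identifier.
    slug = re.sub(r"[^\w-]+", "_", identifier.lower()).strip("_")
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()[:6]
    return f"{slug}_{digest}"


identifiers = ("My:Component", "My/Component", "My_Component")
names = {ident: hashed_filename(ident) for ident in identifiers}

for name in names.values():
    # No separator from the identifier may leak into the filename.
    assert not FORBIDDEN & set(name)

# All three identifiers slugify to "my_component", so only the hash
# suffix can keep their filenames distinct.
assert len(set(names.values())) == len(identifiers)
```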

Comment on lines 35 to 37
self.assertEqual(slugify_hashed_filename("mY_eXamPle"), "my_example_d28c02")
self.assertEqual(slugify_hashed_filename("My_ExAmPlE"), "my_example_79232e")
self.assertEqual(slugify_hashed_filename("mY_EXAMPLE"), "my_example_b91dc0")
Contributor

I don't think you're actually testing anything different with these last few examples?

Contributor Author

Right, I was just curious about different letter case scenarios. However, I changed it to only one check.

- Append a short hash for uniqueness.
- Result: human-readable but still unique and filesystem-safe filename.
"""
slug = slugify(identifier)
Contributor

I think we'll want to allow identifiers to use Unicode chars, as long as they are legal on the filesystem, i.e. we should call slugify(identifier, unicode=True)

Please add a test for that as well.

Contributor Author

Done
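
To illustrate what Unicode-preserving slugs could look like, here is a standard-library sketch. It does not call the `slugify` function used in the PR, and the exact keyword for enabling Unicode depends on the slugify library (Django's `slugify`, for instance, names it `allow_unicode`). The helper name and sample identifier are assumptions.

```python
import hashlib
import re
import unicodedata


def unicode_hashed_filename(identifier: str) -> str:
    """Hypothetical helper: keep Unicode letters in the slug, add a short hash."""
    # NFKC folds compatibility forms so equivalent input slugs the same way.
    text = unicodedata.normalize("NFKC", identifier).lower()
    # \w is Unicode-aware for str patterns, so accented letters are kept.
    slug = re.sub(r"[^\w-]+", "_", text).strip("_")
    digest = hashlib.sha1(identifier.encode("utf-8")).hexdigest()[:6]
    return f"{slug}_{digest}"


print(unicode_hashed_filename("Introducción a Python"))
```

The accented letter survives in the readable part of the name, while spaces are still replaced and the hash suffix is still appended.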

@dwong2708 dwong2708 requested a review from ormsbee August 25, 2025 17:48
@ormsbee ormsbee merged commit e01237a into openedx:main Aug 25, 2025
11 checks passed
@github-project-automation github-project-automation bot moved this from Needs Triage to Done in Contributions Aug 25, 2025

Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Implement new identifier keys for file names

3 participants