FC-73 Xqu add submission file (Feature: Submission File System - Enhanced File Management for Graded Submissions)#286
Conversation
|
Thanks for the pull request, @leoaulasneo98! This repository is currently maintained by Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review. 🔘 Get product approvalIf you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide contextTo help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green buildIf one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green. DetailsWhere can I find more information?If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources: When can I expect my changes to be merged?Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #286 +/- ##
==========================================
+ Coverage 94.43% 94.79% +0.36%
==========================================
Files 18 18
Lines 2263 2423 +160
Branches 95 99 +4
==========================================
+ Hits 2137 2297 +160
Misses 115 115
Partials 11 11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Review should only start after PR #283 is merged. |
9d2c25b to
f7d55d7
Compare
8b556bf to
5f37f2e
Compare
5f37f2e to
b9ce70a
Compare
There was a problem hiding this comment.
Update author of cange
Add link to first ADR
Summarize and simplify.
submissions/models.py
Outdated
| """ | ||
| files_urls = {} | ||
| for filename, file_obj in files_dict.items(): | ||
| if not (isinstance(file_obj, (bytes, ContentFile, SimpleUploadedFile)) or hasattr(file_obj, 'read')): |
There was a problem hiding this comment.
Why check for hasattr(file_obj, 'read')?
submissions/models.py
Outdated
| logger.warning(f"Invalid file object type for {filename}") | ||
| continue | ||
|
|
||
| if hasattr(file_obj, 'read'): |
b9ce70a to
ee29b2e
Compare
angonz
left a comment
There was a problem hiding this comment.
Ok on my side. Please open for Axim's review.
ormsbee
left a comment
There was a problem hiding this comment.
Some comments and questions. I just want to make sure I'm understanding the intentions of this code properly. Thank you.
submissions/models.py
Outdated
| """ | ||
| Model to handle files associated with submissions | ||
| """ | ||
| uid = models.UUIDField(default=uuid4, editable=False) # legacy S3 key |
There was a problem hiding this comment.
| uid = models.UUIDField(default=uuid4, editable=False) # legacy S3 key | |
| uuid = models.UUIDField(default=uuid4, editable=False) # legacy S3 key |
A number of other platform models use "uuid" as the conventional name for this field.
submissions/models.py
Outdated
| return f"/{self.external_grader.queue_name}/{self.uid}" | ||
|
|
||
|
|
||
| class SubmissionFileManager: |
There was a problem hiding this comment.
It's confusing to have something called a "xxxxManager" class in a models.py file, when it isn't a Django ORM Manager. Can you make these into api.py functions instead?
There was a problem hiding this comment.
Hi Dave,
You're right about the naming confusion. Having a "xxxManager" class in models.py that isn't actually a Django ORM Manager is misleading and could cause confusion.
I've refactored this to follow a clearer naming convention:
Renamed SubmissionFileManager to SubmissionFileProcessor and move to api.py, which better describes its actual purpose - processing file operations for submissions rather than managing model queries.
Updated docstrings to reflect this change.
Kept the functionality encapsulated in a class since it maintains state (the external_grader reference) between operations.
This change preserves the existing functionality while making the code more intuitive.
submissions/models.py
Outdated
| This method handles various file object types that might be received from Open edX, including: | ||
| - Native Open edX FileObjForWebobFiles objects | ||
| - Bytes objects | ||
| - ContentFile objects | ||
| - SimpleUploadedFile objects | ||
| - Any object with a 'read' method |
There was a problem hiding this comment.
Why does this function need to support all these?
There was a problem hiding this comment.
Dave, I understand your point. I initially designed this thinking of a scalable solution for various workflows, but I've reconsidered based on the actual needs of the project.
I've simplified the implementation by using standard TypeError instead of custom exceptions and adjusting the validation to focus on the essentials: bytes or objects with a 'read()' method.
Thanks for your feedback that leads us to more straightforward code.
submissions/models.py
Outdated
| # Validate file object type | ||
| if not (isinstance(file_obj, (bytes, ContentFile, SimpleUploadedFile)) or hasattr(file_obj, 'read')): | ||
| logger.error(f"Invalid file object type for {filename}") | ||
| raise InvalidFileTypeError(f"Invalid file object type for {filename}") |
There was a problem hiding this comment.
Error should say what the type actually was. I also don't think it's really necessary to create a new error for this, since this is exactly what TypeError is for.
There was a problem hiding this comment.
You're right on both counts. I've updated the code, thanks for the feedback.
submissions/models.py
Outdated
| @property | ||
| def xqueue_url(self): | ||
| """ | ||
| Returns URL in xqueue format: /queue_name/uid |
There was a problem hiding this comment.
What does "xqueue format" mean?
There was a problem hiding this comment.
This is the format in which the xqueue-watcher receives the files. That is, the format in which the legacy XqueueServer delivers the files. I've corrected this to make it clearer.
submissions/models.py
Outdated
|
|
||
| class Meta: | ||
| indexes = [ | ||
| models.Index(fields=['external_grader', 'uid']), |
There was a problem hiding this comment.
Do we ever look up by just uuid?
There was a problem hiding this comment.
we don't currently look up by just uuid. The lookups are always through the external_grader relationship or using both fields. If we need direct uuid lookups in the future, we'll add a separate index for that field.
ebe7b70 to
4de24a8
Compare
ormsbee
left a comment
There was a problem hiding this comment.
High level security question: Am I understanding correctly that this is storing things in the default Django storages backend, and if so, does that mean it will be publicly readable (though obscured because of the randomly generated UUIDs)?
Hi @ormsbee , You can access a resource if you have its complete URL. If you wanted to implement a signature mechanism to restrict resource access, you would need to configure both the edX platform and the XqueueWatcher through the grader.
|
ormsbee
left a comment
There was a problem hiding this comment.
Please also include a version bump as part of this PR. Thank you.
submissions/api.py
Outdated
| except (IOError, OSError) as e: | ||
| logger.error(f"Error reading file {filename}: {e}") | ||
| raise FileProcessingError(f"Error reading file {filename}: {e}") from e | ||
| except UnicodeDecodeError as e: |
There was a problem hiding this comment.
The code above is saying that read() returns a bytes object, but the fact that it's capable of throwing a UnicodeDecodeError would imply that it can sometimes return a str.
There was a problem hiding this comment.
I get that you're trying to be permissive here to make this friendlier for use in the platform, but please just require that the files here be instances of Django's File object (including subclasses), and raise a TypeError if you find ones that are not. Let the caller worry about making sure the data exists there, and let it raise the exception when saving to SubmisssionFile if anything goes wrong. If someone has a bytes instance, it's easy enough for them to convert that to a ContentFile before calling this function. This "is it bytes or str or file" stuff is cluttering up the logic here without really giving much benefit, and bugs are more likely to slip in because of it.
submissions/api.py
Outdated
|
|
||
| # Convert bytes to ContentFile | ||
| # The read() method from file-like objects returns bytes, which we handle here | ||
| if isinstance(file_obj, bytes): |
There was a problem hiding this comment.
If file_obj is a str here, then we can pass through this without ever making it into a ContentFile.
1e49156 to
affcaae
Compare
b36719a to
d154a88
Compare
0836a82 to
40f2a30
Compare
|
@leoaulasneo98 could you please rebase and resolve conflicts? |
40f2a30 to
3ad53d6
Compare
Hi Sarina, good afternoon. Ready. |
ormsbee
left a comment
There was a problem hiding this comment.
Thank you for addressing my concerns about bytes/file input handling.
Two blockers on this at the moment:
- Storage configuration, which I've tried to give more explicit instructions about.
- It looks like all the requirements files have been changed to have absolute paths from your local environment.
Would you like to schedule some time on Friday to go through any of this together?
Thank you.
submissions/models.py
Outdated
| def __new__(cls, *args, **kwargs): | ||
| if 'pytest' in sys.modules or any('test' in arg for arg in sys.argv): | ||
| return FileSystemStorage(*args, **kwargs) | ||
| return super().__new__(cls) # pragma: no cover | ||
|
|
||
| bucket_name = getattr(settings, "FILE_UPLOAD_STORAGE_BUCKET_NAME", "openedx") |
There was a problem hiding this comment.
However, @angonz and I have a question arising from your inquiry: the security for delivering the image depends on the bucket being used. Currently, they are stored in the default OpenEdX bucket, which means it adapts to its existing policies. But we want to know if it's appropriate to use this default bucket or if it would be more convenient to use another pre-existing bucket like the ORA-2 bucket (openedx-learning). Could you share your opinion on this point?
Sorry for not replying to this in my last review pass. Let's do this: have it look for a specific named storages defined like how openedx-learning does it, but named EDX_SUBMISSIONS['MEDIA']. If that configuration value is not found (either EDX_SUBMISSIONS doesn't exist at all or EDX_SUBMISSIONS['MEDIA'] doesn't), have it fall back to using the default Django storages. Please remember that you can set the storage parameter of a FileField to be any callable that returns a Storage class, so you can switch between storage configurations that way.
So please do the following:
- Make a new function
get_storage()and use thefunctools.cachedecorator on it to cache it for future use. - Have
get_storage()search for EDX_SUBMISSIONS configuration for the storage, but have it also gracefully fall back to the default storage backend if none is configured. - Make sure that
get_storage()loads the storage configuration dynamically like this, and don't hardcode it to any specific storage type at this layer. - When creating the SubmissionFile.file FileField, set
storage=get_storage.
As a general rule, try not to do smart switching behavior at this layer. Aside from making it hard to configure and override at the settings layer for testing, it's also just really confusing to have the __new__ on a class return an instance of an entirely different class.
There was a problem hiding this comment.
I've already applied the changes. It's made things much easier. Thanks for the help.
requirements/test.txt
Outdated
| # -r /Users/leonardoberoes/Documents/Aulasneo/Projects/edx-submissions/requirements/base.txt | ||
| # -r /Users/leonardoberoes/Documents/Aulasneo/Projects/edx-submissions/requirements/docs.txt |
There was a problem hiding this comment.
These requirements shouldn't have your environment specific paths in them.
There was a problem hiding this comment.
Sorry, Dave. This was generated by compile requirements. It's also no longer necessary to add dependencies with the new approach.
requirements/common_constraints.txt
Outdated
| # This is a temporary solution to override the real common_constraints.txt | ||
| # In edx-lint, until the pyjwt constraint in edx-lint has been removed. | ||
| # See BOM-2721 for more details. | ||
| # Below is the copied and edited version of common_constraints | ||
|
|
||
| # This is a temporary solution to override the real common_constraints.txt | ||
| # In edx-lint, until the pyjwt constraint in edx-lint has been removed. | ||
| # See BOM-2721 for more details. | ||
| # Below is the copied and edited version of common_constraints | ||
|
|
There was a problem hiding this comment.
It looks like these should just be deleted since they're copies of the first paragraph?
|
Hi Dave, I've sent an invitation for Monday as Friday Leonardo will be out of the office. |
3ad53d6 to
db67756
Compare
- Create Submission file model - Add custom storage to use ORA bucket - Develop SubmissionFileProcessor for create files in DB - Add TestSubmissionFileProcessor with extensive test coverage - Verify complete file processing workflow - Validation tests for file addition in create_external_grader_detail - Document architectural decisions with ADR - Add test queue folder in .gitignore
db67756 to
355da03
Compare
There was a problem hiding this comment.
In addition to the code comments below, please also do the following:
- Update the version to 3.11.0
- When you push your changes, please don't squash your commits. We can squash at the end once everything is approved, but it makes it easier/faster to review I can just "view changes since last review", and that functionality breaks if you squash.
Thank you.
submissions/errors.py
Outdated
| class FileProcessingError(SubmissionError): | ||
| """ | ||
| Exception raised when there's an error reading or processing a file. | ||
|
|
||
| This exception is raised when file operations fail, such as: | ||
| - I/O errors when reading file content | ||
| - OS errors during file operations | ||
| - Unicode decoding errors when processing file content | ||
| """ |
There was a problem hiding this comment.
Can we remove this now that it's no longer being used?
There was a problem hiding this comment.
Ok Dave, understood
|
|
||
| # Then get_storage() will return the S3Boto3Storage instance | ||
| """ | ||
| return _get_storage_cached() # For performance while keeping this function serializable for migrations |
There was a problem hiding this comment.
Very helpful/informative comment, thank you.
submissions/models.py
Outdated
| storage_config = edx_submissions_config.get('MEDIA') | ||
|
|
||
| if storage_config: | ||
| return storage_config |
There was a problem hiding this comment.
Wait, it really works to just pass back the dict directly? I thought we'd have to instantiate the storage class here, like this example. So something like:
# Somewhere much higher:
from django.utils.module_loading import import_string
if storage_config:
storage_cls = import_string(storage_config['BACKEND'])
options = storage_config.get('OPTIONS', {})
return storage_cls(**options)There was a problem hiding this comment.
Hi Dave, I had the same question while reading your previous instructions and preferred to keep things simple. But I completely understand what you're saying, and with that piece of code you've made my job much easier. I had previously tried instantiating the storage class, but the result wasn't as clean as I would have liked, so I went with the solution you reviewed a few days ago.
submissions/models.py
Outdated
| # In settings | ||
| from storages.backends.s3boto3 import S3Boto3Storage | ||
| EDX_SUBMISSIONS = { | ||
| 'MEDIA': S3Boto3Storage(bucket_name='my-bucket') |
There was a problem hiding this comment.
Oh, I see where my misunderstanding of the code in _get_storage_cached() comes from. We shouldn't actually instantiate or even import the storage class in the settings files, only specify the EDX_SUBMISSIONS['MEDIA'] configuration dict in the standard convention that Django uses for media config, which specifies the path of the class we'll need to load later as a string.
The reason has to do with when these things get loaded in the startups cycle. Settings are often overridden in multiple places (e.g. common.py, prod.py, tutor.py, etc). But there's a certain amount of caching/bootstrapping that happens when these storage backends get imported and initialized. For example, there was this issue:
The value for AWS_S3_ENDPOINT_URL set in openedx-development-settings specifies port 9000, and it was being written to Studio and LMS settings correctly. But this is the sequence of what was really happening:
- openedx-common-settings first sets AWS_S3_ENDPOINT_URL WITHOUT port
info, since this is how it's used in prod deployments.- openedx-common-settings imports S3Boto3Storage because it wants to
make a subclass to use for USER_TASKS_ARTIFACT_STORAGE.- The act of importing that package has the side-effect of initializing
the Django storages (django.core.files.storage.default_storage).- Because the DEFAULT_FILE_STORAGE class is S3Boto3Storage, it then
reads the value of AWS_S3_ENDPOINT_URL (which has no port info), and
sets this as the S3 conneciton information for the S3Boto3Storage
instance at that time.- openedx-development-settings then comes along and changes the value
of AWS_S3_ENDPOINT_URL to specify port 9000, but by this point it's
too late to affect django-storages (though it will confuse folks who
try to debug).
Leaving it in string form avoids this problem. The settings can get overridden as much as people need, and the storage class won't be instantiated until after it's all been set up properly. Please assume the configuration will be something like:
EDX_SUBMISSIONS = {
'MEDIA': {
'BACKEND': 'django.core.files.storage.InMemoryStorage',
'OPTIONS': {
}
}
}Also, please try specifying such a backend for the test settings, to make sure it works properly.
There was a problem hiding this comment.
Understood Dave, I'll try to have the changes made before the meeting.
The addition of submission file functionality and storage enhancement warrants a minor version bump following semver conventions.
362f437 to
d99a8f2
Compare
- Replace SubmissionFileProcessor with direct file handling in create_external_grader_detail - Implement flexible storage configuration via EDX_SUBMISSIONS['MEDIA'] dictionary - Support dynamic backend loading with proper Django settings conventions - Add cached storage implementation with serialization-safe public interface - Configure InMemoryStorage for tests to ensure reliable test execution - Update tests to accommodate new storage patterns and configuration - Ensure backward compatibility with existing file operations This implementation follows Django best practices for storage configuration while maintaining performance through caching and avoiding serialization issues with migrations.
d99a8f2 to
275b6e7
Compare
| # SubmissionError imported so that code importing this api has access | ||
| from submissions.errors import ( # pylint: disable=unused-import | ||
| ExternalGraderQueueEmptyError, | ||
| SubmissionError, |
There was a problem hiding this comment.
This removal appears to be causing a failure in edx-platform when upgrading to edx-submissions 3.11:
There was a problem hiding this comment.
@timmc-edx: Thank you! @leoaulasneo98: Can you please fix this and do a patch version bump?
There was a problem hiding this comment.
Hi Dave, I'll take care of it. Sorry for the mistake. I'll add the import again and do the corresponding PR.
There was a problem hiding this comment.
I went ahead and created a PR for it: #303
(I'd like to get this merged soon so that edx-platform requirements updates are unblocked -- otherwise, it would be best to temporarily pin the version to <3.11.0 in edx-platform.)

FC-73 Feature: Submission File System - Enhanced File Management for Graded Submissions
Description
This pull request implements a comprehensive file management system for submissions through the introduction of the
SubmissionFilemodel and its associatedSubmissionFileManager. This enhancement supports proper storage, retrieval, and processing of files attached to submissions within the Open edX ecosystem.Motivation
The existing submission system requires enhanced file handling capabilities to:
Previously, the XQueue server managed files for submissions that required document attachments. However, this was implemented inefficiently using plain text fields (s3_keys and s3_urls) without proper structure or validation. This approach led to inconsistent file handling, potential security issues, and maintenance challenges as the system scaled.
Key Improvements
Model Enhancements
SubmissionFilemodel with:File Processing System
SubmissionFileManagerwith capabilities for:Error Handling
Technical Details
File Processing
Integration Points
ExternalGraderDetailfor complete submission workflowTesting Strategy
Extensive test coverage includes:
Open edX Compliance
This implementation adheres to Open edX standards through:
Performance Considerations
BREAKING CHANGES: None. Designed for full compatibility with existing systems.