FC-73 Add XqueueViewset with xwatcher services (Feature: XQueueViewSet - Compatible API Endpoints for External Grader Integration)#287

Closed
leoaulasneo98 wants to merge 6 commits into openedx:master from aulasneo:XQU-26-merge-files-submission-with-get-submission

Conversation

@leoaulasneo98
Contributor

@leoaulasneo98 leoaulasneo98 commented Feb 24, 2025

FC-73 Feature: XQueueViewSet - Compatible API Endpoints for External Grader Integration

⚠️ Important: This PR builds on the ExternalGraderDetail and SubmissionFile infrastructure deployed in previous PRs (FC-73). Please ensure those changes are fully deployed before merging.

Description

This pull request implements the XQueueViewSet, providing compatible API endpoints that allow external graders (xqueue-watcher) to interact with the new submission and grading architecture. This viewset consolidates authentication, submission retrieval, and result processing services while leveraging the new ExternalGraderDetail and SubmissionFile models.

Motivation

The current XQueue implementation requires a strategic update to:

  • Provide a secure and efficient API for external grader integration
  • Simplify the complex multi-system communication patterns
  • Enhance session management and security
  • Integrate cleanly with the new submission architecture

Key Improvements

Authentication Services

  • Custom XQueueSessionAuthentication class with:
    • CSRF exemptions for result endpoints
    • Secure login/logout endpoints

Submission Distribution

  • Queue-based submission retrieval through get_submission endpoint
  • Integrated file handling through SubmissionFileManager
  • Status tracking with explicit state transitions
  • UUID-based submission keys for enhanced security

Result Processing

  • Transactional score updates via the put_result endpoint
  • Comprehensive error handling with detailed logging
  • Atomic status updates for reliable state management
  • Automatic retry mechanism for failed submissions

Technical Details

Score Processing

  • Integration with the submissions API via set_score
  • Robust failure tracking and processing
  • Status transition enforcement

Error Handling

  • Comprehensive validation of grader responses
  • Detailed logging for troubleshooting
  • Clear error responses for client integration

Testing Strategy

Comprehensive test coverage includes:

  • Authentication workflow verification
  • Submission retrieval process
  • Result processing and error handling
  • Session management edge cases
  • Integration with existing submission models

Open edX Compliance

This implementation adheres to Open edX standards through:

  • RESTful API design following DRF best practices
  • Comprehensive security measures
  • Transaction safety for critical operations
  • Extensive logging for operational visibility
  • Compatibility with existing systems

Performance Considerations

  • Optimized database queries with select_related
  • Efficient transaction handling
  • Minimal processing overhead

BREAKING CHANGES: None. Designed for full backward compatibility with existing xqueue-watcher services.

Documentation

Updated documentation will include:

  • XQueue API specification for external graders
  • Session management guidelines
  • Error handling and response format specifications
  • Integration guide for external grader services

Implementation References

  • Related ADR: Implementation of XQueue Compatible Views for External Grader Integration
  • Previous PRs: FC-73 SubmissionQueueRecord and SubmissionFile implementations

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Feb 24, 2025
openedx-webhooks commented Feb 24, 2025

Thanks for the pull request, @leoaulasneo98!

This repository is currently maintained by @openedx/committers-edx-submissions.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

codecov bot commented Feb 24, 2025

Codecov Report

❌ Patch coverage is 99.27954% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.89%. Comparing base (9cd2fc2) to head (0e70be6).
⚠️ Report is 18 commits behind head on master.

Files with missing lines | Patch % | Lines
submissions/tests/test_viewsets.py | 98.54% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
+ Coverage   94.79%   95.89%   +1.09%     
==========================================
  Files          18       23       +5     
  Lines        2423     3022     +599     
  Branches       99      121      +22     
==========================================
+ Hits         2297     2898     +601     
+ Misses        115      113       -2     
  Partials       11       11              
Flag Coverage Δ
unittests 95.89% <99.27%> (+1.09%) ⬆️


@leoaulasneo98

This comment was marked as resolved.

@mphilbrick211 mphilbrick211 added the FC Relates to an Axim Funded Contribution project label Feb 24, 2025
angonz commented Mar 6, 2025

Review can start only after #283 and #286 are merged

Contributor Author

@leoaulasneo98 leoaulasneo98 left a comment

Hello @ormsbee, we have a question. We already have the put_result service in edx-submissions (which also exists in the XQueue server). We still need to implement the callback, but there is a drawback: in edx-submissions we cannot import xqueue_callback or any Open edX package, because doing so causes import errors when running the tests and creates a circular import, since edx-submissions is a dependency of edx-platform.

For this reason, we decided to use an Open edX event signal so that the callback runs once the submission's score has been stored. However, we have some questions:

1. Is this the best solution at the architectural level?

2. Are there possible errors or incompatibilities that we are overlooking?

3. Should we make a new callback or reuse the XQueue callback?

4. Is there a more efficient solution?

@leoaulasneo98 leoaulasneo98 force-pushed the XQU-26-merge-files-submission-with-get-submission branch 3 times, most recently from 1f3b38b to 77398d7 Compare March 17, 2025 18:15
@leoaulasneo98 leoaulasneo98 force-pushed the XQU-26-merge-files-submission-with-get-submission branch 2 times, most recently from 07f33a8 to 4decb2f Compare March 25, 2025 17:28
Contributor

ormsbee commented Mar 26, 2025

> Hello @ormsbee, we have a question. We already have the put_result service in edx-submissions (which also exists in the XQueue server). We still need to implement the callback, but there is a drawback: in edx-submissions we cannot import xqueue_callback or any Open edX package, because doing so causes import errors when running the tests and creates a circular import, since edx-submissions is a dependency of edx-platform.
>
> For this reason, we decided to use an Open edX event signal so that the callback runs once the submission's score has been stored. However, we have some questions:
>
> 1. Is this the best solution at the architectural level?

I think it's totally reasonable, and one of the best uses of signals.

> 2. Are there possible errors or incompatibilities that we are overlooking?

Nothing comes to mind. We have to be careful about any change to the data being sent in the signal, but I can't think of any other issues.

> 3. Should we make a new callback or reuse the XQueue callback?

I think the first thing I would try is to make a new function that listens to the signal and translates the data into whatever the existing xqueue callback function expects. But please feel free to refactor differently if you find it's easier to do otherwise.

> 4. Is there a more efficient solution?

I wouldn't worry too much about efficiency with this. As long as the callback is being invoked in-process, I don't think it'll make any difference to use a signal vs. other mechanisms.
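
The adapter suggested in answer 3 might look roughly like the sketch below. Every name in it is an assumption made for illustration: `Signal` is a tiny stand-in for a Django or Open edX event signal, and `score_stored`, `store_score`, `legacy_xqueue_callback`, `on_score_stored`, and the payload fields are hypothetical, not the actual edx-submissions or edx-platform API. The point is only the direction of the dependency: the submissions side emits the signal without importing anything from the platform, and the platform side registers a listener that translates the data.

```python
from typing import Any, Callable, Dict, List


class Signal:
    """Tiny in-process stand-in for a Django-style signal (illustration only)."""

    def __init__(self) -> None:
        self._receivers: List[Callable[..., Any]] = []

    def connect(self, receiver: Callable[..., Any]) -> None:
        self._receivers.append(receiver)

    def send(self, **payload: Any) -> List[Any]:
        # Receivers run synchronously, in the sender's process.
        return [receiver(**payload) for receiver in self._receivers]


# --- Would live in edx-submissions: knows nothing about edx-platform. ---
score_stored = Signal()  # hypothetical signal name


def store_score(submission_id: int, points_earned: int,
                grader_reply: Dict[str, Any]) -> None:
    # ... persist the score here, then announce it ...
    score_stored.send(
        submission_id=submission_id,
        points_earned=points_earned,
        grader_reply=grader_reply,
    )


# --- Would live in edx-platform: it imports the signal, not the reverse. ---
callback_calls: List[Dict[str, Any]] = []


def legacy_xqueue_callback(xqueue_header: Dict[str, Any],
                           xqueue_body: Dict[str, Any]) -> None:
    """Hypothetical stand-in for whatever the existing callback expects."""
    callback_calls.append({"header": xqueue_header, "body": xqueue_body})


def on_score_stored(submission_id: int, points_earned: int,
                    grader_reply: Dict[str, Any]) -> None:
    # Translate the signal payload into the shape the legacy callback wants.
    legacy_xqueue_callback(
        xqueue_header={"submission_id": submission_id},
        xqueue_body={"score": points_earned, **grader_reply},
    )


score_stored.connect(on_score_stored)
```

Because the receiver is called synchronously inside `send`, this matches the "invoked in-process" condition in answer 4.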

@leoaulasneo98 leoaulasneo98 force-pushed the XQU-26-merge-files-submission-with-get-submission branch from 4decb2f to 8cb02ae Compare March 26, 2025 14:10
@leoaulasneo98 leoaulasneo98 force-pushed the XQU-26-merge-files-submission-with-get-submission branch 8 times, most recently from 3ceeda0 to ad18da2 Compare May 22, 2025 19:46
leoaulasneo98 and others added 4 commits May 27, 2025 14:10
  - Add XqueueViewSet with complete xqueue-watcher service compatibility
  - Implement get_submission service for retrieving pending submissions
  - Add put_result service with row-level locking (select_for_update(nowait=True)) to prevent race conditions
  - Ensure concurrent xqueue-watcher instances process each submission exactly once, even under high load
  - Implement standardized response format for backward compatibility
  - Add session management and authentication handling for XWatcher clients
  - Add comprehensive test coverage for core interactions
…nd add timeout mechanism (Get Submission)

- Implemented locking to ensure submissions are processed by a single xqueue watcher.
- Added timeout mechanism for submissions stuck in 'pulled' state.
- Updated tests to cover new error scenarios and timeout handling.
- Renamed variables from 'submission_record' to 'external_grader' throughout the
  codebase for better consistency with model naming
- Added new 'retry' status to integrate with submission processing retry services
- Removed unused 'is_processable' method that wasn't providing any value
- Enhanced test coverage
- Add new status external grader detail migration
* Remove RETRY status and complex transition validation from ExternalGraderDetail
* Streamline update_status() method to accept score_msg parameter
* Simplify put_result() error handling to use specific exceptions
* Remove retry logic in favor of direct failed status transitions
* Consolidate grader_reply updates within status transitions

Breaking changes:
- Removed ExternalGraderDetail.Status.RETRY enum value
- Removed VALID_TRANSITIONS and can_transition_to() validation
- Changed update_status() method signature to include score_msg parameter
- Simplified error handling removes automatic retry mechanism
@leoaulasneo98 leoaulasneo98 force-pushed the XQU-26-merge-files-submission-with-get-submission branch from ad18da2 to ef493c7 Compare May 27, 2025 18:10
- Add comprehensive documentation for admin interface features
- Document status management, search capabilities, and security model
- Remove retry status from queue filtering logic
- Fix pylint unused import warning in api module
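
The locking and timeout behaviour described in these commit messages can be sketched without a database. This is a simplified illustration, not the PR's code: the commits mention `select_for_update(nowait=True)` on database rows, whereas here a `threading.Lock` stands in for the row lock (it blocks rather than failing fast), and the `claim_next` function, the dict-based records, and the 60-second timeout are all assumptions.

```python
import threading
import uuid
from typing import Dict, List, Optional

PULLED_TIMEOUT_SECONDS = 60.0  # assumed value, for illustration only

_lock = threading.Lock()  # stands in for the database row lock


def claim_next(records: List[Dict], now: float) -> Optional[Dict]:
    """Hand out at most one claimable record per call.

    A record is claimable if it is 'pending', or if it has been 'pulled'
    for longer than the timeout (its watcher presumably stopped).
    """
    with _lock:
        for record in records:
            stuck = (
                record["status"] == "pulled"
                and now - record["pulled_at"] > PULLED_TIMEOUT_SECONDS
            )
            if record["status"] == "pending" or stuck:
                record["status"] = "pulled"
                record["pulled_at"] = now
                # A fresh key per pull, so a late reply from an earlier
                # puller no longer matches.
                record["pullkey"] = str(uuid.uuid4())
                return record
    return None
```

Issuing a new key on every pull is one way to make a timed-out watcher's late result fail the pullkey check instead of overwriting the newer attempt.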
@leoaulasneo98 leoaulasneo98 force-pushed the XQU-26-merge-files-submission-with-get-submission branch from 7ea84ce to a9f06a7 Compare May 27, 2025 18:47
@leoaulasneo98 leoaulasneo98 marked this pull request as ready for review May 27, 2025 19:51
Contributor

ormsbee commented May 29, 2025

@leoaulasneo98: I've started doing a more detailed review, but a couple of very high-level things:

  1. It looks like this code will allow any LMS-authenticated user (e.g. any student from any course) to grab submissions and put results to the XQueue endpoint. Is that correct? If so, it would be a major security issue that we need to address.
  2. Do you have time to meet either tomorrow (Thu) or Friday to go over this code together, line by line?

Contributor

@ormsbee ormsbee left a comment

This is not a complete review, just one that primarily covers the ViewSet. Please feel free to read through this before our meeting today, but let's make the code changes together today in our meeting. Thank you.

log = logging.getLogger(__name__)


class XqueueViewSet(viewsets.ViewSet):
Contributor

Suggested change
- class XqueueViewSet(viewsets.ViewSet):
+ class XQueueViewSet(viewsets.ViewSet):

Just for naming consistency.

"""
Endpoint for authenticating users and creating sessions.
"""
log.info(f"Login attempt with data: {request.data}")
Contributor

Please don't write passwords into the log. You can remove this logging entry entirely and replace it with logging for the different errors and success paths.
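
One way to follow this suggestion is to log only the username and the outcome of each path. A minimal sketch, assuming a hypothetical helper `log_login_outcome`; the outcome labels and message format are illustrative, not the PR's actual log messages:

```python
import logging
from typing import Mapping

log = logging.getLogger(__name__)


def log_login_outcome(data: Mapping[str, str], outcome: str) -> str:
    """Log who tried to log in and what happened, never the password."""
    # Only the username is read; data["password"] is deliberately ignored.
    username = data.get("username", "<missing>")
    message = "XQueue login %s: username=%r" % (outcome, username)
    level = logging.INFO if outcome == "succeeded" else logging.WARNING
    log.log(level, message)
    return message
```

Calling it once per branch (missing fields, bad credentials, success) gives operators the same visibility as the removed line without writing credentials to disk.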

Comment on lines 67 to 69
            return Response(
                {'return_code': 1, 'content': 'Insufficient login info'},
                status=status.HTTP_400_BAD_REQUEST
Contributor

Please add a comment here to note that this is a change from the existing behavior that basically always returns a 200, regardless of whether there's an error or not (and relies exclusively on the JSON response to convey that). You don't have to change the behavior of this code–I think you're doing the right thing here. But any change in behavior should be clearly called out in case it results in regressions for some reason.

Contributor

Actually, given how many places the return codes are changing, it's probably better to add a comment in the docstring for the viewset addressing this.

Comment on lines 76 to 90
        )

        if user is not None:
            login(request, user)
            response = Response(
                {'return_code': 0, 'content': 'Logged in'},
                status=status.HTTP_200_OK
            )

            return response

        return Response(
            {'return_code': 1, 'content': 'Incorrect login credentials'},
            status=status.HTTP_401_UNAUTHORIZED
        )
Contributor

You're doing early returns on error conditions when you do this:

        if 'username' not in request.data or 'password' not in request.data:
            return Response(
                {'return_code': 1, 'content': 'Insufficient login info'},
                status=status.HTTP_400_BAD_REQUEST
            )

This is good. It prevents the code from getting deeply nested, and keeps people from having to skip around to see where conditionals end. But if you're doing this, please be consistent and make it so that all the early returns are error states, and the last thing is the successful path.
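
The ordering requested here can be sketched as follows. This is a framework-free illustration rather than the PR's view: the function returns a plain `(status, body)` tuple instead of a DRF `Response`, the `authenticate` callable is injected as a hypothetical stand-in for Django's `authenticate`, and the session-creating `login(request, user)` call is omitted.

```python
from typing import Callable, Mapping, Optional, Tuple

Reply = Tuple[int, dict]  # (HTTP status, JSON body)


def login_flow(data: Mapping[str, str],
               authenticate: Callable[[str, str], Optional[object]]) -> Reply:
    # Error path 1: required fields missing.
    if "username" not in data or "password" not in data:
        return 400, {"return_code": 1, "content": "Insufficient login info"}

    user = authenticate(data["username"], data["password"])

    # Error path 2: credentials rejected.
    if user is None:
        return 401, {"return_code": 1, "content": "Incorrect login credentials"}

    # Success path comes last and is not nested inside a conditional.
    return 200, {"return_code": 0, "content": "Logged in"}
```

Each early return is an error state, so a reader can scan down the left margin and find the successful outcome at the bottom.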


        return Response(
            {'return_code': 1, 'content': 'Incorrect login credentials'},
            status=status.HTTP_401_UNAUTHORIZED
Contributor

Again, please add a comment here that we're changing the behavior that XQueue had by sending the appropriate HTTP code.

        except ExternalGraderDetail.DoesNotExist:
            log.error(
                "Grader submission_id refers to nonexistent entry in Submission DB: "
                "grader: %s, submission_key: %s, score_msg: %s",
Contributor

This log entry says "grader" but the field being written is "submission_id". Also, please keep the old terminology of "grader_reply" rather than "score_msg", since it contains more than just the score, and it's not just a text message.


        if not external_grader.pullkey or submission_key != external_grader.pullkey:
            log.error(f"Invalid pullkey: submission key from xwatcher {submission_key} "
                      f"and submission key stored {external_grader.pullkey} are different")
Contributor

Please log the key they tried to send for this entry, since it might aid debugging, e.g. if they're only different by case, or if they're different by whitespace, etc.
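
To make case or whitespace differences visible in such a log entry, the keys can be formatted with `%r`. A minimal sketch, assuming a hypothetical helper `pullkey_matches`; the message wording is illustrative:

```python
import logging
from typing import Optional

log = logging.getLogger(__name__)


def pullkey_matches(sent_key: Optional[str], stored_key: Optional[str],
                    submission_id: int) -> bool:
    """Return True if the keys match; otherwise log both keys and return False."""
    if stored_key and sent_key == stored_key:
        return True
    # %r keeps quotes and escapes, so 'abc ' vs 'abc' or 'ABC' vs 'abc'
    # can be told apart in the log line.
    log.error(
        "Invalid pullkey for submission_id=%s: received %r, stored %r",
        submission_id, sent_key, stored_key,
    )
    return False
```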

return fail

try:
header_dict = json.loads(header)
Contributor

It looks like the code in this PR is unnecessarily encoding and decoding JSON internally, and I'm not clear on why it's doing so. We should keep things in native data structures as much as possible internally, and only keep things as JSON encoded if we're doing nothing with it but passing it through (i.e. we're not inspecting it), or if backwards compatibility requires that we do it that way.

Contributor

Where we're dealing with JSON encoded messages coming in, we should parse it to native data structures as soon as it's practical to do so.

Contributor Author

You're right that this is counter-intuitive, but we're constrained by legacy compatibility.
The legacy XQueue server returns exactly this nested JSON string format:

{
  "xqueue_body": "{\"grader_payload\": \"{...}\", \"student_info\": \"{...}\", \"student_response\": \"...\"}"
}

xqueue-watcher clients expect and parse this format:

json.loads(response['content']) → gets payload
json.loads(payload['xqueue_body']) → gets submission data
json.loads(submission_data['grader_payload']) → gets grader config

We have to replicate this exact behavior - the double/triple encoding is required for API compatibility with existing xqueue-watcher deployments.
The encoding/decoding dance exists because that's how the legacy system works and what clients expect.

This is the XQueue server response: [screenshot]
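
The nesting described in this comment can be reproduced with the standard `json` module. This is a sketch under stated assumptions: the field values (`example.py`, `abc123`, the student response) and the `return_code` field on the outer response are illustrative, and only the three `json.loads` steps come from the comment itself.

```python
import json

# Innermost layer: the grader configuration, encoded to a string.
grader_payload = json.dumps({"grader": "example.py"})  # illustrative value

# Middle layer: the submission data; grader_payload is embedded as a string.
xqueue_body = json.dumps({
    "grader_payload": grader_payload,
    "student_info": json.dumps({"anonymous_student_id": "abc123"}),
    "student_response": "print('hello')",
})

# Outer layer: the payload travels under 'content', itself JSON-encoded.
response = {
    "return_code": 0,
    "content": json.dumps({"xqueue_body": xqueue_body}),
}

# The decoding steps the comment attributes to xqueue-watcher:
payload = json.loads(response["content"])
submission_data = json.loads(payload["xqueue_body"])
grader_config = json.loads(submission_data["grader_payload"])
```

After the first `json.loads`, `payload["xqueue_body"]` is still a string, which is why a client written against the legacy format would break if the server returned a native object there.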

if tag not in header_dict:
return fail

submission_id = int(header_dict['submission_id'])
Contributor

Why would this ever not already be an int?

Contributor Author

You're technically correct that xqueue-watcher sends it as an int.
However, this int() conversion exists in the legacy code as defensive programming: depending on the client implementation or how the data was originally encoded, the ID can arrive as a JSON string rather than a number.
Legacy XQueue does the same conversion: [screenshot of the legacy XQueue code]

We're maintaining this pattern for:

  • Legacy compatibility: same behavior as the original
  • Robustness: handles edge cases where the value arrives as the string "22" instead of the int 22

The conversion is essentially defensive programming that the legacy system used, so we're keeping it consistent.

            score = json.loads(score_msg)
            points_earned = score.get("score")
        except (TypeError, ValueError):
            return fail
Contributor

Adding error logging to this validation would be nice, since the only thing people running this in production will currently see is that the replies are bad, but not why they're bad.
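
A sketch of what such logging could look like, assuming a hypothetical helper `parse_points_earned` that returns `None` on any validation failure; the message wording and the choice to treat a missing or null `score` as invalid are assumptions, not the PR's behavior:

```python
import json
import logging
from typing import Any, Optional

log = logging.getLogger(__name__)


def parse_points_earned(grader_reply: Optional[str],
                        submission_id: int) -> Optional[Any]:
    """Return the 'score' from a grader reply, or None (with a log entry)."""
    try:
        reply = json.loads(grader_reply)
    except (TypeError, ValueError) as exc:
        # TypeError: reply was not a string; ValueError: malformed JSON.
        log.error("Grader reply for submission_id=%s is not valid JSON: %s",
                  submission_id, exc)
        return None

    if not isinstance(reply, dict) or reply.get("score") is None:
        log.error("Grader reply for submission_id=%s has no 'score': %r",
                  submission_id, reply)
        return None

    return reply["score"]
```

Each failure branch names the submission and the reason, so an operator can tell malformed JSON from a well-formed reply that lacks a score.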

@leoaulasneo98
Contributor Author

@ormsbee
General comment:
We tested this flow with two types of exercises: Python scripts and file submissions.
For the xqueue-watcher, we implemented a simple grader to validate Python code, while file submissions depend on the specific grader implementation - there's no standard defined in xqueue-watcher for file handling.
I invite you to check out our demo video on YouTube (private, accessible only with the link). The code shown is somewhat outdated, but the functionality remains the same - the changes we've made improve efficiency and versatility but don't change the workflow.
The video provides reliable proof that what we implemented in this view works end-to-end with xqueue-watcher, especially addressing your concerns about the response format compatibility.
The video demonstrates that our API maintains full compatibility with existing xqueue-watcher deployments.

https://www.youtube.com/watch?v=Ac6wa_aiLXw

@leoaulasneo98
Contributor Author

@ormsbee The login testing features are implemented as you told me in the last meeting

- Change XqueueViewSet to XQueueViewSet for naming consistency
- Update score_msg parameter to grader_reply throughout codebase

test: add comprehensive authorization and validation test coverage

- Add unauthenticated access tests
- Add tests for users without xqueue group membership
- Add validation edge cases for grader_reply parsing
- Achieve 100% test coverage on XQueueViewSet validation paths

feat: add IsXQueueUser permission class for group-based authorization

- Implement xqueue group membership validation for endpoint access control

fix: remove sensitive data logging from authentication

- Remove password logging and improve error message format per security review
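
The group-based check named in these commit messages might look roughly like this. It is a sketch, not the merged code: in a DRF project the class would subclass `rest_framework.permissions.BasePermission`, but it is written as a plain class here to avoid third-party imports, and the group name `"xqueue"` is an assumption.

```python
class IsXQueueUser:
    """Allow access only to authenticated members of the XQueue group."""

    group_name = "xqueue"  # assumed group name

    def has_permission(self, request, view) -> bool:
        user = getattr(request, "user", None)
        # Reject anonymous or missing users before touching group data.
        if user is None or not user.is_authenticated:
            return False
        # Being logged in to the LMS is not enough; the account must
        # also belong to the dedicated group.
        return user.groups.filter(name=self.group_name).exists()
```

A check along these lines addresses the concern raised earlier in the review that any LMS-authenticated user could otherwise pull submissions and post results.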
@leoaulasneo98 leoaulasneo98 force-pushed the XQU-26-merge-files-submission-with-get-submission branch from 3305a7c to 0e70be6 Compare May 31, 2025 00:54
@UsamaSadiq
Member

This change has been merged in the follow up PR.

@UsamaSadiq UsamaSadiq closed this Oct 22, 2025
@github-project-automation github-project-automation bot moved this from Waiting on Author to Done in Contributions Oct 22, 2025
Labels

FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

7 participants