-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: use non_atomic_requests decorator in handle_block view #34020
fix: use non_atomic_requests decorator in handle_block view #34020
Conversation
Thanks for the pull request, @mariajgrimaldi! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green. |
68ec410
to
4114e36
Compare
Hi there @ormsbee, I'm tagging you here since you authored these changes. Please let me know if we should tag other folks here. Thanks! |
Can this be fixed by removing the implicit view-level transaction on whatever view is triggering the publish? I'm guessing that it's being caused because the async process doesn't see the updated split modulestore index that's stored in the Django ORM active versions table now, because the task launches before that transaction commits. The reason I ask is that "delay be X seconds" inevitably causes operational race conditions at some point (e.g. when a piece of middleware starts hanging because it's contacting some service that has failed). If we can make sure the change is committed and visible before the celery queuing task executes, that would be ideal. |
7a68265
to
5b64322
Compare
Thank you @ormsbee for the suggestion! I tested it out here, and it seems to be working fine when publishing new configurations like hide from TOC and with existent ones like hide from learners or any other change, I also created/deleted/duplicated/edited blocks just to watch the views' behavior. I haven't tested more than that, eg. generating exceptions, so I can't tell what would happen with the DB consistency if the view fails mid-execution. But it seems like all the write operations are made in a block: _delete_item, modify_xblock, duplicate_block, _create_block, _move_item, and the rest seem to be read-only operations. |
The way Modulestore works now is that it will create definition documents and new structure documents that point to them. But nothing actually changes until the pointer to the structure document changes in the As far as the rest of the Modulestore is concerned, if something dies part-way through the process of altering a course, it will leave some unreferenced junk in MongoDB, but it shouldn't corrupt the content. That was an explicit goal of the SplitMongo Modulestore. MongoDB didn't have transactions back then, and a recurring issue with Old Mongo is that errors during the import process would leave to a corrupted, half-published state. Split changed things by creating all the pieces and then updating the pointer at the end so that imports were atomic. |
Thanks @ormsbee; I see why this is safer than my previous approach. Is it okay if I tag you as a reviewer? Let me know, we could also involve someone else. Thank you! |
Sure. Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a request for more comments, so nobody casually disables it later.
49d23df
to
e393c1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, please just squash and merge.
I think this qualifies as a fix:
since the old behavior was wrong.
b3eb6ad
to
13f7a9b
Compare
Hi there @itsjeyd: although this fixes an error raised while implementing the feature enhancement proposal: Hide sections from course outline, this is not strictly related to the implementation. It's a fix for the platform found as a result of it, so I'll remove the product review label since this is backend-facing only, which will not change how the platform currently behaves. |
13f7a9b
to
415703b
Compare
@mariajgrimaldi 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future. |
2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production. |
2U Release Notice: This PR has been deployed to the edX production environment. |
1 similar comment
2U Release Notice: This PR has been deployed to the edX production environment. |
Sounds good @mariajgrimaldi, thanks for the details 👍 |
Description
This PR disables atomic requests so transactions made during the request commit immediately instead of waiting for the end of
the request transaction. This is necessary so the async tasks launched by the current process can see the changes made
during the request. One example is the async tasks launched when courses are published before the request
ends, which end up reading from an outdated database state.
Our 1st approach was adding a countdown to the task that pushes the new course structure into the learning sequences, but after the thread, you'll find in the PR, we opted for this one.
More context:
We found the out-of-date reads behavior after adding the
hide_from_toc
field to theXblockSerializer
(among other changes) in this PR. We did this to use the current course sections visibility settings to configure the OLX-only fieldHide from TOC
; for more context on why we're doing this, please refer to the PR cover letter.So, after adding this new field, we expected that saving the new visibility setting for the course section would update the course learning sequences (the courses' representation in the LMS). Instead, this was happening:
Screencast.from.19-12-23.12.08.56.mp4
After publishing the new course structure, the
course_published
signal is sent. This triggers a few receivers; among them, thelisten_for_course_publish
receiver that pushes the course outline to learning sequences asynchronously. This part of the implementation is crucial for maintaining up to date the course outline in the LMS, which was exactly what was failing. What makes the update in the course outline isupdate_outline_from_modulestore_task
task, which reads from the modulesture and then update the learning sequences accordingly.Having this in mind, we did some digging to find:
CELERY_ALWAYS_EAGER = True
. This confirmed that there's something wrong when using asynchronous processing.Our hypothesis was that if Mongo was up-to-date after the 1st save with the hide from TOC changes, but the LMS didn't reflect them, then the issue was an out-of-date read. To prove this, we saved the hide from TOC changes the 1st time, and then we tried reading again from the course modulestore after some time had passed to update. For this, we ran
./manage.py LMS simulate_publish --courses <COURSE ID>
, which simulates sending the course publish signal triggering thelisten_for_course_publish
receiver and, therefore,update_outline_from_modulestore_task.
And it worked! The course outline in the LMS had the TOC changes.So what was causing the out-of-date reads was the immediate push of the course outline to the learning sequences so we added the countdown.
Supporting information
PR where we found the issue: #33952
More info on the modulestore use in the LMS:
edx-platform/docs/decisions/0011-limit-modulestore-use-in-lms.rst
Lines 106 to 126 in 83d104f
Testing instructions
Deadline
Need to be merged before: #33952