feat: time-based logging for userid mismatches#28917
Conversation
fb477d9 to
ae22654
Compare
|
jenkins run python |
b01bfcb to
8f0a5c0
Compare
|
jenkins run python |
| request.safe_cookie_verified_user_id = user_id # Step 5 | ||
| if LOG_REQUEST_USER_CHANGES: | ||
| request.safe_cookie_session_id = safe_cookie_data.session_id | ||
| if cache.get(USER_MISMATCH_CACHE_KEY, False) and LOG_REQUEST_USER_CHANGES: |
There was a problem hiding this comment.
I was going to add a comment to explain how this was working, and now I have questions about the design.
Why do we log here vs save off information to log later? Are we afraid of losing data during exceptions? Other?
Sorry - this might lead to a larger design discussion.
There was a problem hiding this comment.
Note: maybe this isn't important, but the reason I bring it up because I was noting to myself that the current design always misses the first blip. So, if we have rare occurrences of a single issue that are spaced more than 5 minutes apart, we'll never get the logs, right?
There was a problem hiding this comment.
- We need to decide if we will be using New Relic to detect issues only, or adding a Splunk alert. If we are happy with New Relic only, then we can probably solve for the missing blip problem if it actually occurs (i.e. we get a New Relic alert, but find no logs in Splunk).
- Here is a potential comment for this line:
# Log useful debugging info from the request after having detected any recent mismatches
# during any response. This will result in a missing log for the very first mismatch, but
# hopefully the error will come in batches with logs to debug. If not, we'll need to enhance.
There was a problem hiding this comment.
My impression was that we would be relying on New Relic and using Splunk for debugging, as you mentioned. I'm happy to add a comment to that effect.
There was a problem hiding this comment.
I'd appreciate a comment like the one I had crafted, but not sure if others feel the same.
There was a problem hiding this comment.
I found the shorter one I used easier to grok but I'm also willing to be overruled
e3c22bb to
626d1ec
Compare
|
jenkins run all |
1 similar comment
|
jenkins run all |
8f08ac0 to
7823210
Compare
7823210 to
8a79343
Compare
| yield | ||
|
|
||
| @contextmanager | ||
| def assert_logged_for_request_user_mismatch(self, user_at_request, user_at_response, log_level, request_path): |
There was a problem hiding this comment.
I ended up using this in my PR -- which I had forgotten to merge until just a few minutes ago.
| if not cache.touch(USER_MISMATCH_CACHE_KEY, 300): | ||
| cache.set(USER_MISMATCH_CACHE_KEY, True, 300) |
There was a problem hiding this comment.
What's the benefit of this touch/set approach?
There was a problem hiding this comment.
It felt conceptually closer to "extend the timeout period for every incident" but I'm not married to it
| # The user at response time (and in the session) is expected to be None when the user | ||
| # is logging out so do not treat it as a mismatch |
There was a problem hiding this comment.
I think we can do away with (or improve) this aspect of the logic by checking against the new response field I'm setting in #28983.
| "SafeCookieData user at request '{}' does not match user in response: '{}' " | ||
| "for request path '{}'. {}" |
There was a problem hiding this comment.
Nit: It's not really in the response, it's request.user.id at response time -- the original message was a bit telegraphic here.
| else: | ||
| # both session user and user in response are different than user in request |
There was a problem hiding this comment.
I believe this will actually end up running for both the "both" and the "neither" case, but I'm having some trouble with the negated logic in the first two guards.
There was a problem hiding this comment.
The neither case should be taken care of by the top-level if, but I see your point about confusing logic
robrap
left a comment
There was a problem hiding this comment.
Apologies for lots of comments. I should probably re-read first, but I am sending these early to head to lunch. Happy to discuss in person later.
There was a problem hiding this comment.
I still think RECENT_USER_MISMATCH_CACHE_KEY adds clarity to the expiration of what is being cached. I noted this in https://github.com/edx/edx-platform/pull/28917#discussion_r724420744 as a non-blocking issue. Can you either update, or confirm that you saw this and want to go with the original name. Thanks.
There was a problem hiding this comment.
Not married to it. Will update.
| request.safe_cookie_verified_user_id = user_id # Step 5 | ||
| if LOG_REQUEST_USER_CHANGES: | ||
| request.safe_cookie_session_id = safe_cookie_data.session_id | ||
| if cache.get(USER_MISMATCH_CACHE_KEY, False) and LOG_REQUEST_USER_CHANGES: |
There was a problem hiding this comment.
I'd appreciate a comment like the one I had crafted, but not sure if others feel the same.
There was a problem hiding this comment.
Nit: I've been asked to indent multi-line paragraphs and got used to it. I do think it reads more easily. This is a non-blocking comment for your consideration. Note that it would affect other comments, such as one where you removed the indent from the second line of a comment.
There was a problem hiding this comment.
Note: Here is an example where you removed an indent. See earlier comment.
| def test_success(self, mock_log_request_user_changes, toggle, cache_flag, should_log): | ||
| with patch("openedx.core.djangoapps.safe_sessions.middleware.LOG_REQUEST_USER_CHANGES", toggle): | ||
| self.client.login(username=self.user.username, password='test') | ||
| cache.set(USER_MISMATCH_CACHE_KEY, cache_flag) |
There was a problem hiding this comment.
It feels like we should assert USER_MISMATCH_CACHE_KEY at the end as well. At this time, does a normal login trigger a mismatch? Would the cache always be True? Maybe this will change after Tim's work, but would capture how it should be working now.
| object. | ||
| set_session_cookie - If True, a session_id is set in the | ||
| session cookie in the response. | ||
| session cookie in the response.c |
| session cookie in the response. | ||
| session cookie in the response.c | ||
| """ | ||
| print('assert response') |
| def test_different_user_at_step_2_error(self, mock_cache_set, change_response_user, | ||
| change_session_user, mismatch_location, change_session): | ||
| new_user = UserFactory() | ||
| self.request.safe_cookie_verified_user_id = self.user.id |
There was a problem hiding this comment.
I wonder if there is a way to mock slightly less in this test? Could we call the actual middleware on the request and have it do this instead, and then muck with stuff, and then call the middleware on the response?
| def test_success(self): | ||
| @patch("django.core.cache.cache.set") | ||
| @patch("django.core.cache.cache.touch") | ||
| def test_success(self, mock_cache_set, mock_cache_touch): |
There was a problem hiding this comment.
I'm curious if we could test the consequences of this, rather than the cache value itself? For example, using multiple calls with or without a mismatch and showing that logging happens where and when we expect. Food for thought.
|
Your PR has finished running tests. The following contexts failed:
|
|
Closing in favor or ARCHBOM-1923 |
Description
Modify LOG_REQUEST_USER_CHANGES toggle with a memcache key that is set to True after the SafeSession middleware finds a userid mismatch, expiring after 300s.
Supporting information
https://openedx.atlassian.net/browse/ARCHBOM-1876
Deadline
None