Skip to content

feat: time-based logging for userid mismatches#28917

Closed
rgraber wants to merge 3 commits intomasterfrom
rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better
Closed

feat: time-based logging for userid mismatches#28917
rgraber wants to merge 3 commits intomasterfrom
rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better

Conversation

@rgraber
Copy link
Contributor

@rgraber rgraber commented Oct 4, 2021

Description

Modify LOG_REQUEST_USER_CHANGES toggle with a memcache key that is set to True after the SafeSession middleware finds a userid mismatch, expiring after 300s.

Supporting information

https://openedx.atlassian.net/browse/ARCHBOM-1876

Deadline

None

@rgraber rgraber changed the title <!-- ## #### Note: the Lilac master branch has been created. Please consider whether your change #### should also be applied to Lilac. If so, make another pull request against the #### open-release/lilac.master branch, or ping @nedbat for help or questions. ## feat: time-based logging for userid mismatches Oct 4, 2021
@rgraber rgraber force-pushed the rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better branch from fb477d9 to ae22654 Compare October 6, 2021 19:21
@rgraber rgraber marked this pull request as ready for review October 6, 2021 19:37
@rgraber
Copy link
Contributor Author

rgraber commented Oct 7, 2021

jenkins run python

@rgraber rgraber force-pushed the rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better branch from b01bfcb to 8f0a5c0 Compare October 7, 2021 18:01
@rgraber
Copy link
Contributor Author

rgraber commented Oct 7, 2021

jenkins run python

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rgraber. I'd like to discuss some thoughts I had which reviewing.

request.safe_cookie_verified_user_id = user_id # Step 5
if LOG_REQUEST_USER_CHANGES:
request.safe_cookie_session_id = safe_cookie_data.session_id
if cache.get(USER_MISMATCH_CACHE_KEY, False) and LOG_REQUEST_USER_CHANGES:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to add a comment to explain how this was working, and now I have questions about the design.
Why do we log here vs save off information to log later? Are we afraid of losing data during exceptions? Other?
Sorry - this might lead to a larger design discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: maybe this isn't important, but the reason I bring it up because I was noting to myself that the current design always misses the first blip. So, if we have rare occurrences of a single issue that are spaced more than 5 minutes apart, we'll never get the logs, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We need to decide if we will be using New Relic to detect issues only, or adding a Splunk alert. If we are happy with New Relic only, then we can probably solve for the missing blip problem if it actually occurs (i.e. we get a New Relic alert, but find no logs in Splunk).
  2. Here is a potential comment for this line:
# Log useful debugging info from the request after having detected any recent mismatches
#   during any response. This will result in a missing log for the very first mismatch, but
#   hopefully the error will come in batches with logs to debug. If not, we'll need to enhance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression was that we would be relying on New Relic and using Splunk for debugging, as you mentioned. I'm happy to add a comment to that effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate a comment like the one I had crafted, but not sure if others feel the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the shorter one I used easier to grok but I'm also willing to be overruled

@rgraber rgraber force-pushed the rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better branch from e3c22bb to 626d1ec Compare October 8, 2021 18:31
@rgraber
Copy link
Contributor Author

rgraber commented Oct 13, 2021

jenkins run all

1 similar comment
@rgraber
Copy link
Contributor Author

rgraber commented Oct 13, 2021

jenkins run all

@rgraber rgraber force-pushed the rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better branch from 8f08ac0 to 7823210 Compare October 13, 2021 14:34
@rgraber rgraber force-pushed the rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better branch from 7823210 to 8a79343 Compare October 13, 2021 16:12
yield

@contextmanager
def assert_logged_for_request_user_mismatch(self, user_at_request, user_at_response, log_level, request_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using this in my PR -- which I had forgotten to merge until just a few minutes ago.

Comment on lines 437 to 438
if not cache.touch(USER_MISMATCH_CACHE_KEY, 300):
cache.set(USER_MISMATCH_CACHE_KEY, True, 300)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of this touch/set approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It felt conceptually closer to "extend the timeout period for every incident" but I'm not married to it

Comment on lines +420 to +421
# The user at response time (and in the session) is expected to be None when the user
# is logging out so do not treat it as a mismatch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do away with (or improve) this aspect of the logic by checking against the new response field I'm setting in #28983.

Comment on lines 449 to 450
"SafeCookieData user at request '{}' does not match user in response: '{}' "
"for request path '{}'. {}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It's not really in the response, it's request.user.id at response time -- the original message was a bit telegraphic here.

Comment on lines +468 to +469
else:
# both session user and user in response are different than user in request
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will actually end up running for both the "both" and the "neither" case, but I'm having some trouble with the negated logic in the first two guards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The neither case should be taken care of by the top-level if, but I see your point about confusing logic

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for lots of comments. I should probably re-read first, but I am sending these early to head to lunch. Happy to discuss in person later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think RECENT_USER_MISMATCH_CACHE_KEY adds clarity to the expiration of what is being cached. I noted this in https://github.com/edx/edx-platform/pull/28917#discussion_r724420744 as a non-blocking issue. Can you either update, or confirm that you saw this and want to go with the original name. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not married to it. Will update.

request.safe_cookie_verified_user_id = user_id # Step 5
if LOG_REQUEST_USER_CHANGES:
request.safe_cookie_session_id = safe_cookie_data.session_id
if cache.get(USER_MISMATCH_CACHE_KEY, False) and LOG_REQUEST_USER_CHANGES:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate a comment like the one I had crafted, but not sure if others feel the same.

Comment on lines 320 to 332
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I've been asked to indent multi-line paragraphs and got used to it. I do think it reads more easily. This is a non-blocking comment for your consideration. Note that it would affect other comments, such as one where you removed the indent from the second line of a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Here is an example where you removed an indent. See earlier comment.

def test_success(self, mock_log_request_user_changes, toggle, cache_flag, should_log):
with patch("openedx.core.djangoapps.safe_sessions.middleware.LOG_REQUEST_USER_CHANGES", toggle):
self.client.login(username=self.user.username, password='test')
cache.set(USER_MISMATCH_CACHE_KEY, cache_flag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we should assert USER_MISMATCH_CACHE_KEY at the end as well. At this time, does a normal login trigger a mismatch? Would the cache always be True? Maybe this will change after Tim's work, but would capture how it should be working now.

object.
set_session_cookie - If True, a session_id is set in the
session cookie in the response.
session cookie in the response.c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo added.

session cookie in the response.
session cookie in the response.c
"""
print('assert response')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left from debugging?

def test_different_user_at_step_2_error(self, mock_cache_set, change_response_user,
change_session_user, mismatch_location, change_session):
new_user = UserFactory()
self.request.safe_cookie_verified_user_id = self.user.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is a way to mock slightly less in this test? Could we call the actual middleware on the request and have it do this instead, and then muck with stuff, and then call the middleware on the response?

def test_success(self):
@patch("django.core.cache.cache.set")
@patch("django.core.cache.cache.touch")
def test_success(self, mock_cache_set, mock_cache_touch):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if we could test the consequences of this, rather than the cache value itself? For example, using multiple calls with or without a mismatch and showing that logging happens where and when we expect. Food for thought.

@edx-status-bot
Copy link

Your PR has finished running tests. The following contexts failed:

  • jenkins/django-3.2/quality

@rgraber rgraber closed this Oct 13, 2021
@rgraber
Copy link
Contributor Author

rgraber commented Oct 13, 2021

Closing in favor or ARCHBOM-1923

@rgraber rgraber deleted the rsgraber/ARCHBOM-1876-log-safe-session-mismatches-better branch October 13, 2021 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants