Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REV-1564: add user-metadata API #25450

Merged
merged 8 commits into from
Nov 4, 2020
Merged

Conversation

dianekaplan
Copy link
Contributor

@dianekaplan dianekaplan commented Oct 27, 2020

Our Optimizely experiments make use of values saved to the user-metadata DOM object, which isn't available in the new courseware MFE. We need an endpoint we can call with a course and user, and get back the same user-metadata values typically available in that DOM object.

Changes in this PR to meet that goal:
Today the user-metadata fields are set in user_metadata.html and dumped into the page json. In this PR, I:

  • move that logic into get_experiment_user_metadata_context() and save that user_metadata into the context
  • update user_metadata.html to now just grab it from the context
  • add an endpoint to call get_experiment_user_metadata_context and return its user-metadata value
  • added primitive tests for the view and the updated function
  • course_dashboard (which has a user but no selected course) makes a bare-bones version of user_metadata with just the user info. It was previously getting this from the code in user_metadata.html. Now that the logic lives in get_experiment_user_metadata_context, I've updated that view to call it and confirmed that the user-metadata is appearing on the various pages as before
  • added a new permission level for this to use, so calls to this API will only return results if the requester logged in is staff or the same user we're getting data for (Optimizely will call on the logged-in user's behalf)

The next step when this is merged will be call this from Optimizely, which may involve authentication/permissions updates

Relevant ticket: https://openedx.atlassian.net/browse/REV-1564

@dianekaplan dianekaplan force-pushed the REV-1564_user_metadata_api branch from a1ff34d to 06ef845 Compare October 29, 2020 10:07
@dianekaplan dianekaplan force-pushed the REV-1564_user_metadata_api branch 4 times, most recently from f705301 to 56b96a0 Compare October 29, 2020 17:40
@dianekaplan dianekaplan changed the title [WIP] REV-1564: add user-metadata API REV-1564: add user-metadata API Oct 29, 2020
@dianekaplan dianekaplan force-pushed the REV-1564_user_metadata_api branch 2 times, most recently from 95cc38a to b3f797d Compare October 30, 2020 11:02
Copy link
Contributor

@MatthewPiatetsky MatthewPiatetsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice improvement to the user metadata
we can test the access together in a few minutes

@dianekaplan dianekaplan force-pushed the REV-1564_user_metadata_api branch 3 times, most recently from 2d6ae5b to e7e6528 Compare October 30, 2020 19:57
context['forum_roles'] = forum_roles
context['partition_groups'] = user_partitions

user_metadata = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to break this function into smaller functions to make it easier to follow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes- there is a downside though. Background context: the old code had three different code blocks involved in gathering the user-metadata:

  1. get_experiment_user_metadata_context (which we'd originally thought would be the whole story)
  2. get_base_experiment_metadata_context, called by the function above, which then adds authentication/FBE info
  3. then this other logic living in user_metadata.html which gathers together the pieces we want to dump into the DOM

This separation also resulted in a bit of an inconsistency: most views updated their context from #1, but the dashboard view was only using #3.

When I first ported over this code, I did add it as a separate function that the original one could call, to update the context with the user_metadata item. One clunky side-effect was that we had to add several more arguments: instead of being able to use course and user, we also needed to explicitly pass course_id and course_key. (There were apparently some mixins helping to access those fields on user_metadata.html, and they're available in get_experiment_user_metadata_context, but when you break out into a new function they were no longer defined). But worse than that, it felt differently clunky/confusing to have yet another function helping with the same purpose of just gathering/setting the user-metadata. Especially given the inconsistency and confusion noted above, I think it's better to try and have this logic together in one place.

The existing code is visually long, so I could see one day looking to consolidate/separate parts of it, but the purpose of this ticket is to use the existing code to recreate the user-metadata as an endpoint, so I think that sort of cleanup would be beyond the scope of this ticket, and would add change where this is currently working the way we want.

@@ -264,46 +264,118 @@ def get_dashboard_course_info(user, dashboard_enrollments):

def get_experiment_user_metadata_context(course, user):
"""
Return a context dictionary with the keys used by the user_metadata.html.
Return a context dictionary with the keys used for Optimizely experiments, exposed via user_metadata.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more descriptive with what the expected output should be? For example, listing the keys and a high level description of the expected information for each key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these fields are super common in lms, but sometimes seeing sample values is the quickest way to understand the context, especially when some terms get repurposed incorrectly in the actual code (cough course key cough).

Since the comments mention that it's exposed via user_metadata.html, I've added a note with the command to view it, so anyone unfamiliar with user-metadata can see it first-hand. That way they can see the exact values. It may be beneficial down the road to have a descriptive glossary the metadata fields (and probably these lms terms in general), but I think that level of documentation-backfill is beyond the scope of the ticket at hand (to expose this existing data to an endpoint for our internal use).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking something like this so it's a little more in line with more modern commenting styles:

Retrieves user + course data used to enrich Optimizely experiments.

Args:
  course - course_key
  user - username of the currently logged in user

Returns:
  {
   has_non_audit_enrollments: {},
   has_staff_access: boolean, 
   forum_roles: [],
   partition_groups: [],
   user_metadata: {}
  }

With the return values' keys like this, the next person to look at this doesn't need to either trigger the function to figure out the top level keys or read through the entire function to do so. And if there are things that are confusing, it's might be worth documenting in the description so the next person doesn't need to go through the same troubles you did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I think my summary was confusing: I was actually suggesting that seeing actual example values gives faster/clearer context (to me at least) than saying an element will contain the "course key"; that in my travels with Matthew it sounds like some of these terms themselves have morphed and created confusion, as a developer getting to this part what I really want to know is "is this value the one that looks like the concatenated string? or the one with three parts?"

To be clear, I agree your format is the normal way to go about this (for normal data), but in this particular case I think the reader will actually get more definitive info from specific values than a short description. If there's developer confusion around these user-metadata fields it may be worth backfilling some of the function comments to describe/differentiate the items in this dictionary, but I think that's beyond the scope of this ticket.

try:
user = get_user_by_username_or_email(username)
except User.DoesNotExist:
message = "Provided user is not found"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be more vague in the error messaging so that we don't give away information about which usernames have accounts and which don't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the intended purpose of Optimizely using this new view for the experiment, an alternate use case could be our internal troubleshooting, in which case knowing whether it was the course vs the user that got a 404 is useful. The user who reaches this code is someone logged in with an internal edX user account.

I think the only case where we'd want to suppress this information would be: a bad actor makes a (non-staff) account, finds this endpoint, and tries to look up another user. This user never reaches the code above; has_permissions is false in the IsStaffOrReadOnlyForSelf permission, so they already have a 403 Forbidden response before anything else happens.

call_args = [lookup_user.username, lookup_course.id]
self.client.login(username=lookup_user.username, password=UserFactory._DEFAULT_PASSWORD)

call_args_with_bogus_course = [lookup_user.username, 'course-v1:edX+DemoX+Demo_BOGUS']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎱 minor nit: I generally use some sort of random UUID generator or in this case, probably something along the lines of lookup_course.id + "foobar" to decouple this course.id from whatever the UserFactory() sets for its course.id, to guarantee that the course.id will be different. That way, if for any reason in the future the lookup_course.id changes to course-v1:edX+DemoX+Demo_BOGUS, I don't have to worry about the test failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea for how to be SURE: I remember a case at an old company where we had a candidate/client actually named John Tester, which had unintended consequences :) Updated!

self.client.login(username=staff_user.username, password=UserFactory._DEFAULT_PASSWORD)

response = self.client.get(reverse('api_experiments:user_metadata', args=call_args))
self.assertEqual(response.status_code, 200)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth asserting that all the top level keys are returned as well, to make sure it doesn't return a 200 + blank page?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure- added some assertions for the presence/values of the top items we expect

@@ -84,3 +91,26 @@ class ExperimentKeyValueViewSet(viewsets.ModelViewSet):
permission_classes = (IsStaffOrReadOnly,)
queryset = ExperimentKeyValue.objects.all()
serializer_class = serializers.ExperimentKeyValueSerializer


class UserMetaDataView(APIView):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed over Slack - might be worth seeing how other endpoints guard access (maybe there's already a permission / method / pre-access-hook / etc.) so we can follow the same pattern.

Copy link
Contributor Author

@dianekaplan dianekaplan Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closest-looking permission had been IsStaffOrOwner, but that's used by views that take a viewsets.ModelViewSet object, whereas ours takes an APIView (so we don't have an 'action' attribute that the permission expects). The codebase had several different 'setups' for doing API calls needing different things. It's my understanding from @MatthewPiatetsky that using the ModelViewSet is more geared toward standard crud actions, so the API acts like a database query, returning the specified entry from the database. For this API though, we're doing some processing and getting data from different places, for a get request. So we could probably try to make it work for ModelViewSet but it makes more sense without. (Also IsStaffOrOwner gives permissions to create, while the permission we added was just for reading).


class IsStaffOrReadOnlyForSelf(BasePermission):
"""
Grants access to staff or to user reading info about their own user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need staff access if Optimizely makes the request on behalf of the user and any user can access their own data via the new endpoint, including staff users?

Copy link
Contributor Author

@dianekaplan dianekaplan Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizely itself won't need this, but my thinking is:
(a) this is helpful for testing and troubleshooting if we want to call directly to look at results for different users
(b) the neighboring permission class also is used by either a regular user (with one level of access) or admins (with increased access)

@dianekaplan dianekaplan force-pushed the REV-1564_user_metadata_api branch from 6273c0b to 69649b6 Compare November 2, 2020 21:18
@dianekaplan dianekaplan force-pushed the REV-1564_user_metadata_api branch from a7ee8cb to 0e0c223 Compare November 4, 2020 15:46
@dianekaplan
Copy link
Contributor Author

jenkins run js

@dianekaplan
Copy link
Contributor Author

jenkins run py38 js

@dianekaplan dianekaplan merged commit 103ec9e into master Nov 4, 2020
@dianekaplan dianekaplan deleted the REV-1564_user_metadata_api branch November 4, 2020 16:21
@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

@edx-status-bot
Copy link

Your PR has finished running tests. The following contexts failed:

  • jenkins/python

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants