Skip to content
This repository has been archived by the owner on Aug 30, 2023. It is now read-only.

draft about distinct id implementation #342

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

marandaneto
Copy link
Contributor

@marandaneto marandaneto commented Apr 6, 2020

📢 Type of change

  • Bugfix
  • New feature
  • Enhancement
  • Refactoring

📜 Description

draft about distinct id implementation

💡 Motivation and Context

distinctId should be changeable, but still keeping compatibility with older SDKs.
secureAndroidId should be deprecated, but still possible to keep compatibility.

💚 How did you test it?

no tests at all, only a draft PR to support my DACI request.

📝 Checklist

  • I reviewed submitted code
  • I added tests to verify changes
  • All tests passing

🔮 Next steps

get approval and do the real implementation.

import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

// TODO: to think, this could be a SentryOptionsCache that could be expanded, does it make sense?
Copy link
Contributor Author

@marandaneto marandaneto Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to discuss, related to the naming and scope of this class (IUserCache/AndroidUserCache)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me to keep this focused on user caching, specially the abstraction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if Cache is the right naming here, because we actually persist the data to the SharedPreferences. A cache is usually used to improve the performance of something, see CPU caches, browser caches, web server caches. For me it is more like a UserRepo or UserStorage. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not entirely, I borrowed the idea from https://developer.android.com/reference/android/content/Context.html#getCacheDir()

I see the Caching term as something temporary, being in memory or in the disk.
Also, https://docs.sentry.io/clients/java/integrations/#android (old SDK version)

Events will be buffered to disk (in the application’s cache directory) by default

I'm fine changing it as well, but it might make it inconsistent with other SDKs.

@bruno-garcia opnions?

Comment on lines +15 to +19
// TODO: only these 3 fields make sense to me, otherwise caching a raw json would be better, but I
// don't want to do that.
private static final String EMAIL = "email";
private static final String ID = "id";
private static final String USERNAME = "user_name";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to discuss, for sessions, we only need these 3 fields

Comment on lines +35 to +36
// TODO: I believe this makes sense only for global hub SDKs, but if not,
// how do we cache multiple users (because we have multiple scopes)?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to discuss

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it makes sense to keep this class only specific to Android. I guess we are not going to run into the problem of caching multiple users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, this one is Android-specific, but we can't forget that sentry-core is cross-platform.
nothing to do here for now.

Comment on lines +79 to +81
// is it a good idea to do it here? this guarantees that it cleans up on restart if the flag
// has been swapped
cleanUserFields(edit);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to discuss

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense, because otherwise you could receive an old user. Imagine the following scenario:

  1. setUser(A)
  2. cacheUserForSessions = false
  3. setUser(B), which doesn't do anything.
  4. cacheUserForSessions = true
  5. getUser() returns A

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that was exactly the use-case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just came up with another scenario:

  1. setUser(A)
  2. cacheUserForSessions = false
  3. cacheUserForSessions = true
  4. getUser() returns A

In this scenario, even with calling cleanUserFields in setUser and getUser, you get the old user A. SentryOptions.setCacheUserForSessions could clear the UserCache if set to false to solve this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'd say on SDK. init, if the flag is false, we should clean this up.
SentryOptions fields are only mutable on SDK init., but not on runtime.
good catch

Comment on lines +430 to +436
// fallback to sentryDeviceId
if (distinctId == null) {
// if I add a transient field (not serializable) to the User object, I can get rid of the
// distinctId on options
// I cannot simply set on user.Id cus its gonna be a breaking change
distinctId = options.getDistinctId();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to discuss

Comment on lines +418 to +419
final User finalUser = getUser();
// synchronized (userLock) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's racey if we don't sync it. but as session is already locked, only a copy of User would be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically user would need to be readonly but yeah


String sentryDeviceId;
// cache it if not set
if (!sharedPreferences.contains(SENTRY_DEVICE_ID)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the place to have this code on some package private static class? If so we could just call it from here.

I believe this contains call could be replace with the sharedPreferences.getString(SENTRY_DEVICE_ID, null) != null to avoid two I/O calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SharedPreferences does in-memory caching sync and it flushes to the disk async., so unless its value has changed, it's not an issue.
I'd separate it in a method or class, yes.

import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

// TODO: to think, this could be a SentryOptionsCache that could be expanded, does it make sense?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me to keep this focused on user caching, specially the abstraction

if (options.isCacheUserForSessions()) {
final SharedPreferences.Editor edit = sharedPreferences.edit();
if (user != null) {
edit.putString(ID, user.getId())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we settle for only caching the userId instead? Or at least a single value (when a user is set, we evaluate the which value based on an order (id>username>email)? Would be worth seeing what the issues page already considers when accounting for "user" of a crash.

If we're expecting to store the actual User, we'd need to consider the protocol can change (versioning/new fields added), user define data and makes it trickier:

All other keys are stored as extra information but not specifically processed by Sentry.

If we're storing just a single value to be used for session tracking only, it would minimize these edge cases.
Everything else you said stands true:

setUser(...) makes the SDK evaluate a new value for session distrinctId. setUser(null) deletes that cached value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, a single value would be ok, makes sense.
the order would be id > email > username if following the release health specs.

/**
* enable the usage of Settings.Secure.ANDROID_ID to keep retro compatible with older SDK versions
*/
private boolean enableSecureAndroidId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this option be named in such as way as to indicate a fallback (as opposed to enable which sounds like a feature).
Perhaps useLegacyUserId or something else?

Also, should this be guarded by sendDefaultPii? Meaning we'd need both this flag on + sendDefaultPii?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, naming here can be improved, useLegacyAndroidId maybe? the behavior is very dependent on it.
so initially I've added sendDefaultPii and later removed. as it's the only field to be guarded, it sounds redundant, one would need to swap 2 flags for a single behavior change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add it, I would already mark it as deprecated and tell our users to get rid of it.

@@ -102,16 +105,24 @@ public void setTransaction(@Nullable String transaction) {
* @return the user
*/
public @Nullable User getUser() {
return user;
synchronized (userLock) {
if (user == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of potential improvements:

  • double-check lock
  • account for when no user is ever set. Now it'll take the lock and check the cache on each use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed the multiple checks of getUser call.
Ideally, we'd only check it again if the value has been changed.
Definitely improvements to be done.

Comment on lines +418 to +419
final User finalUser = getUser();
// synchronized (userLock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically user would need to be readonly but yeah

options.getDistinctId(), user, options.getEnvironment(), options.getRelease());
String distinctId = null;

final User finalUser = getUser();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that now even if the user isn't cached we'll be user the user from the scope?
Seems like this brings us back to a potential double-user count per device issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that now even if the user isn't cached we'll be user the user from the scope?

mm guess I don't get it, what do you mean?

Seems like this brings us back to a potential double-user count per device issue.

this is true only on the first run of the App., if cacheUserForSessions is enabled and value has been set, it's gonna reuse the same distinctId, unless setUser has been called again with different data.

if (distinctId == null) {
// if I add a transient field (not serializable) to the User object, I can get rid of the
// distinctId on options
// I cannot simply set on user.Id cus its gonna be a breaking change
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment made me wonder if we should just release 3.0.0 with sessions and by default report the installation id as both distinctId and user.id. With that fallback to secureId you added already. Would simplify things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we do a major bump, I'd remove secureAndroidId once for all.
I'd set installationId by default onto Scope.user.id on SDK init if cacheUserForSessions is disabled and if enabled, the cached userId.

this would make event.user.id and session.distinctId normalized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Also because a btter fallback is to tell users to call secure.AndroidId and set to the user.id by themselves


// TODO: only these 3 fields make sense to me, otherwise caching a raw json would be better, but I
// don't want to do that.
private static final String EMAIL = "email";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we prefix those constants here? Could it be that the users of our SDK also save email to their shared preferences? What about 'sentry_user_cache_email' or something similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments :)

/**
* enable the usage of Settings.Secure.ANDROID_ID to keep retro compatible with older SDK versions
*/
private boolean enableSecureAndroidId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add it, I would already mark it as deprecated and tell our users to get rid of it.

Comment on lines +35 to +36
// TODO: I believe this makes sense only for global hub SDKs, but if not,
// how do we cache multiple users (because we have multiple scopes)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it makes sense to keep this class only specific to Android. I guess we are not going to run into the problem of caching multiple users.

Comment on lines +43 to +45
edit.putString(ID, user.getId())
.putString(EMAIL, user.getEmail())
.putString(USERNAME, user.getUsername())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is a bit confusing why we pass the whole user to IUserCache.setUser and only save id, email and username. Why don't we save the other users properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could, but our session guideline only considers id, email, and username as a valid unique identifier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe then we should create an extra class for this and name it something like SessionIdCache and add this as a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the Caching class should not be aware of what id it is, by that I mean (id, email or username).
maybe this class should take a String nominated as distinctId and that's all, so the naming would not matter.
A session also has an id so SessionIdCache would not fit, as we are caching the Session.distinctId which is a unique identifier of the user or device.
UserIdCache would make sense though?! naming is hard hehe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this class should take a String nominated as distinctId

I prefer this too. It says what it does. So when setting the user, if this option is enabled, we select the data from the user object (id, username, email) to become the distinctId and cache that so the caching itself isn't aware of this logic and we don't have to deal with caching the actual user object and possible user-defined fields.

import org.jetbrains.annotations.NotNull;
import org.jetbrains.annotations.Nullable;

// TODO: to think, this could be a SentryOptionsCache that could be expanded, does it make sense?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if Cache is the right naming here, because we actually persist the data to the SharedPreferences. A cache is usually used to improve the performance of something, see CPU caches, browser caches, web server caches. For me it is more like a UserRepo or UserStorage. What do you think?


@Override
public void setUser(final @Nullable User user) {
if (options.isCacheUserForSessions()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could separate the logic of whether the user data should be saved or nor from the actual Android code. I would imagine two classes. One called something like UserCache that holds SentryAndroidOptions and SentryDeviceId and this class task to a simple interface like UserPersistance, that has an Android implementation an actually saves the user's data to the SharedPreferences. Then we have the SDK logic separated from persisting the data to Android. It would be easier to test and easier to extend. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, we could do that, just not sure how deep this abstraction should be, do we really see this user caching thing going forward with other SDKs? like java-backend, java-desktop etc... otherwise, it's just more classes for a single block of code, which is an if.
@bruno-garcia opinions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be pulled out from the Android code but I believe it would only make sense then in desktop apps, right? Since this User in server apps are usually one per request and doesn't really make sense to cache.
Tbh we're not even sure this feature will even be used, it's just a work around to support distinctId for sessions based on user defined user id.

The proposal for distinctId has an option of not dealing with this at all (using only the generated id) which sounds like a good first version until we can get customer feedback to make a more educated decision.

Comment on lines +79 to +81
// is it a good idea to do it here? this guarantees that it cleans up on restart if the flag
// has been swapped
cleanUserFields(edit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense, because otherwise you could receive an old user. Imagine the following scenario:

  1. setUser(A)
  2. cacheUserForSessions = false
  3. setUser(B), which doesn't do anything.
  4. cacheUserForSessions = true
  5. getUser() returns A


if (androidId == null
|| androidId.isEmpty()
|| androidId.equalsIgnoreCase("9774d56d682e549c")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe add a comment about what 9774d56d682e549c is? I had to Google it ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ups, copy-pasta mistake.
the original code has a comment which leads to the google blogpost
see

// https://android-developers.googleblog.com/2011/03/identifying-app-installations.html

if (finalUser != null) {
if (finalUser.getId() != null && !finalUser.getId().isEmpty()) {
distinctId = finalUser.getId();
} else if (finalUser.getEmail() != null && !finalUser.getEmail().isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use the email or the username as the distinctId if the id is not set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because all of them are considered to be a unique identifier, in this order.
if there's no id, check for email, no email as well? username.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the approach taken in the session supported added in the python SDK at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying that 👍. Didn't know

@marandaneto
Copy link
Contributor Author

marandaneto commented Apr 7, 2020

I added a few comments :)

thanks for the review :)
just emphasizing that this Draft PR is only to support the DACI, not a final implementation.
I've replied all of them o/

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants