feat(spans): Scrub random strings in resource spans #2614

Merged: jjbayer merged 17 commits into master from feat/spans-resource-random-strings on Oct 19, 2023

@jjbayer (Member) commented Oct 17, 2023

This PR attempts to fix some flaws in resource span scrubbing:

  1. chrome-extension:// domains are random strings, so scrub them.
  2. Keep the scheme, but scrub subdomains (instead of cdn.domain.com, write https://*.domain.com).
  3. Replace path segments that contain special characters (=, %, ...) with *.
  4. If a path segment has more than 25 characters after regex scrubbing, assume it is an identifier and replace it with *.
  5. If a path segment consists only of alphabetic characters and contains at least one uppercase character, assume it is an identifier and replace it with *.

See test cases for examples.
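
The rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `looks_random`, `scrub_path` and `scrub_host` are hypothetical, the regex scrubbing step that precedes rule 4 is omitted, the set of special characters is limited to the two named in the description, and subdomain scrubbing naively keeps the last two host labels (a real implementation would likely need to handle multi-part suffixes such as `co.uk`).

```rust
/// Hypothetical helper: does this path segment look like a random identifier?
/// Covers rules 3 to 5 of the description; the regex pre-scrubbing is omitted.
fn looks_random(segment: &str) -> bool {
    // Rule 3: the segment contains a special character. Only '=' and '%'
    // are checked here, which is an assumption of this sketch.
    let has_special = segment.chars().any(|c| matches!(c, '=' | '%'));
    // Rule 4: more than 25 characters.
    let too_long = segment.chars().count() > 25;
    // Rule 5: alphabetic characters only, with at least one uppercase letter.
    let mixed_case_alpha = !segment.is_empty()
        && segment.chars().all(|c| c.is_ascii_alphabetic())
        && segment.chars().any(|c| c.is_ascii_uppercase());
    has_special || too_long || mixed_case_alpha
}

/// Hypothetical helper: replace every suspicious path segment with `*`.
fn scrub_path(path: &str) -> String {
    path.split('/')
        .map(|seg| if looks_random(seg) { "*" } else { seg })
        .collect::<Vec<_>>()
        .join("/")
}

/// Hypothetical helper for rules 1 and 2: keep the scheme, scrub the host.
fn scrub_host(scheme: &str, host: &str) -> String {
    // Rule 1: chrome-extension hosts are random strings.
    if scheme == "chrome-extension" {
        return format!("{}://*", scheme);
    }
    // Rule 2: replace subdomains with `*`, keeping the last two labels
    // (naive assumption, see the note above).
    let labels: Vec<&str> = host.split('.').collect();
    let n = labels.len();
    if n > 2 {
        format!("{}://*.{}.{}", scheme, labels[n - 2], labels[n - 1])
    } else {
        format!("{}://{}", scheme, host)
    }
}
```

Under these assumptions, `scrub_host("https", "cdn.domain.com")` yields `https://*.domain.com`, and `scrub_path("/static/js/AbCdEf/main.js")` yields `/static/js/*/main.js`. The sketch does not reproduce every test case in the PR, because those also depend on the regex scrubbing step.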

@jjbayer jjbayer changed the base branch from master to ref/spans-no-clustering October 17, 2023 16:19
Base automatically changed from ref/spans-no-clustering to master October 18, 2023 07:45
@jjbayer jjbayer marked this pull request as ready for review October 19, 2023 08:15
@jjbayer jjbayer requested review from a team, phacops and DominikB2014 October 19, 2023 08:15
Co-authored-by: Oleksandr <1931331+olksdr@users.noreply.github.com>
resource_script_random_path_only,
"/ERs-sUsu3/wd4/LyMTWg/Ot1Om4m8cu3p7a/QkJWAQ/FSYL/GBlxb3kB",
"resource.script",
"/*/*/*/*/*/*/*"
Contributor

Is this string valuable?

Member Author

No, but the complexity of this PR has already ballooned, so I would like to keep it as-is. If it turns out that we produce high cardinality because of variable-length paths (/*, /*/*, ...), we can always reconsider.

Co-authored-by: Iker Barriocanal <32816711+iker-barriocanal@users.noreply.github.com>
@jjbayer jjbayer enabled auto-merge (squash) October 19, 2023 12:59
@jjbayer jjbayer merged commit 0602533 into master Oct 19, 2023
20 checks passed
@jjbayer jjbayer deleted the feat/spans-resource-random-strings branch October 19, 2023 14:02
@DominikB2014
Contributor

👍
