Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dynamic-sampling): Implement prioritize by project bias [TET-574] #42939

Merged

Conversation

andriisoldatenko
Copy link
Contributor

@andriisoldatenko andriisoldatenko commented Jan 9, 2023

This PR implements prioritize by project bias.

In detail:
We run celery task every 24 at 8:00AM (UTC randomly selected) for every ORG (we call it prioritise by project snuba query ) and all projects inside this org, and for a given combination of org and projects run an adjustment model to recalculate sample rates if necessary.

Then we cache sample rate using redis cluster -> SENTRY_DYNAMIC_SAMPLING_RULES_REDIS_CLUSTER using this pattern for key: f"ds::o:{org_id}:p:{project_id}:prioritise_projects".

When relay fetches projectconfig endpoint we run generate_rules functions to generate all dynamic sampling biases, so and we check if we have adjusted sample rate for this project in the cache, so we apply it as uniform bias, otherwise we use default one.

Regarding prioritize by project snuba query is cross org snuba query that utilizes a new generic counter metric, which was introduced in relay c:transactions/count_per_root_project@none.

TODO:

  • Provision infrastructure to run clickhouse clusters for the counters tables. This is primarily dependent on ops
  • Start running the snuba consumers to read and write to the counters table. SnS can work on this
  • Add unit-tests;
  • Update snuba query using new metric
  • Hide behind feature flag

related PRs:

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 9, 2023
@andriisoldatenko andriisoldatenko changed the title feat(sampling): Implement prioritize by project bias [TET-569] feat(dynamic-sampling): Implement prioritize by project bias [TET-569] Jan 9, 2023
@github-actions github-actions bot added the Scope: Frontend Automatically applied to PRs that change frontend components label Jan 16, 2023
@github-actions
Copy link
Contributor

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 77f6033 to b074f9a Compare January 16, 2023 08:10
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 8532860 to 6071553 Compare January 22, 2023 16:55
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 6071553 to e4ed330 Compare January 26, 2023 17:01
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from b81e3bd to 955c3bf Compare February 7, 2023 12:44
@andriisoldatenko andriisoldatenko force-pushed the andrii/implement-cronjob-to-fetch-orgs-projects branch from 955c3bf to 1c0bbcf Compare February 12, 2023 17:36
@andriisoldatenko
Copy link
Contributor Author

Tested locally with 1 org 1 project - no changes:

15:33:18 worker                                | 15:33:18 [INFO] sentry.dynamic_sampling: rules_generator.generate_rules (org_id=1 project_id=1 rules=[{'id': 1002, 'type': 'ignoreHealthChecks', 'samplingValue': {'type': 'sampleRate', 'value': 0.05}, 'healthChecks': ['*healthcheck*', '*healthy*', '*live*', '*ready*', '*heartbeat*', '*/health', '*/healthz']}, {'id': 1000, 'type': 'uniformRule', 'samplingValue': {'type': 'sampleRate', 'value': 0.25}}])

With 1 org and 2 projects (blended rate - 0.25):

16:18:01 worker                                | 16:18:01 [INFO] sentry: monitor.missed-checkin (monitor_id=4)
16:18:01 worker                                | 16:18:01 [INFO] sentry: monitor.missed-checkin (monitor_id=1)
16:18:01 worker                                | 16:18:01 [INFO] sentry.auth: !!!!!!!!!!!!!!!!!!!!!!!!!!check_auth
16:18:01 worker                                | 16:18:01 [INFO] sentry.dynamic_sampling.tasks: !!! start prioritise_projects
16:18:01 worker                                | 16:18:01 [INFO] sentry.dynamic_sampling.tasks: !!! 1 [(1, 1452.0), (8, 100.0)]
16:18:01 worker                                | 16:18:01 [INFO] sentry.dynamic_sampling.tasks: !!! start process_projects_sample_rates
16:18:01 worker                                | 16:18:01 [WARNING] sentry.tasks.release_registry: Release registry URL is not specified, skipping the task.
16:18:04 worker                                | 16:18:04 [INFO] sentry.tasks.groupowner: process_suspect_commits.skipped (reason='no_release')
16:18:04 worker                                | 16:18:04 [INFO] sentry.tasks.groupowner: process_suspect_commits.skipped (reason='no_release')
16:18:07 worker                                | 16:18:07 [INFO] sentry.dynamic_sampling: rules_generator.generate_rules (org_id=1 project_id=1 rules=[{'id': 1002, 'type': 'ignoreHealthChecks', 'samplingValue': {'type': 'sampleRate', 'value': 0.03966942148760331}, 'healthChecks': ['*healthcheck*', '*healthy*', '*live*', '*ready*', '*heartbeat*', '*/health', '*/healthz']}, {'id': 1000, 'type': 'uniformRule', 'samplingValue': {'type': 'sampleRate', 'value': 0.19834710743801653}}])

],
groupby=[Column("org_id"), Column("project_id")],
where=[
Condition(Function("modulo", [Column("org_id"), 100]), Op.LT, sample_rate),
Copy link
Contributor

@onewland onewland Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might work fine but I'm not sure how efficient WHERE org_id % 100 < 45 (or some other n) will be at scanning the table.

@nikhars do you have an opinion on this? I think given the data size and the filtering by metric_id it would probably work but I wonder if they'd get better ClickHouse performance if they enumerated the org_id's to check into batches of 5k or 10k and filtering before sending the query

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can run the query and provide information whether this sort of WHERE clause would be helpful or not.

@andriisoldatenko andriisoldatenko merged commit 41ff6d0 into master Feb 28, 2023
@andriisoldatenko andriisoldatenko deleted the andrii/implement-cronjob-to-fetch-orgs-projects branch February 28, 2023 13:50
jan-auer added a commit that referenced this pull request Feb 28, 2023
* master: (79 commits)
  feat(perf-issues): Add performance issue detection timing runner command (#44912)
  Revert "chore: Investigating org slug already set to a different value (#45134)"
  fix(hybrid-cloud): Redirect to org restoration page for customer domains (#45159)
  bug(replays): Fix 500 error when marshaling tags field (#45097)
  ref(sourcemaps): Redesign lookup of source and sourcemaps (#45032)
  chore: Investigating org slug already set to a different value (#45134)
  feat(dynamic-sampling): Implement prioritize by project bias [TET-574] (#42939)
  feat(dynamic-sampling): Add transaction name prioritize option - (#45034)
  feat(dyn-sampling): add new bias toggle to project details for prioritise by tx name [TET-717] (#44944)
  feat(admin) Add admin relay project config view [TET-509] (#45120)
  Revert "chore(assignment): Add analytics when autoassigning after a manual assignment (#45099)"
  feat(sourcemaps): Implement new tables supporting debug ids (#44572)
  ref(js): Remove usage of react-document-title (#45170)
  chore(py): Consistently name urls using `organization-` prefix (#45180)
  ref: rename acceptance required checks collector (#45156)
  chore(assignment): Add analytics when autoassigning after a manual assignment (#45099)
  feat(source-maps): Update copy for source map debug alerts (#45164)
  ref(js): Remove custom usage of DocumentTitle (#45165)
  chore(login): update the login banners (#45151)
  ref(py): Remove one more legacy project_id from Environment (#45160)
  ...
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components Scope: Frontend Automatically applied to PRs that change frontend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants