feat(dynamic-sampling): Implement prioritize by project bias [TET-574] #42939
Conversation
🚨 Warning: This pull request contains Frontend and Backend changes! It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently. Have questions? Please ask in the
force-pushed from 77f6033 to b074f9a
force-pushed from 8532860 to 6071553
force-pushed from 6071553 to e4ed330
force-pushed from b81e3bd to 955c3bf
force-pushed from 955c3bf to 1c0bbcf
force-pushed from 38888b4 to 365f3d4
Tested locally with 1 org and 1 project - no changes:
With 1 org and 2 projects (blended rate: 0.25):
],
groupby=[Column("org_id"), Column("project_id")],
where=[
    Condition(Function("modulo", [Column("org_id"), 100]), Op.LT, sample_rate),
This might work fine, but I'm not sure how efficient WHERE org_id % 100 < 45 (or some other n) will be at scanning the table.
@nikhars do you have an opinion on this? I think given the data size and the filtering by metric_id it would probably work, but I wonder if they'd get better ClickHouse performance if they enumerated the org_ids to check in batches of 5k or 10k and filtered before sending the query.
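For illustration, here is a minimal sketch of the batching idea described above, assuming a hypothetical `chunked` helper, an assumed entity name and metric filter, and that the relevant org_ids are known up front. This is not the query shipped in this PR, which keeps the modulo condition.

```python
# Sketch of the suggested alternative: filter org_ids with IN over
# pre-enumerated batches instead of scanning with `org_id % 100 < n`.
# Entity name, metric filter, and helpers are assumptions for illustration.
from typing import Iterator, Sequence

from snuba_sdk import Column, Condition, Entity, Function, Op, Query

ORG_BATCH_SIZE = 5_000  # 5k-10k per batch, as suggested above


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield successive batches of org ids."""
    for start in range(0, len(ids), size):
        yield ids[start : start + size]


def build_batched_queries(org_ids: Sequence[int], metric_id: int) -> Iterator[Query]:
    """Build one query per batch of org ids, filtering before sending."""
    for batch in chunked(sorted(org_ids), ORG_BATCH_SIZE):
        yield Query(
            match=Entity("generic_metrics_counters"),  # assumed entity name
            select=[
                Column("org_id"),
                Column("project_id"),
                Function("sum", [Column("value")], "root_count"),
            ],
            groupby=[Column("org_id"), Column("project_id")],
            where=[
                Condition(Column("metric_id"), Op.EQ, metric_id),
                Condition(Column("org_id"), Op.IN, list(batch)),
            ],
        )
```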
I can run the query and provide information on whether this sort of WHERE clause would be helpful or not.
* master: (79 commits)
  feat(perf-issues): Add performance issue detection timing runner command (#44912)
  Revert "chore: Investigating org slug already set to a different value (#45134)"
  fix(hybrid-cloud): Redirect to org restoration page for customer domains (#45159)
  bug(replays): Fix 500 error when marshaling tags field (#45097)
  ref(sourcemaps): Redesign lookup of source and sourcemaps (#45032)
  chore: Investigating org slug already set to a different value (#45134)
  feat(dynamic-sampling): Implement prioritize by project bias [TET-574] (#42939)
  feat(dynamic-sampling): Add transaction name prioritize option - (#45034)
  feat(dyn-sampling): add new bias toggle to project details for prioritise by tx name [TET-717] (#44944)
  feat(admin) Add admin relay project config view [TET-509] (#45120)
  Revert "chore(assignment): Add analytics when autoassigning after a manual assignment (#45099)"
  feat(sourcemaps): Implement new tables supporting debug ids (#44572)
  ref(js): Remove usage of react-document-title (#45170)
  chore(py): Consistently name urls using `organization-` prefix (#45180)
  ref: rename acceptance required checks collector (#45156)
  chore(assignment): Add analytics when autoassigning after a manual assignment (#45099)
  feat(source-maps): Update copy for source map debug alerts (#45164)
  ref(js): Remove custom usage of DocumentTitle (#45165)
  chore(login): update the login banners (#45151)
  ref(py): Remove one more legacy project_id from Environment (#45160)
  ...
This PR implements the prioritize-by-project bias.
In detail:
We run a Celery task every 24 hours at 08:00 AM UTC (a randomly selected time) for every org and all projects inside this org (we call this the prioritise-by-project Snuba query), and for a given combination of org and projects we run an adjustment model to recalculate sample rates if necessary.
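For context, a minimal sketch of what a daily 08:00 UTC schedule for such a task could look like with Celery beat; the task path and entry name here are placeholders, not the ones registered in Sentry's settings.

```python
# Sketch of a Celery beat entry for a daily 08:00 UTC run.
# The task path and schedule key are illustrative placeholders.
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    "dynamic-sampling-prioritise-projects": {
        "task": "sentry.dynamic_sampling.tasks.prioritise_projects",  # assumed name
        "schedule": crontab(minute=0, hour=8),  # every 24 hours at 08:00 UTC
    },
}
```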
Then we cache the adjusted sample rate in the Redis cluster configured by SENTRY_DYNAMIC_SAMPLING_RULES_REDIS_CLUSTER, using this pattern for the key: f"ds::o:{org_id}:p:{project_id}:prioritise_projects".
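As a rough illustration of the caching step, here is a sketch that writes the adjusted rate under the key pattern above; the helper names and the TTL are assumptions, and `redis_client` stands in for a client of whatever cluster SENTRY_DYNAMIC_SAMPLING_RULES_REDIS_CLUSTER points at.

```python
# Sketch of caching the adjusted sample rate under the documented key pattern.
# `redis_client`, the TTL, and the function names are illustrative assumptions.
CACHE_TTL_SECONDS = 24 * 60 * 60  # the task recomputes rates every 24 hours


def _prioritise_projects_key(org_id: int, project_id: int) -> str:
    return f"ds::o:{org_id}:p:{project_id}:prioritise_projects"


def cache_adjusted_sample_rate(redis_client, org_id: int, project_id: int, rate: float) -> None:
    # redis_client is assumed to target the cluster configured by
    # SENTRY_DYNAMIC_SAMPLING_RULES_REDIS_CLUSTER.
    redis_client.set(_prioritise_projects_key(org_id, project_id), rate, ex=CACHE_TTL_SECONDS)
```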
When Relay fetches the projectconfig endpoint, we run the generate_rules function to generate all dynamic sampling biases; we check whether we have an adjusted sample rate for this project in the cache, and if so we apply it as the uniform bias, otherwise we use the default one.
The prioritize-by-project Snuba query is a cross-org Snuba query that utilizes a new generic counter metric introduced in Relay, c:transactions/count_per_root_project@none.
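To make the read path concrete, here is a simplified sketch of how generate_rules could prefer the cached adjusted rate over the default one; the helper names and the rule dictionary shape are simplified assumptions, not Relay's exact rule schema or the function actually added in this PR.

```python
# Simplified sketch of the read path: prefer the cached adjusted rate,
# fall back to the default (blended) rate. Helper names and the rule dict
# shape are illustrative assumptions.
from typing import Optional


def get_adjusted_sample_rate(redis_client, org_id: int, project_id: int) -> Optional[float]:
    value = redis_client.get(f"ds::o:{org_id}:p:{project_id}:prioritise_projects")
    return float(value) if value is not None else None


def uniform_bias_rule(redis_client, org_id: int, project_id: int, default_rate: float) -> dict:
    adjusted = get_adjusted_sample_rate(redis_client, org_id, project_id)
    sample_rate = adjusted if adjusted is not None else default_rate
    return {
        "id": 1000,
        "type": "trace",
        "condition": {"op": "and", "inner": []},  # matches every trace
        "sampleRate": sample_rate,
    }
```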
TODO:
related PRs: