Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Remove sampling #14659

Merged
merged 2 commits into from
Sep 27, 2019
Merged

feat: Remove sampling #14659

merged 2 commits into from
Sep 27, 2019

Conversation

lynnagara
Copy link
Member

@lynnagara lynnagara commented Sep 10, 2019

Since moving event storage to Snuba the sampling feature has no longer worked as expected. Unlike Postgres, all events are stored in Snuba. Since Snuba is now the source of truth for all event data, sampling settings are effectively being ignored.

This PR doesn't change anything for SaaS users since the sampling feature has been turned off on sentry.io for some time.

Open source users (on 10.x) that had sampling turned on will have previously found that events that should have been sampled would have been present (however they may have been missing some data as the sampled event wasn't being saved to node storage).

#14627 and #14829 should be merged first

@lynnagara lynnagara requested a review from a team September 13, 2019 20:46
Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this change is small in term of lines there are a lot of moving parts and potentially hidden dependencies on the is_sample field. These are specifically in post_process and eventstream since they are decoupled from the code that runs sentry.
This makes it hard to assess if the code is safe and it would make it complex to ensure everything works properly in prod.

I would advise to split this change:

  • address post_process first, make it agnostic to the is_sample field by removing all dependenceis. Deploy and verity
  • same for eventstream
  • only when you have verified they are both in a good state, change event_manager and remove the field form there.

@@ -113,7 +113,7 @@ def handle_owner_assignment(project, group, event):


@instrumented_task(name="sentry.tasks.post_process.post_process_group")
def post_process_group(event, is_new, is_regression, is_sample, is_new_group_environment, **kwargs):
def post_process_group(event, is_new, is_regression, is_new_group_environment, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% confident on how parameters are passed to celery tasks and how we deploy them, but during the deployment one of these things will happen:

  1. sentry goes first and starts enqueuing tasks with a missing parameter. If the worker picks up that task with the old code it would fail.
  2. the worker goes first and we will pass a parameter that does not exists. If it is passed positionally the tasks will fail.

Could you please validate what our best practices are for changing the interface of a celery task?
To be conservative you would always be able to add a default value for this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled out the post process changes into #14758 like you suggested.

We want 2 to happen - worker goes first. Since this only gets called with keyword arguments, the tasks should succeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"We want 2 to happen". yes but is it the way it happens in prod today ?

@@ -195,11 +195,7 @@ def post_process_group(event, is_new, is_regression, is_sample, is_new_group_env

for plugin in plugins.for_project(event.project):
plugin_post_process_group(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this behavior is backward compatible? is_sample was passed to the plugin. Are we sure it was never used by anyone ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's only expected by the Jira plugin. This change needs to go first getsentry/sentry-plugins#513

Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. Thanks

lynnagara added a commit that referenced this pull request Sep 27, 2019
Once #14659 is merged we can
remove the remaining unused parts of the sampling related code.
@lynnagara lynnagara merged commit 3ee8de5 into master Sep 27, 2019
@lynnagara lynnagara deleted the remove-sampling branch September 27, 2019 20:48
@github-actions github-actions bot locked and limited conversation to collaborators Dec 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants