Workflow 2.0: Performance Issues #34968
-
👋 Wondering how this project is going? Here’s an update on what the Performance team has been working on in the past month to bring first-class performance issues into Sentry.
First performance issue
We’ve started working on a Proof of Concept (POC) that targets a specific type of performance issue: extraneous spans. What is an extraneous span? It’s a term we’ve been using internally to describe a span whose occurrence count and cumulative duration within a transaction exceed a pre-defined threshold. Here’s an example of extraneous spans that we’ve found in our codebase:
These kinds of performance problems are tied to code and are therefore more actionable, which makes them a good starting point for the POC.
Auto-detection
To quickly test the initial idea, we’ll create a special beta release of the Python SDK capable of detecting extraneous spans. When a problematic span is found, an exception will be thrown, and a corresponding error will be created and sent to Sentry. This approach will allow us to collect stack traces and any other performance-specific information. This will be an internal POC and will help us validate the initial approach and get early feedback before we take on bigger changes to our backend. We’ve chosen SDK-side detection just for the POC, and it won’t be part of the long-term product strategy.
PRs to check out
What’s next
Now that we’ve determined the type of performance issue we will tackle first and how we will find it, let’s see how it’ll be displayed to users. This is a proposed design of a new performance-specific issue detail page:
This view is meant to provide enough information on the severity of the performance problem, its effect on key metrics like transaction duration, and the location in the code where the problem originates. Please note these designs aren’t final yet and are subject to change as we progress through the POC.
Actionability
Since extraneous spans are a code-related performance problem, displaying a stack trace is key here. It’ll allow users to quickly locate the issue in the code and fix it. In addition, we will be displaying a span tree with all repeated spans to further demonstrate how they’re affecting an event.
Feedback
We’d love to hear from you. What do you think of the proposed first performance issue? Do you have any ideas/comments/concerns we should consider? Please let us know in this conversation.
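To make the extraneous-span idea from this update concrete, here is a minimal sketch of SDK-side detection in Python. It is not the code from the beta SDK: the `Span` shape, the `ExtraneousSpansError` name, the grouping by `(op, description)`, the two threshold values, and the rule that both thresholds must be exceeded are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span: only the fields this sketch needs."""
    op: str             # e.g. "db" or "http.client"
    description: str    # e.g. the SQL text
    duration_ms: float


class ExtraneousSpansError(Exception):
    """Hypothetical error, raised so the problem surfaces with a stack trace."""


# Illustrative thresholds only; the post does not state the real values.
MAX_OCCURRENCES = 5
MAX_CUMULATIVE_MS = 100.0


def find_extraneous_spans(spans):
    """Return {(op, description): (count, total_ms)} for groups over both thresholds."""
    totals = defaultdict(lambda: [0, 0.0])
    for span in spans:
        entry = totals[(span.op, span.description)]
        entry[0] += 1
        entry[1] += span.duration_ms
    return {
        key: (count, total_ms)
        for key, (count, total_ms) in totals.items()
        if count > MAX_OCCURRENCES and total_ms > MAX_CUMULATIVE_MS
    }


def check_transaction(spans):
    """Raise for the first group of repeated spans found to be extraneous."""
    offenders = find_extraneous_spans(spans)
    if offenders:
        (op, description), (count, total_ms) = next(iter(offenders.items()))
        raise ExtraneousSpansError(
            f"{op} span repeated {count}x ({total_ms:.0f} ms total): {description}"
        )
```

Raising an exception mirrors the POC described above: the error carries a stack trace, which is what makes the problem locatable in code. A production detector would more likely report the problem without interrupting the request.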
-
Is there an option to filter these out of the issues view? I like to have particular pre-filtered bookmarks that I go to, and would like to not have all of the N+1 queries listed in all of those views. Otherwise, it is looking great!
-
As of October 17th, Performance Issues are now available to every organization that sends transactions to Sentry 🎉! N+1 database query detection is available out of the box and doesn’t require any additional setup.
After the original Python SDK experiment, we saw how much value Performance Issues can bring to our users. Developers want that same level of context and actionability when it comes to performance problems. Most importantly, developers want answers and not just dashboards. This is why we’ve launched Performance Issues: to bring the actionability that was once reserved for errors to Performance.
What are Performance Issues?
Performance Issues represent performance problems in applications. Just like regular Issues, they capture and group unique problems together and provide actionable context to developers. Issues are no longer bound to errors. “Issues” are becoming a domain-agnostic platform that can support multiple kinds of regressions, and this launch is a first step towards making that vision a reality.
We’ve solicited ideas from the community for a good candidate for the first performance issue type that would bring the most value for developers. The consensus was that we should start with N+1 database queries.
N+1 queries are a common performance problem in applications built on ORM tools like Django or Ruby on Rails. They occur when, after a single initial query (the +1), each row in the results of that query spawns another query (the N). This pattern is common in parent-child relationships. N+1 queries can be hard to detect when the number of executed queries is low, but as that number grows they can overwhelm your database and take down your application.
Here is an example of an N+1 query problem we found in our backend:
As you can see, the same query was executed 15 times, which took around 380ms. This problem can be resolved by using Django’s
Where can I see Performance Issues?
Performance Issues are displayed in the issues feed just like regular issues. You can also search for them by applying an
How does detection work?
Each incoming indexed transaction event is run through a “performance detector”, where we check the quantity of “db” spans and their cumulative duration. If both count and duration exceed the threshold, the detector outputs all found performance problems with their corresponding fingerprints. A performance fingerprint uses data from the problematic spans that were detected, specifically the span class, op and parsed description. After the detection step, the output is used to either create a new performance issue or update an existing one.
It’s important to note that a single performance issue can capture a problem that occurs across multiple transactions. This reflects the nature of performance problems: they can share the same root cause while affecting different parts of the application.
Performance Issues vs Error Issues
Performance Issues are backed by transaction events only. Events in the context of performance issues refer to transactions, not errors. Performance Issues don’t affect your error quota.
The Issue Details page looks slightly different too. For Performance Issues, the stack trace is replaced with a condensed span tree. There is also a new span evidence section that is meant to give users more context as to where the issue occurred.
As with Error Issues, Performance Issues can be assigned, ignored, resolved, and searched for in the issues feed. You can also filter and prioritize by event count or number of users impacted. Plus, Performance Issues work with issue tracking tools like GitHub, Jira, and Asana, so you can create or link an issue directly from Sentry and keep everyone on top of the issue status. Email & Slack alerts are also available for Performance Issues.
Helpful Resources
EA announcement
Feedback
As always, we’d love to hear from you. Has Sentry helped you find and fix any N+1 db queries in your applications? Is there anything you want to see on the Issue Details page that’d help you diagnose and fix problems faster? Please let us know in this conversation.
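For readers who haven’t run into the N+1 pattern described in this update, a small illustration may help. The sketch below uses Python’s built-in `sqlite3` module rather than an ORM, and the `author`/`book` schema, the `run` helper and both function names are made up for the example. It fetches the same parent-child data twice: once with one query per parent row, and once with a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book (author_id, title)
        VALUES (1, 'A1'), (1, 'A2'), (2, 'B1'), (3, 'C1');
    """
)

queries = []  # every SQL statement issued through run(), kept for counting


def run(sql, params=()):
    """Hypothetical helper: execute a query and record that it happened."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    """One query for the parents (the +1), then one query per parent row (the N)."""
    result = {}
    for author_id, name in run("SELECT id, name FROM author ORDER BY id"):
        rows = run(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query():
    """The same data in one JOIN (authors without books would be omitted)."""
    result = {}
    rows = run(
        "SELECT author.name, book.title FROM author "
        "JOIN book ON book.author_id = author.id "
        "ORDER BY author.id, book.id"
    )
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result
```

With three authors the first version issues four queries and the second issues one. The gap grows linearly with the number of parent rows, which is why the problem is easy to miss on small data sets. ORMs typically offer an eager-loading option that has the same effect as the JOIN here.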
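The detection flow described in this update (check “db” spans against count and duration thresholds, fingerprint the problem, then create or update an issue) can be sketched roughly as follows. This is an illustration under stated assumptions, not Sentry’s implementation: the transaction and span dictionaries, the threshold values, the `SPAN_CLASS` value, the regex-based `parse_description`, and the SHA-1 fingerprint are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Illustrative thresholds; the post does not state the real values.
MIN_SPAN_COUNT = 5
MIN_TOTAL_MS = 100.0

# Hypothetical stand-in for the "span class" component of the fingerprint.
SPAN_CLASS = "n_plus_one_db_queries"


def parse_description(sql):
    """Hypothetical parameterizer: mask literals so similar queries match."""
    sql = re.sub(r"'[^']*'", "%s", sql)
    return re.sub(r"\b\d+\b", "%s", sql)


def fingerprint(span_class, op, parsed_description):
    """Hash the three components named in the post into one stable key."""
    raw = f"{span_class}|{op}|{parsed_description}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def detect(transaction):
    """Return fingerprints of db span groups that exceed both thresholds."""
    groups = defaultdict(lambda: [0, 0.0])
    for span in transaction["spans"]:
        if not span["op"].startswith("db"):
            continue
        key = (span["op"], parse_description(span["description"]))
        groups[key][0] += 1
        groups[key][1] += span["duration_ms"]
    return [
        fingerprint(SPAN_CLASS, op, desc)
        for (op, desc), (count, total_ms) in groups.items()
        if count > MIN_SPAN_COUNT and total_ms > MIN_TOTAL_MS
    ]


def ingest(issues, transaction):
    """Create a new issue or update an existing one per detected problem."""
    for fp in detect(transaction):
        issue = issues.setdefault(fp, {"event_count": 0, "transactions": set()})
        issue["event_count"] += 1
        issue["transactions"].add(transaction["name"])
```

Because the fingerprint is built from the parsed query rather than from the transaction name, the same problematic query seen in two different transactions lands in one issue, which matches the cross-transaction grouping described above.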
-
It’s been a couple of weeks since @dcramer's initial discussion on Workflow 2.0. As a reminder, our focus is on addressing these two key areas:
We’ve identified a few paths to those outcomes and we are opening the discussion to the community. This conversation is focused on performance issues, but we’d love to hear from you on issue grouping and issue notifications as well.
Where We Are
Solving performance problems using Sentry can be challenging. Going from observation to root cause involves investigating, juggling many views and filters, and eventually discovering whether or not there is anything you can do about the problem.
Making This Better
Sentry will start serving up issues related to performance problems in your application. We will automatically detect a performance problem and guide you toward a potential fix in your application.
Performance Issues will be fixable by code change
Many performance problems are the result of external factors unrelated to code, which are outside of a developer’s control. Our first priority is a class of performance problems that we believe are actionable in code, so we don’t create “unactionable” issues.
Guide developers to the solution fast
These issues will contain performance-specific troubleshooting tools and data while retaining the workflow that is familiar from Issues today, such as assigning, ignoring and resolving. Issues become “here’s how you fix something broken” rather than “here’s how you fix this exception”.
Again, we want to hear from you. Do any of the proposed solutions feel like they'd be helpful? Do you have other ideas we should consider? Please let us know in this conversation.