Created GitHub Markdown documentation #36

**File: Docs/01_background.md**

# sp_whoisactive: A Brief History of Activity Monitoring

------
[Home](https://github.com/amachanic/sp_whoisactive) [Download](https://github.com/amachanic/sp_whoisactive/archive/master.zip) [Documentation Index](ReadMe.md)
------
Next: [Design Philosophy](02_design.md)
------

"Hey DBA! Why is the application so slow?"

"Hey DBA! Why is my query taking, like, forever to return the results?"

"Hey DBA! Something is broken! Fix it, quick!"

If you've been a DBA for more than 10 minutes, you've no doubt heard all of these statements, and every possible variation on them. A big part of every DBA's job is understanding what's going on when things misbehave, and real-time activity monitoring plays a big part in that quest for insight: if something is broken right now, we need to understand exactly which components are malfunctioning and use that data to quickly decide which course of action to take. **The worst thing a DBA can do is to base a decision on an uninformed guess**.

To avoid guesswork, a monitoring solution must provide plenty of data: data about the component that's malfunctioning, data about what else is running on the system, and, if possible, contextual data to help the DBA understand how the system got into its current state. That's a lot of data, and over the course of SQL Server's history, the availability of this kind of information has ranged from totally unexposed (SQL Server 2000 and earlier) to exposed but difficult to access (SQL Server 2005 and beyond).

### SQL Server 7.0 and SQL Server 2000: Squinting Through the Fog

Back in the bad old days of Enterprise Manager, if you wanted to know what was running on your instance, you could right-click and select "Current Activity." If Enterprise Manager didn't lock up or take some other completely unpredictable route, you were rewarded with a list of server process identifiers, a terse column called "command" that gave a very general indication of what each of those processes was up to, and some basic metrics (CPU, reads, writes, and so on) that were known to be woefully inaccurate in most cases.

More advanced users quickly learned to avoid Enterprise Manager altogether. The same information could be gleaned within Query Analyzer by using the sp_who or sp_who2 procedures, or by querying the sysprocesses system table. While Query Analyzer made data collection faster than the Enterprise Manager user interface did, the information was of the same level of quality, or lack thereof. The screen shot below illustrates the state-of-the-art information shown by these tools. **Session 54 seems to be kind of busy**, I guess? I wonder what it’s up to? Well, since I have no clue and the server is slow, **I should probably just kill it.**

![F1_01_sp_who2](image/F1_01_sp_who2.jpg)

Users who were geeky enough to read internals books knew how to get just a bit more information about what the offending session was doing, sort of. The DBCC INPUTBUFFER command would return the most recent SQL batch that the session had submitted to the server. This is much better than simply seeing that session 54 is doing some kind of SELECT, but it’s also quite limiting. The submitted SQL might have been a non-parameterized ad hoc batch, in which case seeing what was happening was easy. But as more and more developers learned to use stored procedures, DBCC INPUTBUFFER often returned something as simple as “EXEC SomeStoredProcedure”, and if the procedure had been called via RPC, it wasn’t even possible to see the parameters that were passed in. (And, I should add, it’s still not possible now. A lot has changed in 13 years, but not enough.)
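
To make the contrast concrete, here is a minimal sketch (not taken from Who is Active) of the SQL Server 2000-era command next to one common SQL Server 2005 DMV pattern for seeing the statement a session is currently executing. Session 54 is simply the example session from the screen shot; substitute a real session ID. Note that neither approach reveals the parameter values of an RPC call.

```sql
-- Minimal sketch; 54 is the example session from the screen shot above.

-- SQL Server 2000 era: returns the last batch the session submitted.
DBCC INPUTBUFFER(54);

-- SQL Server 2005 and later: the statement the session is executing right now,
-- cut out of its batch using the request's byte offsets.
SELECT
    r.session_id,
    SUBSTRING
    (
        t.text,
        r.statement_start_offset / 2 + 1,
        (
            CASE r.statement_end_offset
                WHEN -1 THEN DATALENGTH(t.text)  -- -1 means "end of batch"
                ELSE r.statement_end_offset
            END - r.statement_start_offset
        ) / 2 + 1
    ) AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = 54;
```

The offsets are divided by 2 because they are byte positions in Unicode (two bytes per character) text.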

To gain more visibility into what was happening on the server, **many DBAs forgot about these commands altogether** and employed another tool that shipped with SQL Server: Profiler. Most of the DBAs I worked with in the late '90s and early '00s kept Profiler open and attached to the various production instances all day long. Information would constantly scroll by, and if there was a problem, the DBA could stop the stream, scroll up and down, and try to figure out exactly what the situation was. This technique certainly had its pluses and minuses: Profiler showed a lot more information, enough to actually figure out the problem in many cases. But it also showed a lot more information, enough to be overwhelming in many cases. And then there was the fact that Profiler could slow down your entire instance of SQL Server, which was especially problematic when a whole team of DBAs were working on the same server, all with Profiler attached and streaming information.

### SQL Server 2005: Information Overload

Clearly, the monitoring situation in SQL Server 2000 was pretty bad. Luckily, Microsoft got the memo. SQL Server 2005 shipped with a set of new monitoring objects called Dynamic Management Views (DMVs). These objects returned a huge amount of information that had never been available in the SQL Server 2000 system views. Many situations that previously could be debugged only with information from Profiler or a server-side trace could suddenly be handled by running a few SQL queries. The only problem was that for a long time, **no one seemed to know exactly how to write the correct queries**. With scores of DMVs, each with scores of columns, things were overwhelming, and busy DBAs simply didn’t have the time to adapt properly.

Even Microsoft didn’t seem to be able to leverage these powerful new views. The screen shot below is from SQL Server 2008’s Activity Monitor. The 2008 UI is a lot sharper than the SQL Server 2000 Current Activity UI, and more data is returned by Activity Monitor, but the situation is basically the same as it ever was. **Session 54 is still chugging away, doing, well... something.**

![F1_02_Activity_Monitor](image/F1_02_Activity_Monitor.jpg)

To be fair, I can now right-click on any of these rows and find out what SQL the session is running. But the experience is still nowhere near user-friendly, and Activity Monitor has a large number of bugs and strange behaviors. Why does it auto-refresh every five seconds? What if I was looking at something? And why do I see a million rows for session 54? There was only one request, wasn’t there?

Even today, in 2011, **many DBAs I talk to are still using sp_who and sp_who2**. A lot of them still rely on streaming information from SQL Server Profiler. And while people have finally learned to leverage the DMVs, they often use small, standalone ad hoc scripts that query one or two DMVs to find a very specific bit of information. We’ve progressed from a situation where there was not enough information to one where there are too many places to go for the information that really matters. A lateral move at best.

### A Personal Journey

In 2007 I decided to get serious about the DMVs, and I began working on a script to help with monitoring. The first version was posted to my blog on December 31, 2007.

Now, years later, I have taken that script through countless iterations and I’ve learned a tremendous amount about the various DMVs along the way. The [Who is Active](https://github.com/amachanic/sp_whoisactive/releases) stored procedure correlates a large amount of data from 15 of the DMVs, to allow DBAs to get a complete picture when doing real-time activity monitoring. Although my stored procedure has been well-received and is designed to make it easy to get information from the DMVs, it has a large number of options and a few quirks. So it's no surprise that **I have received numerous requests for in-depth documentation**. That’s the point of this blog series: Over the course of the month I will take you through every corner of Who is Active. I will explain how I use it to do troubleshooting on a daily basis, and I'll give you some insight into how it works and why.

------
Next: [Design Philosophy](02_design.md)
------

**File: Docs/02_design.md**

# sp_whoisactive: Design Philosophy

------
[Home](https://github.com/amachanic/sp_whoisactive) [Download](https://github.com/amachanic/sp_whoisactive/archive/master.zip) [Documentation Index](ReadMe.md)
------
Prior: [A Brief History of Activity Monitoring](01_background.md) Next: [The License](03_license.md)
------

As mentioned in [the background article](01_background.md), I have been working on Who is Active for several years. At first it was a standalone script that I would run in an ad hoc manner when I needed some information, but after a short time it became clear that it made a lot of sense to package it up as a stored procedure.

As time progressed, I began adding more and more features on top of the basic functionality, and, not surprisingly, the code quickly became extremely complex. In the interest of performance and flexibility I was forced to take what was once a single SQL statement and convert it to use dynamic SQL, temporary tables, cursors, error handling, XML, and various other features. Throughout the entire process I’ve attempted to adhere, whenever possible, to a set of basic design principles. These are covered below.

### Show Only Interesting (Relevant) Data

The sp_who* family of stored procedures. Enterprise Manager’s Current Activity screen. Activity Monitor. These tools all have one thing in common that makes them much less useful than they could have been: they show every session that’s connected to the SQL Server instance, whether or not any work is being done. On smaller SQL Server instances this doesn’t matter; you get used to ignoring the various system processes and figure out where to focus to hit pay dirt: information on what your users are actually up to. But some bigger instances, especially those that back numerous application servers using connection pooling, can have hundreds or thousands of connected, sleeping sessions.

Generally speaking, **when you’re doing activity monitoring, seeing sleeping sessions is a waste of time**. You need to see what’s actively happening on the server, not who connected and left a session open at some point in recent days. So from the very first versions of Who is Active, I simply filtered out anything that was sleeping, with one exception: sleeping sessions that hold an open transaction are still shown, because they may have resources locked.
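
The filtering rule can be sketched as a query against the SQL Server 2005 DMVs. This is a minimal illustration of the idea, not the query Who is Active actually runs. It assumes that "doing something" means the session currently has a row in sys.dm_exec_requests, and that an open transaction on an otherwise idle session shows up in sys.dm_tran_session_transactions.

```sql
-- Minimal sketch, not Who is Active's actual code: list user sessions that
-- are running a request, plus sleeping sessions that still hold an open
-- transaction (and so may be holding locks).
SELECT
    s.session_id,
    s.status,
    s.login_name,
    s.host_name,
    s.program_name
FROM sys.dm_exec_sessions AS s
WHERE s.is_user_process = 1
  AND s.session_id <> @@SPID          -- skip the monitoring session itself
  AND
  (
      EXISTS                          -- actively doing work
      (
          SELECT *
          FROM sys.dm_exec_requests AS r
          WHERE r.session_id = s.session_id
      )
      OR EXISTS                       -- idle, but a transaction is open
      (
          SELECT *
          FROM sys.dm_tran_session_transactions AS st
          WHERE st.session_id = s.session_id
      )
  );
```

Testing for a request row, rather than comparing the status column to a string, sidesteps any question of how status values compare on a case-sensitive instance.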

Who is Active is called “Who is Active” because, by default, it only shows you information about sessions that are actually doing something. If you want to see all of the other sessions, it can do that too. But you’ll have to ask.

### Show Simple and Easily-Digestible Information

Remember session 54 from the background article? Here’s a reminder, via sp_who2:

![F1_01_sp_who2](image/F1_01_sp_who2.jpg)

It’s active (it’s doing something), so we’re interested. We see numerous rows because the granularity of these older tools is per task, not per request. We’ll get to tasks in a later post, but in the meantime consider this: **The exact same information has been reported numerous times**. There are not, in fact, numerous sessions using session ID 54, each of which is connected to ADAM03 and each of which is running some kind of SELECT. This extraneous information just makes our job of figuring out what’s going on that much more difficult. Even worse, all of those numbers (the CPU and DiskIO columns, in case you’re wondering) are populated at the task level. If you needed to debug at the task level (and in practice, as an end user, you very, very rarely do), that would be great. But for most of us, a single, aggregated CPU time number works fine, thank you very much. (Assuming, of course, that these CPU numbers are even accurate.)
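
A request-level query avoids the duplication. The sketch below is a minimal illustration, not Who is Active’s own code: it returns one row per active request from sys.dm_exec_requests and counts that request’s tasks in sys.dm_os_tasks rather than listing each one.

```sql
-- Minimal sketch: one row per active request, with the tasks it has
-- spun up counted rather than each reported on its own row.
SELECT
    r.session_id,
    r.request_id,
    r.command,
    r.cpu_time,           -- milliseconds, reported once at the request level
    r.logical_reads,
    tc.task_count
FROM sys.dm_exec_requests AS r
CROSS APPLY
(
    -- An aggregate with no GROUP BY always returns exactly one row,
    -- so every request appears once.
    SELECT COUNT(*) AS task_count
    FROM sys.dm_os_tasks AS t
    WHERE t.session_id = r.session_id
      AND t.request_id = r.request_id
) AS tc
WHERE r.session_id <> @@SPID;     -- skip the monitoring session itself
```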

Here’s the same session, reported in Who is Active:

![F2_01_WIA](image/F2_01_WIA.jpg)

No matter how many tasks this session spins up, Who is Active will still return the exact same number of rows: 1. Part of the actual query, if it’s available, is shown right up front, and you can click on the XML to see the full text. I’ve left the CPU and disk I/O columns out of this screen shot because both values are 0. It turns out that these numbers are quite often reported inaccurately for parallel requests, so the newer DMVs don’t show them in this case. Therefore, Who is Active doesn’t show them either.

### Impact the Server as Little as Possible; Return Data as Quickly as Possible

Looking for the cause of a performance problem shouldn’t exacerbate the problem. And **taking a peek at server activity shouldn’t cause a performance problem**.

The various Microsoft monitoring procedures mentioned in [the background article](01_background.md) follow this rule quite well: they run in virtually zero time and will never impact general server performance. Unfortunately, they also provide virtually no useful data with which to debug issues, so you might have been better off never looking in the first place. Profiler is the opposite: it can give you lots of data with which to debug, but it can also cause the entire instance to grind to a halt.

For Who is Active I’ve tried to take the middle path: provide enough data to help debug complex issues, while still working extremely hard to avoid impacting the server. To accomplish this, I’ve disabled automatic creation of statistics on all of the temp tables, employed dirty reads so that the tool never blocks or waits for a lock to be released, used hints to control memory allocations, and used cursors (not so evil after all!) in conjunction with error handling to process certain data in a more granular fashion.

The end result is pretty good. On most servers, in most situations, the default options return all of the data in under a second. And in the (hopefully rare) cases where the server is under so much stress that things are taking longer than they should, a couple of options can be disabled to make Who is Active collect less data. Speed is especially important to me. I'm not a patient person. **And when you’re debugging a tough issue, the last thing you should have to do is wait a long time to find out what’s going on**.

### Show as Much Data as Possible Without Going Overboard

**Who is Active collects data from 15 DMVs**. Each of these DMVs has many columns. That’s a huge number of potential data points that could be displayed. I’ve pruned down this set and have tried to include only those pieces of information that are actually valuable in the vast majority of cases. I don’t want the default Who is Active output to have so many columns that it’s difficult to read and understand. And I don’t want to have to process so much data that things slow down. For this same reason, a lot of the Who is Active features are not enabled by default. If you need a bit more data, it’s usually just a matter of figuring out which parameter to set.

### Provide a Flexible and Configurable Experience

You may want the results ordered by session ID, descending. I may want them ordered by how long each active request has been running, ascending. You may want to see different columns on the left or on the right than I do. **We both win**. Thanks to some early feedback from Aaron Bertrand, I realized that **one size does not fit all when it comes to monitoring**, and I worked to make the Who is Active procedure as flexible as it can possibly be. The various output configuration features will be covered in detail in a post later this month.

### Safety and Security

Who is Active requires a slightly elevated permission, **VIEW SERVER STATE**, to do its job. And most of the people who run the stored procedure are system administrators with full access to everything on the system. This would be a non-issue if the procedure contained only a simple SELECT statement or two, but for both performance and display purposes I was forced to make heavy use of dynamic SQL. **I have taken every possible precaution to avoid making the procedure vulnerable to any kind of SQL injection attack**: all inputs are not only validated but also never used directly, all object names embedded in dynamic SQL are safely quoted using QUOTENAME, and all other variables are parameterized. Later this month I’ll describe security in a bit more detail, along with a discussion of how to properly deploy and secure access to the stored procedure.
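
The two techniques can be sketched in a few lines. The variable names and values below are hypothetical, and the query is an illustration rather than code from Who is Active. A database name cannot be passed as a parameter, so it is embedded in the string, but only after QUOTENAME has wrapped it; the object ID, by contrast, is handed to sp_executesql as a typed parameter and never concatenated.

```sql
-- Minimal sketch with hypothetical inputs; not Who is Active's code.
-- Written without DECLARE initializers so it also runs on SQL Server 2005.
DECLARE @database_name sysname;
DECLARE @object_id int;
DECLARE @sql nvarchar(max);

SET @database_name = N'tempdb';   -- hypothetical: a database name from a DMV
SET @object_id = 1;               -- hypothetical: an object ID from a DMV

-- QUOTENAME wraps the name in brackets and doubles any embedded ']',
-- so a malicious name cannot break out of the identifier.
SET @sql =
    N'SELECT o.name
      FROM ' + QUOTENAME(@database_name) + N'.sys.objects AS o
      WHERE o.object_id = @object_id;';

-- The value travels as a typed parameter, never as part of the SQL text.
EXEC sys.sp_executesql
    @sql,
    N'@object_id int',
    @object_id = @object_id;
```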

### Version Compatibility

One of my goals at the moment is to keep Who is Active compatible with all builds of SQL Server 2005 and SQL Server 2008. I haven’t done so well here; version 10.00 included a column that wasn’t available until SQL Server 2005 SP2, and many other versions have had similar issues. I have now built a case-sensitive SQL Server 2005 RTM instance in a virtual machine, and plan to test every Who is Active build in that environment going forward.

------
Prior: [A Brief History of Activity Monitoring](01_background.md) Next: [The License](03_license.md)
------

**File: Docs/03_license.md**

# sp_whoisactive: The License

------
[Home](https://github.com/amachanic/sp_whoisactive) [Download](https://github.com/amachanic/sp_whoisactive/archive/master.zip) [Documentation Index](ReadMe.md)
------
Prior: [Design Philosophy](02_design.md) Next: [Installing sp_whoisactive](04_installation.md)
------

sp_whoisactive uses GPLv3. [You can find the license here](https://github.com/amachanic/sp_whoisactive/blob/master/LICENSE).

------
Prior: [Design Philosophy](02_design.md) Next: [Installing sp_whoisactive](04_installation.md)
------