Telemetry & KPI's for beta, to be defined #49832
Comments
Pinging @elastic/kibana-stack-services (Team:Stack Services) |
I'm not familiar at all with how apps feed telemetry data now, but it appears that some folks collect stats along the way, and then either dump them at regular intervals or get polled by telemetry for the data. One alternative to collecting internal stats would be to query the event log for them. There are likely reasons why this doesn't make sense, but I'm going to pretend it's something we'd like to be able to do at some point. So I'll be interested to see what we'll be feeding telemetry, and whether we can actually get that back out of the event log. |
Our telemetry today is fairly high level and for the sake of time, I think we can avoid Goals
For our first release, I'm not sure how much effort we should put into having more granular telemetry given usage within the apps themselves will be limited. These are some basic metrics that come to mind and are all up for debate. I thought they'd help kick off a discussion. If we need to trim these down or add better ones, we can.
Nice to have metrics
@peterschretlen what do you think? |
@alexfrancoeur that's a comprehensive list! I think the metrics are good (assuming we can instrument and collect them).
In addition, I think it would be important to segment the metrics by spaces and solutions. Certain solutions might generate a lot of alerts automatically (SIEM, for example), which might skew the numbers. Quantity may not equate to usage; I think it will depend on the app. As for spaces, since alerts are segmented by space, it would be good to know that this is being used (for example, alerts for an app appearing in multiple spaces could suggest the isolation of spaces is being put to use, especially if there are different types or quantities of alerts).
I agree with @pmuellr that the activity log would be a good place to get some of this information. In fact, from an Admin's perspective a lot of these metrics would be very useful to show in the management view and we might want to expose some or all of this in a |
some notes:
Actions overall -> Total count - I think this would be the number of actions created (# of action saved objects)
Actions overall -> Total count active (in use) - number of actions created that are actually used in an alert (or somewhere else, but currently just alerts AFAIK).
Alerts overall -> Total count - would be like the actions one - number of alerts created (# of alert saved objects)
Alerts overall -> Total count active (in use) - number of alerts created that are not disabled
We probably want to track execution failures as well as successes - assume Total executions is the successes + failures, then add a new metric for Total execution failures or such. For both Actions and Alerts.
Not clear if we really need the “overall” stats for Alerts/Actions, since that’s just denormalized sums of the “by type” ones. Can we just get those values for free somehow, wherever we make these stats available? Though it will be simple to calculate the “overall” stats, presumably, given the “by type” ones, so not a big deal.
It feels like Alert Instances should actually be under Alerts overall and Alerts by type instead of by itself. I think a total count here is probably a good start on those.
I suspect we can add all sorts of stats for Tasks, eh @gidi Morris??? |
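To illustrate the point that the “overall” stats are just sums of the “by type” ones, here is a minimal sketch. The `TypeStats` shape, the `summarize` helper, and the type ids are all hypothetical names made up for this example; they are not part of Kibana's telemetry or alerting APIs.

```typescript
// Hypothetical per-type stats shape (illustrative only, not a Kibana API).
interface TypeStats {
  total: number; // number of saved objects of this type
  active: number; // in use (actions) or not disabled (alerts)
  executionSuccesses: number;
  executionFailures: number;
}

// Overall stats add a derived total: successes + failures.
interface OverallStats extends TypeStats {
  totalExecutions: number;
}

// Derive the "overall" numbers by summing the "by type" ones.
function summarize(byType: { [typeId: string]: TypeStats }): OverallStats {
  const overall: OverallStats = {
    total: 0,
    active: 0,
    executionSuccesses: 0,
    executionFailures: 0,
    totalExecutions: 0,
  };
  for (const typeId of Object.keys(byType)) {
    const stats = byType[typeId];
    overall.total += stats.total;
    overall.active += stats.active;
    overall.executionSuccesses += stats.executionSuccesses;
    overall.executionFailures += stats.executionFailures;
  }
  overall.totalExecutions = overall.executionSuccesses + overall.executionFailures;
  return overall;
}

// Made-up example data with hypothetical type ids.
const alertsByType: { [typeId: string]: TypeStats } = {
  'example.typeA': { total: 3, active: 2, executionSuccesses: 10, executionFailures: 1 },
  'example.typeB': { total: 2, active: 1, executionSuccesses: 4, executionFailures: 3 },
};

const overallAlerts = summarize(alertsByType);
```

If only the by-type numbers are reported, the same summation could be done on the receiving side instead, which is the "for free" option raised above.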
Did a little digging on where alertInstances might be available, but not much luck. I was thinking they were probably persisted, but am not seeing them in the alert or task SO's. They might not be persisted. If not, or even if they are, you can see what alertInstances are "in use" by looking at the kibana/x-pack/legacy/plugins/alerting/server/task_runner/task_runner.ts Lines 134 to 144 in 89e4daf
That object is just a @mikecote may have a better answer |
@pmuellr you tagged some random Gidi :) Regarding Tasks, I'm not sure what we'd want to track, as we could easily end up with lots of data that doesn't tell us much. I don't know much about how we're using and visualising telemetry data, but can we differentiate between Task stats in systems that are in heavy use vs. light use? Many alerts vs. none? Large clusters vs. random single-node installations? Regarding @alexfrancoeur's list:
Total Count doesn't include completed Tasks, as we don't keep Task history. It would include all scheduled tasks (one time and interval) and failed tasks (as we don't clean these out). |
They are persisted within the alert's task saved object. Within the The updated state object gets built / created starting here: kibana/x-pack/legacy/plugins/alerting/server/task_runner/task_runner.ts Lines 194 to 197 in ea9a7b8
|
cc @alexfrancoeur