Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alerting] allow a human-readable string to be associated with an instanceId #64268

Closed
pmuellr opened this issue Apr 22, 2020 · 8 comments
Closed
Assignees
Labels
Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@pmuellr
Copy link
Member

pmuellr commented Apr 22, 2020

Currently the alerting framework uses strings as instanceIds, that an alert executor will use when scheduling action groups for execution. These instanceIds are then displayed in the alert details page, to show the state of the instances the executor has processed.

Sometimes, that works out nicely, if your instanceIds are human-readable:

image

But if they aren't, say they're UUIDs, it's a dog's breakfast:

image

We should figure out a way that an executor could supply a name when referencing an instanceId. And then make that available wherever instance information is available. That name should be the string displayed in the UI. If the name isn't supplied, then the instanceId would be displayed, as it is today.

@pmuellr pmuellr added Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Apr 22, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Copy link
Member Author

pmuellr commented Apr 22, 2020

My first guess at an implementation would be to change the instance factory to take an optional name as an additional new parameter, in addition to id:

export function createAlertInstanceFactory(alertInstances: Record<string, AlertInstance>) {
return (id: string): AlertInstance => {
if (!alertInstances[id]) {
alertInstances[id] = new AlertInstance();
}
return alertInstances[id];
};
}

If we think we might need more things later, we could make the second param an optional option "bag" with a name property instead.

Presumably this would be stored as a new instance field in the AlertIntance:

export const rawAlertInstance = t.partial({
state: stateSchema,
meta: metaSchema,
});
export type RawAlertInstance = t.TypeOf<typeof rawAlertInstance>;

Not sure what else that will affect - saved object shapes? Certainly some external type used in the endpoint the UI calls to populate the alert details page.

@pmuellr
Copy link
Member Author

pmuellr commented Jul 6, 2020

Starting to look into this issue, here's a survey of usages of instance ids in the product:


for (const groupResult of groupResults) {
const instanceId = groupResult.group;
const value = groupResult.metrics[0][1];
const met = compareFn(value, params.threshold);
if (!met) continue;
const baseContext: BaseActionContext = {
date,
group: instanceId,
value,
};
const actionContext = addMessages(options, baseContext, params);
const alertInstance = options.services.alertInstanceFactory(instanceId);
alertInstance.scheduleActions(ActionGroupId, actionContext);
logger.debug(`scheduled actionGroup: ${JSON.stringify(actionContext)}`);
}
}

The instanceId is the value used for grouping,if grouping was requested, otherwise is the string 'all documents'.

Eg, for a query where you group based on a field used to store host names, the instance id will be a host name.


const alertInstance = services.alertInstanceFactory(
AlertType.ErrorRate
);

instanceId: apm.error_rate


const alertInstance = services.alertInstanceFactory(
AlertType.TransactionDuration
);

instanceId: apm.transaction_duration


const alertInstance = services.alertInstanceFactory(`${item}::${alertId}`);

Not clear what item will end up resolving to. The ::${alertId} suffix isn't needed.


const alertInstance = alertInstanceFactory(`${alertId}-${UNGROUPED_FACTORY_KEY}`);

const alertInstance = alertInstanceFactory(`${alertId}-${group.name}`);

Not clear what group.name will end up resolving to. UNGROUPED_FACTORY_KEY is *. The ${alertId}- prefix isn't needed.


const alertInstance = services.alertInstanceFactory(`${group}::${alertId}`);

Not clear what group will end up resolving to. The ::${alertId} suffix isn't needed.


const instance = services.alertInstanceFactory(ALERT_TYPE_CLUSTER_STATE);

instanceId: monitoring_alert_type_cluster_state


const instance = services.alertInstanceFactory(ALERT_TYPE_LICENSE_EXPIRATION);

instanceId: monitoring_alert_type_license_expiration


It appears this is the id of the alert, so a different fixed string should probably be used.


const alertInstance = services.alertInstanceFactory(alertId);

It appears this is the id of the alert, so a different fixed string should probably be used.


const alertInstance = options.services.alertInstanceFactory(MONITOR_STATUS.id);

instanceId: xpack.uptime.alerts.actionGroups.monitorStatus


const alertInstance = alertInstanceFactory(TLS.id);

instanceId: xpack.uptime.alerts.actionGroups.tls

pmuellr added a commit to pmuellr/kibana that referenced this issue Jul 7, 2020
resolves elastic#64268

Instances can now be retrieved/created via

    alertInstanceFactory('instance-id', { name: 'a nice name for this instance' });

Alert instances can use `getName()` to return the name, if set.

Adds new context variables for all alerts with names `alertInstanceName` and
`alertInstanceNameOrId`.  The former may be `undefined`.  The latter is
`alertInstance.name || alertInstanceId`.

----

However, from existing usage as documented in that issue, I don't think we
need this quite **yet**. I think a lot of the usages of alertIds in those
instance ids aren't required or desired.
@pmuellr
Copy link
Member Author

pmuellr commented Jul 7, 2020

It seems like many of these instance id strings should be changed to something more useful, however, there IS a "migration" issue. Say you changed the instance id string in 7.9, any alerts from 7.8 will switch from using the old instance id to the new one, once Kibana is running. This means:

  • the alert will consider this a new instance
  • you'll be starting with fresh instance state, since this instance id didn't exist before - the old instance state will be available for that first run in 7.9, then after that it'll be gone
  • if you muted that instance in 7.8, it will no longer be muted in 7.9

Some downsides, for sure, and I guess we need to factor these "instance id migrations" into our dev docs somehow. It's probably worth changing though, and it's possible you could recover the old state from 7.8 if you want (just access an instance with the old id). Muting will be hard, as the alert executor does not yet have access to the alert client (until I convince @mikecote of this! :-) ).

@mikecote
Copy link
Contributor

mikecote commented Jul 9, 2020

I'm seeing this as another case that would seem easier if ever the alert state lived within the alert object. Saved object migrations would handle this no problem and we could allow alert types to migrate their ids this with #50216. We would also stop losing state whenever changing the enabled status of an alert. I'm sure there would be further complications with an approach like this, with the event log for example.

@Zacqary
Copy link
Contributor

Zacqary commented Jul 9, 2020

#71335 and #71340 will resolve the ${alertId} issues mentioned above and make the alert instances for Logs and Metrics much more human-readable (at least to the degree that something like a host.hostname is human-readable, sometimes they're not). But for other alert types I definitely think we could use this functionality.

@pmuellr
Copy link
Member Author

pmuellr commented Aug 14, 2020

I made another pass through the callers of alertInstanceFactory(instanceId) to look for obvious cases of "id"s being used as instance ids. It seems like the alert type implementations are getting more elaborate, it's harder to track the parameter used in that call to see what the source might be.

The only obvious ones are with security solution, here:

if (signalsCount !== 0) {
const alertInstance = services.alertInstanceFactory(alertId);
scheduleNotificationActions({ alertInstance, signalsCount, resultsLink, ruleParams });
}

if (result.createdSignalsCount) {
const alertInstance = services.alertInstanceFactory(alertId);
scheduleNotificationActions({
alertInstance,
signalsCount: result.createdSignalsCount,
resultsLink,
ruleParams: notificationRuleParams,
});
}

I'll contact the team off-line, in case they want to fix this - but I suspect they may not, since their usage of alerts is more of an implementation detail than exposing customer-facing alerts.

@pmuellr
Copy link
Member Author

pmuellr commented Aug 14, 2020

I'm going to close this for now, as I've been treating this issue as a kind of meta issue to see if we can get human readable instance ids into the system directly for the current set of alertTypes in Kibana, instead of having both an instance id and name.

We may want to revisit having an explicit name, as mentioned in #64268 (comment), but I think it's probably best to wait for some explicit solution or customer feedback to get more requirements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Alerting Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
Development

No branches or pull requests

5 participants