Alerting in KAT #65

underdarknl · 2022-11-10T09:39:59Z

underdarknl
Nov 10, 2022
Collaborator

As KAT acquires knowledge about your systems and relates that information to your applied business rules, it might need to alert you, your users, or even your suppliers about issues.
To make this happen, we have designed the following solution based on a set of requirements.

Alerting requirements
• KAT needs to alert based on a set of rules, e.g. only alert when a certain threshold has been reached.
• KAT needs to keep track of why and how it alerted whom.
• KAT needs to be flexible.
• KAT needs to be able to alert various (groups of) people depending on what triggered the alert.

Alerts as business rules
KAT already has a business rule engine which more or less functions as a state machine. This rule engine is triggered when the graph changes (e.g. an object is added or removed), and can in turn create new objects. These rules are called ‘bits’.
To minimize development time, and to make sure we do as much work as we can with minimal computing power, it seems logical to also evaluate alerting rules in the same process that runs regular business rules.
These alerting rules take the form of an input requirement (e.g. an object of a given type exists, for example a CVEFinding) and a set of actions (e.g. send an alert to Signal). In regular business rules there’s an intermediate step which evaluates the inputs using (for now) Python and can do fine-grained and flexible decision making based on object properties or even combinations of objects. For the alerting rules it would be a good start to allow simpler rules first and, if needed, add the same flexibility as is present for regular bits.
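To make the shape of such a rule concrete, here is a minimal sketch in Python. The `AlertRule` class, the `evaluate` function, the `score` property and the rule names are all hypothetical and not part of KAT; the sketch only illustrates "input requirement plus set of actions", with an optional condition standing in for the finer-grained step that regular bits have.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert-bit: run `actions` when an object of `input_type` appears."""

    name: str
    input_type: str
    actions: tuple[str, ...]
    # Optional predicate; the default accepts every object of the input type.
    condition: Callable[[dict], bool] = lambda obj: True


def evaluate(rules, obj):
    """Return (rule name, action) pairs triggered by a newly added object."""
    return [
        (rule.name, action)
        for rule in rules
        if obj["type"] == rule.input_type and rule.condition(obj)
        for action in rule.actions
    ]


rules = [
    # Simple rule: any CVEFinding triggers a Signal message.
    AlertRule("any-cve", "CVEFinding", ("signal",)),
    # Threshold rule: only alert by email as well when the (assumed) score is high.
    AlertRule(
        "critical-cve",
        "CVEFinding",
        ("signal", "email"),
        condition=lambda obj: obj.get("score", 0.0) >= 9.0,
    ),
]

triggered = evaluate(rules, {"type": "CVEFinding", "score": 9.8})
```

Starting with rules that only name an input type keeps the first version simple; the `condition` hook shows where more flexible logic could be added later.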

Storage of alert history
As all inputs to bits are objects in our graph, and since that graph retains the full history of each object, we can always deduce when and how we alerted by running the ‘bits’ and ‘alert-bits’ again. However, this might not be enough, as it does not store the actual timestamped messages that we sent out.
To solve this we can store the messages in Bytes, our forensic data store, which allows us to have the hash of each message signed by an externally trusted party. Once the alerts are in Bytes we could opt to add them to the graph again, making their existence visible in Rocky (our user interface).
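As an illustration of what a stored alert could look like, the sketch below builds a timestamped record and computes a SHA-256 digest over a canonical JSON encoding. It does not use the real Bytes API, and the record layout and the `build_alert_record` name are assumptions; the external signing step is left out.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_alert_record(message, recipients, sent_at=None):
    """Return the alert payload plus a SHA-256 digest over its canonical form."""
    sent_at = sent_at or datetime.now(timezone.utc)
    payload = {
        "message": message,
        "recipients": sorted(recipients),
        "sent_at": sent_at.isoformat(),
    }
    # Sorted keys and fixed separators make the byte encoding reproducible,
    # so the same alert always yields the same hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"payload": payload, "sha256": hashlib.sha256(canonical).hexdigest()}


record = build_alert_record("CVE finding on example.com", ["ops-team"])
```

The digest in `record["sha256"]` is what would be handed to the external party for signing, while the payload itself stays in the data store.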

Where to send alerts
Knowing whom to send an alert to is as important as knowing when to alert. The latter can be solved by processing in the alert-bits, which keep an eye on the graph and produce output if needed. The “who” is a more complex question, which might need to be as flexible as the bits themselves. The current strategy is to reuse the same technique that we use for declared and inherited indemnifications. This technique relies on rules in the OOI model that allow a declared indemnification to be inherited by related objects. Doing so adheres to specific directionality rules, and can also decrease or maximize the inherited declaration level based on the relation between the objects.

Applying this to the use case of alerting, the idea is to bind ownership to objects in the form of people or groups of people. These ownership claims can then be inherited in a similar fashion over the graph by adjacent objects. By introducing roles (such as product owner, supplier, or engineer) we can apply different directionality rules to each role.
Ownership claims can then reach an object from various other objects, each for its own role. Depending on the type of alert we could then pick the right role or roles to determine which people or groups should receive a message.
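The inheritance idea can be sketched as a small graph traversal. Everything below is hypothetical: the edge list, the relation names, the `ROLE_RULES` table and the owner names are made up for illustration and are not the OOI model. Directionality is simplified to "a claim flows from source to target over relations allowed for its role", and the decrease of inherited levels is not modelled.

```python
from collections import deque

# Hypothetical graph edges: (source object, relation, target object).
EDGES = [
    ("Network:internet", "contains", "IPAddress:192.0.2.10"),
    ("IPAddress:192.0.2.10", "hosts", "Service:https"),
    ("Service:https", "has_finding", "CVEFinding:example"),
]

# Assumed per-role rules: the relations over which a claim may be inherited.
ROLE_RULES = {
    "engineer": {"hosts", "has_finding"},
    "supplier": {"has_finding"},
}


def inherit_claims(edges, claims, role_rules):
    """Spread (object, role, owner) claims over the graph.

    Returns {object: {(role, owner), ...}}.
    """
    result = {}
    queue = deque(claims)
    while queue:
        obj, role, owner = queue.popleft()
        seen = result.setdefault(obj, set())
        if (role, owner) in seen:
            continue  # already processed; also guards against cycles
        seen.add((role, owner))
        for source, relation, target in edges:
            if source == obj and relation in role_rules.get(role, set()):
                queue.append((target, role, owner))
    return result


def recipients(owners, obj, roles):
    """Select the owners of `obj` that hold one of the requested roles."""
    return sorted(owner for role, owner in owners.get(obj, set()) if role in roles)


owners = inherit_claims(
    EDGES,
    [
        ("IPAddress:192.0.2.10", "engineer", "ops-team"),
        ("Service:https", "supplier", "vendor-x"),
    ],
    ROLE_RULES,
)
supplier_only = recipients(owners, "CVEFinding:example", {"supplier"})
```

In this toy graph the finding ends up with both an engineer and a supplier claim, and the alert type decides which of the two roles is addressed.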

reinoud · 2022-11-10T11:53:39Z

The question is whether this should be done by OpenKAT.

Most IT organisations have a process in place for monitoring and alerting. Where and when to send alerts, based on priority, is already managed there.
I don't think we need to rebuild such an infrastructure, or manage that process twice.

A working alternative would be implementing a Prometheus exporter. This can be scraped by the Prometheus monitoring system (which is the go-to monitoring system for most organisations with a standard DevOps stack). From there, Alertmanager takes care of routing alerts (mail, chat, pager, app, webhook, ticket), using the schedules that are already in place.

Building an exporter should be easy: defining a naming scheme based on the business rule that detects the issue, and generating a simple text web page in the Prometheus format, should not take much time.

In my opinion we should be cautious of feature creep in OpenKAT and not rebuild functionality that is already present and battle-tested.
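A rough sketch of what generating that text page could look like, using only the standard library. The `openkat_` prefix, the `_findings` suffix, the `object` label and the rule name `missing-spf` are assumed for illustration; only the line layout (`# HELP`, `# TYPE`, `name{label="value"} number`) follows the Prometheus text exposition format.

```python
import re


def metric_name(rule_name):
    """Derive a metric name from the business rule that detected the issue."""
    return "openkat_" + re.sub(r"[^a-zA-Z0-9_]", "_", rule_name) + "_findings"


def escape_label(value):
    """Escape backslash, double quote and newline, as label values require."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(findings):
    """Render {rule name: {object reference: count}} as exposition text."""
    lines = []
    for rule, per_object in sorted(findings.items()):
        name = metric_name(rule)
        lines.append(f"# HELP {name} Open findings detected by rule {rule}")
        lines.append(f"# TYPE {name} gauge")
        for obj, count in sorted(per_object.items()):
            lines.append(f'{name}{{object="{escape_label(obj)}"}} {count}')
    return "\n".join(lines) + "\n"


page = render({"missing-spf": {"example.com": 2}})
```

Serving `page` over HTTP on a metrics path is then enough for Prometheus to scrape it; alert thresholds and routing would live in the existing Prometheus and Alertmanager configuration.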

7 replies

reinoud Nov 12, 2022

But is it hard for organisations to roll out a standard messaging stack that already integrates with just about everything? Where do we want to draw the boundaries of OpenKAT?

underdarknl Nov 14, 2022
Collaborator Author

KAT is in a unique (and I think more powerful) position to do this alerting in a better and more useful way.
The proposition is to collect in KAT who is responsible for various objects in the graph, and use the graph's structure to fill in the intermediate steps for all related assets. Doing this outside KAT would either require that outside tool to also traverse some form of graph to answer these questions, or would leave the burden with the people responsible for keeping their CMDB up to date. As we all know, this task is already low on most people's priority and wish lists, and it almost always ends at the assets controlled by suppliers.

Next to this, most other tools do not actually hash or timestamp any of the steps they take, and therefore cannot be relied on as evidence.

Sending out the actual message does not need to be part of KAT (but could be, in the form of a Signal integration); we just need to be able to send the generated message to your alerting tool of choice, and store its return values for safekeeping.
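A minimal sketch of that split, assuming each alerting tool is wrapped in a plain function that takes the message and returns whatever the tool answered. The `dispatch` function, the channel names and the fake senders are hypothetical stand-ins, not existing KAT integrations.

```python
def dispatch(message, channels, senders, log):
    """Hand `message` to each channel's sender and keep the return values."""
    for channel in channels:
        try:
            response = senders[channel](message)
            status = "sent"
        except Exception as exc:  # unknown channel or failing downstream tool
            response, status = {"error": str(exc)}, "failed"
        # Each entry is what would be stored for safekeeping.
        log.append({"channel": channel, "status": status, "response": response})
    return log


def fake_signal(message):
    """Stand-in for a real integration; returns a made-up receipt."""
    return {"accepted": True, "length": len(message)}


def fake_webhook(message):
    """Stand-in that simulates a downstream failure."""
    raise ConnectionError("endpoint unreachable")


history = dispatch(
    "CVE finding on example.com",
    ["signal", "webhook"],
    {"signal": fake_signal, "webhook": fake_webhook},
    [],
)
```

Failures are recorded as well, so the stored history shows both what was delivered and what was attempted.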

reinoud Nov 14, 2022

In that case we need to write integrations for:

• Signal
• Slack
• Teams
• HipChat
• email
• webhooks
• Alertmanager
• and probably a lot more...

to keep OpenKAT usable in more environments.

underdarknl Nov 29, 2022
Collaborator Author

Agreed :)
Signal is already available, as is NTA 7516 email.

dmeulen Dec 9, 2022
Collaborator

Another option would be the GitLab way of bundling software.

Make the standard tooling part of the OpenKAT distribution: include the Prometheus and Alertmanager containers in OpenKAT and let them take care of sending out notifications and metrics. This way all the proposed functionality is bundled with OpenKAT with minimal development and maintenance effort.

This gives OpenKAT users the option to replace the bundled solution with an already existing monitoring platform.

I would ditch the existing Signal and email integrations, bundle the mentioned containers, and use the existing Alertmanager integrations for Signal and email.

ehotting · 2022-11-10T13:27:26Z

At the risk of even more scope creep: the Dutch government is putting some effort into standardising notifications, based on CloudEvents (cloudevents.io). A draft NL Gov profile can be found here: https://vng-realisatie.github.io/NL-GOV-profile-for-CloudEvents/

CloudEvents is versatile: it separates protocol and content.

The SDK for Python can be found here: https://github.com/cloudevents/sdk-python

Perhaps it deserves a quick look to check whether it is worth the effort to join in on that initiative?
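To give an impression of what an alert could look like as a CloudEvent, here is a sketch that builds the JSON envelope with only the standard library rather than the SDK. The `source` and `type` values and the contents of `data` are made-up placeholders, and nothing here claims to follow the NL Gov profile; the sketch only sets the four required CloudEvents 1.0 context attributes (`specversion`, `id`, `source`, `type`) plus a few optional ones.

```python
import json
import uuid
from datetime import datetime, timezone


def to_cloudevent(finding_type, object_ref, message):
    """Wrap an alert in a CloudEvents 1.0 style envelope (plain dict)."""
    return {
        # Required context attributes.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "urn:example:openkat",      # placeholder URI
        "type": "org.example.openkat.alert",  # placeholder event type
        # Optional context attributes.
        "subject": object_ref,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # The content, kept separate from how the event is transported.
        "data": {"finding_type": finding_type, "message": message},
    }


event = to_cloudevent("CVEFinding", "Service:https", "CVE finding on example.com")
body = json.dumps(event)
```

The same `body` could then be delivered over a webhook, a message queue or another transport without changing the content, which is the separation of protocol and content mentioned above.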

0 replies
