Alerting in KAT #65
Replies: 2 comments 7 replies
-
The question is whether this should be done by OpenKAT at all. Most IT organisations already have a process in place for monitoring and alerting, which governs where alerts are sent, at what time, and at which priority. A working alternative would be to implement a Prometheus exporter. The exporter can be scraped by the Prometheus monitoring system (the go-to monitoring system for most organisations with a standard DevOps stack). From there, Alertmanager takes care of routing alerts (mail, chat, pager, app, webhook, ticket), using the schedules that are already in place. Building an exporter should be easy: defining a naming scheme based on the business rule that detects each finding, and generating a simple text page in the Prometheus exposition format, should not take much time. In my opinion we should be cautious of feature creep in OpenKAT and not rebuild functionality that already exists and is battle-tested.
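To give an idea of the effort involved, here is a minimal sketch of the text page such an exporter could serve. The metric name `openkat_findings`, the `rule` and `severity` labels, and the `FindingCount` type are assumptions made for illustration, not existing OpenKAT names. It uses only the standard library; serving the page over HTTP is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FindingCount:
    """Open findings raised by one business rule (hypothetical shape)."""
    rule: str       # name of the business rule that detected the finding
    severity: str   # e.g. "critical", "high", "low"
    count: int


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(counts: list[FindingCount]) -> str:
    """Render the counts as a page in the Prometheus text format."""
    name = "openkat_findings"  # assumed metric name
    lines = [
        f"# HELP {name} Open findings per business rule and severity.",
        f"# TYPE {name} gauge",
    ]
    for c in counts:
        labels = f'rule="{escape_label(c.rule)}",severity="{escape_label(c.severity)}"'
        lines.append(f"{name}{{{labels}}} {c.count}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics([FindingCount("cve_finding", "high", 3)]), end="")
```

Thresholds and routing would then live in Prometheus alerting rules and Alertmanager configuration rather than in OpenKAT itself.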
-
At the risk of even more scope creep: I know the Dutch government is putting some effort into standardising notifications, based on CloudEvents (cloudevents.io). A draft NL GOV profile can be found here: https://vng-realisatie.github.io/NL-GOV-profile-for-CloudEvents/ CloudEvents is versatile: it separates protocol from content. An SDK for Python can be found here: https://github.com/cloudevents/sdk-python Perhaps it deserves a quick look to check whether it is worth the effort to join that initiative?
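For a feel of what an alert would look like as a CloudEvent, here is a rough sketch that builds one as a plain dictionary in the JSON event format, using only the standard library rather than the SDK. The `source` and `type` values and the `make_alert_event` helper are made up for illustration, and the result has not been checked against the additional constraints of the NL GOV profile.

```python
import json
import uuid
from datetime import datetime, timezone


def make_alert_event(finding_type: str, subject: str, data: dict) -> dict:
    """Build a CloudEvents 1.0 event as a plain dict (hypothetical helper)."""
    return {
        # Required context attributes in CloudEvents 1.0.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "urn:example:openkat",                # assumed source identifier
        "type": f"org.example.openkat.{finding_type}",  # assumed naming scheme
        # Optional context attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "subject": subject,
        "datacontenttype": "application/json",
        # The payload itself; its shape is up to the producer.
        "data": data,
    }


if __name__ == "__main__":
    event = make_alert_event("cve_finding", "example.com", {"severity": "high"})
    print(json.dumps(event, indent=2))
```

How the event is delivered (HTTP, a message queue, and so on) is a separate choice, which is the protocol/content split mentioned above.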
-
As KAT acquires knowledge about your systems and relates that information to your applied business rules it might find a need to alert you, your users, or even your suppliers about issues.
To make this happen, we have designed the following solution based on a set of requirements.
Alerting requirements
• KAT needs to alert based on a set of rules, e.g. only alert when a certain threshold has been reached.
• KAT needs to keep track of why and how it alerted whom.
• KAT needs to be flexible.
• KAT needs to be able to alert various (groups of) people depending on what triggered the alert.
Alerts as business rules
KAT already has a business rule engine which more or less functions as a state machine. This rule engine is triggered when the graph changes (e.g. an object is added or removed), and can in turn create new objects. These rules are called ‘bits’.
To minimize development time, and to make sure we do as much work as we can with minimal computing power, it seems logical to also evaluate alerting rules in the same process that runs the regular business rules.
These alerting rules take the form of an input requirement (e.g. an object of a given type exists, such as a CVEFinding) and a set of actions (e.g. send an alert to Signal). In regular business rules there’s an intermediate step which evaluates the inputs using code (for now Python) and can do fine-grained and flexible decision making based on object properties or even combinations of objects. For the alerting rules it would be a good start to allow simpler rules first, and if needed add the same flexibility that is present for regular bits.
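A simple alerting rule of this kind could be sketched as follows. This is not how bits are implemented in KAT; `GraphObject`, `AlertRule`, `evaluate` and `signal_action` are hypothetical names, and the action only returns the message it would send instead of contacting Signal.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GraphObject:
    """Stand-in for an object in the graph (hypothetical shape)."""
    object_type: str
    primary_key: str


# An action receives the triggering object and returns a description of
# what it did, so the outcome can be recorded afterwards.
Action = Callable[[GraphObject], str]


@dataclass(frozen=True)
class AlertRule:
    name: str
    input_type: str              # input requirement: an object of this type exists
    actions: tuple[Action, ...]  # what to do when the requirement is met


def signal_action(obj: GraphObject) -> str:
    """Placeholder action: format the message instead of really sending it."""
    return f"signal: {obj.object_type} found on {obj.primary_key}"


def evaluate(rules: list[AlertRule], obj: GraphObject) -> list[str]:
    """Run every rule whose input requirement matches the changed object."""
    results = []
    for rule in rules:
        if rule.input_type == obj.object_type:
            results.extend(action(obj) for action in rule.actions)
    return results


if __name__ == "__main__":
    rules = [AlertRule("alert-on-cve", "CVEFinding", (signal_action,))]
    print(evaluate(rules, GraphObject("CVEFinding", "example.com")))
```

The intermediate evaluation step of regular bits would slot in as an extra predicate on `AlertRule` once type matching alone is no longer enough.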
Storage of alert history
As all inputs to bits are objects in our graph, and since that graph retains the full history of each object, we can always deduce when and how we alerted by running the ‘bits’ and ‘alert-bits’ again. However, this might not be enough, as it does not store the actual timestamped messages that we sent out.
To solve this we can store the messages in Bytes, our forensic data store, which allows us to have the hash of each message signed by an external trusted party. Once the alerts are in Bytes, we could opt to add them to the graph again, making their existence visible in Rocky (our user interface).
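As an illustration of what such a stored message could contain, the sketch below builds a timestamped alert record and the SHA-256 hash that would be offered for external signing. The field names and the `make_alert_record` helper are assumptions, and the actual Bytes API is deliberately not shown.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def make_alert_record(recipient: str, channel: str, message: str,
                      sent_at: Optional[datetime] = None) -> dict:
    """Build an alert record plus the hash of its canonical JSON form."""
    sent_at = sent_at or datetime.now(timezone.utc)
    record = {
        "recipient": recipient,
        "channel": channel,
        "message": message,
        "sent_at": sent_at.isoformat(),
    }
    # Sorted keys and fixed separators keep the serialisation stable, so the
    # same record always produces the same hash.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"record": record, "sha256": digest}


if __name__ == "__main__":
    stored = make_alert_record("ops-team", "signal", "CVEFinding on example.com")
    print(stored["sha256"])
```

Hashing a canonical serialisation rather than the raw message text means the recipient, channel and timestamp are covered by the signature as well.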
Where to send alerts
Knowing who to send an alert to is as important as knowing when to alert. The latter can be solved by processing in the alert-bits, which keep an eye on the graph and produce output if needed. The “who” is a more complex question, which might need to be as flexible as the bits themselves. The current strategy is to reuse the technique that we already use for declared and inherited indemnifications. This technique relies on rules in the OOI model that allow a declared indemnification to be inherited by related objects. This inheritance follows specific directionality rules, and can also decrease or maximize the inherited declaration level based on the relation between the objects.
Applying this to the use case of alerting, the idea is to bind ownership to objects in the form of people or groups of people. These ownership claims can then be inherited in a similar fashion over the graph by adjacent objects. By introducing roles (such as product owner, supplier, or engineer) we can define a separate set of directionality rules for each role.
Ownership claims can then reach an object from various other objects, each for its own role. Depending on the type of alert, we can then pick the relevant role or roles to determine which people or groups should receive a message.
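The inheritance of ownership claims could be sketched roughly like this. It is a simplified model, not the OOI implementation: the `Edge` type, the relation names, the contents of `ROLE_RULES` and both functions are invented for illustration, and the adjustment of declaration levels is left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str


# Per role: which relations a claim may follow, and in which direction.
# "forward" means source to target, "backward" means target to source.
ROLE_RULES = {
    "engineer": {("hosts", "forward")},
    "supplier": {("hosts", "backward")},
}


def inherit_claims(edges, declared, role_rules):
    """Spread declared (role, owner) claims over the graph per role rules."""
    result = {obj: set(claims) for obj, claims in declared.items()}
    queue = deque((obj, role, owner)
                  for obj, claims in declared.items()
                  for role, owner in claims)
    while queue:
        obj, role, owner = queue.popleft()
        allowed = role_rules.get(role, set())
        for edge in edges:
            if edge.source == obj and (edge.relation, "forward") in allowed:
                neighbour = edge.target
            elif edge.target == obj and (edge.relation, "backward") in allowed:
                neighbour = edge.source
            else:
                continue
            claims = result.setdefault(neighbour, set())
            if (role, owner) not in claims:  # visit each claim once per object
                claims.add((role, owner))
                queue.append((neighbour, role, owner))
    return result


def recipients(claims, obj, roles):
    """Select the owners holding one of the given roles on an object."""
    return {owner for role, owner in claims.get(obj, set()) if role in roles}


if __name__ == "__main__":
    edges = [Edge("server1", "website", "hosts")]
    declared = {"server1": {("engineer", "alice")},
                "website": {("supplier", "acme")}}
    claims = inherit_claims(edges, declared, ROLE_RULES)
    print(recipients(claims, "website", {"engineer"}))
```

In this example the engineer claim declared on the server reaches the website by following `hosts` forward, while the supplier claim declared on the website travels the same edge in the opposite direction.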