-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trigger alert for vol and brick failures #287
Trigger alert for vol and brick failures #287
Conversation
This patch consolidates the gluster brick structure by haveing all the brick details under single place in etcd and linking it from other places where its needed. tendrl-bug-id: Tendrl#278 Signed-off-by: nnDarshan <darshan.n.2024@gmail.com>
tendrl-bug-id: Tendrl#286 Signed-off-by: nnDarshan <darshan.n.2024@gmail.com>
c4284a7
to
f428567
Compare
@shtripat @r0h4n @nthomas-redhat @anmolbabu Please review. |
alert = {} | ||
alert['source'] = 'tendrl-gluster-integration' | ||
alert['pid'] = os.getpid() | ||
alert['timestamp'] = tendrl_now().isoformat() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be time_stamp
as per latest testing with alerting module.
if curr_value == "Stopped": | ||
severity = "critical" | ||
alert['severity'] = severity | ||
alert['resource'] = resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resource name should be something like volume_status
and the same should be populated as handles
field of a handler which needs to added under tendrl-alerting
module as handlers/cluster/volume_status_handler.py
You can refer Tendrl/notifier#78 for more details.
If required a separate handler to be written for brick status changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource should be the name of object as defined in Tendrl definitions i.e. Volume and the other part of resource should be the attribute on which the alert is triggered
Volume.status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@r0h4n The alerting module expects it as volume_status or volume_utilization so basically its the combination of <entity_type>_<alert_type> where alert_type is either status or utilization...
This avoids the alerting module having to know the object definition from the definitions of every integration module..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alerting module does need to know about the object definition and attributes. Tendrl will be generating alerts on specific attributes of a Tendrl object.
But if you are expecting a underscore between the entity/object and alert/attribute we can continue with below scheme
_
works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes .. volume_status works with alerting as it stands today...
@@ -4,34 +4,60 @@ | |||
class Brick(objects.BaseObject): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not landing the Gluster Brick changes #282
until the UI folks ack on that, So please remove changes related to that and only keep the alert triggers
@@ -1,11 +1,10 @@ | |||
from tendrl.commons.event import Event | |||
from tendrl.commons.message import Message | |||
from tendrl.commons import objects | |||
from tendrl.gluster_integration.objects.gluster_brick import GlusterBrick | |||
from tendrl.gluster_integration.objects.brick import Brick |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please dont mix up this PR and #282 , the latter one requires UI work completion and the current PR needs to be merged without PR 282
@@ -20,6 +23,37 @@ def __init__(self): | |||
super(GlusterIntegrationSdsSyncStateThread, self).__init__() | |||
self._complete = gevent.event.Event() | |||
|
|||
def _emit_event(self, resource, curr_value, msg): | |||
alert = {} | |||
alert['source'] = 'tendrl-gluster-integration' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use NS.publisher_id
if curr_value == "Stopped": | ||
severity = "critical" | ||
alert['severity'] = severity | ||
alert['resource'] = resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource should be the name of object as defined in Tendrl definitions i.e. Volume and the other part of resource should be the attribute on which the alert is triggered
Volume.status
"notice", | ||
"alerting", | ||
{'message': json.dumps(alert)}, | ||
node_id=NS.node_context.node_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This(node_id) is not required it will be automatically taken by Message class..
current_status | ||
) | ||
self._emit_event( | ||
volumes['volume%s.name' % index], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for cases like brick where info about its volume and cluster both are important, have resource type as brick_status and have an additional parameter under tags called the plugin_instance and have its value in accordance with https://github.com/Tendrl/node-monitoring/blob/develop/tendrl/node_monitoring/plugins/tendrl_glusterfs_brick_utilization.py#L286
Now, the reason for not having additional fields like vol_name, brick_path extra under the dict tags is , that collectd does not allow us to have custom additional fields in tags as the tags attribute for collectd generated alerts come directly from collectd based on how the plugin is configured(an example is the above link) and only a few reserved fields can be played around with which leaves plugin_instance as the best attribute choice left...
So if we decide to have custom fields in our(tendrl generated) alerts unless absolutely necessary will render the alerting module validations and alert clearing logic less generic...
Am taking care of all comments in this patch: #288 |
This patch sends alerts when the status of volume/bricks changes. Both from healthy to unhealthy and vice versa
tendrl-bug-id: #286