Trigger alert for vol and brick failures #287

Closed

nnDarshan
Contributor

@nnDarshan nnDarshan commented Jun 6, 2017

This patch sends alerts when the status of a volume or brick changes, both from healthy to unhealthy and vice versa.

tendrl-bug-id: #286
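
A minimal sketch of the status-change detection this description refers to, assuming the previous and current statuses are available as plain dicts keyed by resource name. The function name and the dict layout are hypothetical and not taken from the patch.

```python
def detect_transitions(old_statuses, new_statuses):
    """Return (name, old, new) for every resource whose status changed.

    Resources with no previously stored status are skipped, because there
    is nothing to compare against yet.
    """
    transitions = []
    for name in sorted(new_statuses):
        new = new_statuses[name]
        old = old_statuses.get(name)
        # Report both directions: healthy -> unhealthy and back again.
        if old is not None and old != new:
            transitions.append((name, old, new))
    return transitions
```

Each returned tuple would then be turned into one alert.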

nnDarshan added 2 commits June 5, 2017 17:21
This patch consolidates the gluster brick structure by
having all the brick details in a single place in etcd
and linking to it from other places where it's needed.

tendrl-bug-id: Tendrl#278
Signed-off-by: nnDarshan <darshan.n.2024@gmail.com>
tendrl-bug-id: Tendrl#286
Signed-off-by: nnDarshan <darshan.n.2024@gmail.com>
@nnDarshan nnDarshan force-pushed the TriggerAlertForVolAndBrickFailures branch from c4284a7 to f428567 on June 6, 2017 12:41
@nnDarshan
Contributor Author

@shtripat @r0h4n @nthomas-redhat @anmolbabu Please review.
I have tested this by checking that an event with priority "notice" appears in "/messages/events" when the status of a volume or brick changes.

alert = {}
alert['source'] = 'tendrl-gluster-integration'
alert['pid'] = os.getpid()
alert['timestamp'] = tendrl_now().isoformat()
Member

This should be time_stamp, as per the latest testing with the alerting module.
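
A rough sketch of the alert dict with the key renamed to time_stamp, as requested. It uses the standard-library datetime in place of tendrl_now() so that it runs on its own, and the "info" default severity is an assumption: the hunks shown here only reveal the "Stopped" branch.

```python
import datetime
import os


def build_alert(resource, curr_value, source="tendrl-gluster-integration"):
    """Build an alert dict shaped like the one in the patch."""
    # Only the "Stopped" -> "critical" mapping is visible in the diff;
    # "info" as the fallback is an assumption made for this sketch.
    severity = "critical" if curr_value == "Stopped" else "info"
    return {
        "source": source,
        "pid": os.getpid(),
        # Key spelled time_stamp, as the alerting module expects.
        "time_stamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "severity": severity,
        "resource": resource,
    }
```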

if curr_value == "Stopped":
    severity = "critical"
alert['severity'] = severity
alert['resource'] = resource
Member

The resource name should be something like volume_status, and the same value should be populated in the handles field of a handler, which needs to be added under the tendrl-alerting module as handlers/cluster/volume_status_handler.py

You can refer to Tendrl/notifier#78 for more details.

If required, a separate handler should be written for brick status changes.

Contributor

The resource should be the name of the object as defined in the Tendrl definitions, i.e. Volume, followed by the attribute on which the alert is triggered:

Volume.status

Contributor

@anmolbabu anmolbabu Jun 7, 2017

@r0h4n The alerting module expects it as volume_status or volume_utilization, so basically it is the combination <entity_type>_<alert_type>, where alert_type is either status or utilization.
This avoids the alerting module having to know the object definitions of every integration module.

Contributor

The alerting module does need to know about the object definition and attributes. Tendrl will be generating alerts on specific attributes of a Tendrl object.

But if you are expecting an underscore between the entity/object and the alert/attribute, we can continue with the scheme below:

<object>_<attribute>

Does that work?

Contributor

Yes, volume_status works with alerting as it stands today.
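
The naming scheme agreed on in this thread could be sketched as follows. The helper name and the lower-casing of the entity type are assumptions for illustration; only the two alert types and the underscore separator come from the discussion.

```python
# Alert types named in the thread.
ALERT_TYPES = ("status", "utilization")


def resource_name(entity_type, alert_type):
    """Join an entity type and an alert type as <entity_type>_<alert_type>."""
    if alert_type not in ALERT_TYPES:
        raise ValueError("unsupported alert type: %r" % (alert_type,))
    # Lower-casing lets a definition name such as "Volume" map onto the
    # volume_status form that the alerting module accepts today.
    return "%s_%s" % (entity_type.lower(), alert_type)
```

For example, resource_name("Volume", "status") yields volume_status, which matches what the alerting module expects.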

@@ -4,34 +4,60 @@
class Brick(objects.BaseObject):
Contributor

We are not landing the Gluster Brick changes (#282) until the UI folks ack them, so please remove the changes related to that and keep only the alert triggers.

@@ -1,11 +1,10 @@
from tendrl.commons.event import Event
from tendrl.commons.message import Message
from tendrl.commons import objects
from tendrl.gluster_integration.objects.gluster_brick import GlusterBrick
from tendrl.gluster_integration.objects.brick import Brick
Contributor

Please don't mix up this PR and #282. The latter requires the UI work to be completed, and the current PR needs to be merged without #282.

@@ -20,6 +23,37 @@ def __init__(self):
        super(GlusterIntegrationSdsSyncStateThread, self).__init__()
        self._complete = gevent.event.Event()

    def _emit_event(self, resource, curr_value, msg):
        alert = {}
        alert['source'] = 'tendrl-gluster-integration'
Contributor

Use NS.publisher_id instead of the hard-coded string.

"notice",
"alerting",
{'message': json.dumps(alert)},
node_id=NS.node_context.node_id
Contributor

@anmolbabu anmolbabu Jun 7, 2017

This (node_id) is not required; the Message class picks it up automatically.
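
A small sketch of the call arguments with node_id left out, as suggested. It only assembles the values and does not import tendrl.commons, so the helper name is hypothetical. Reading the three positional values as priority, publisher and payload is an assumption based on the hunk and on the "notice" priority mentioned in the PR description.

```python
import json


def message_args(alert):
    """Return the positional arguments seen in the hunk, without node_id."""
    # The alert dict is serialised into the payload under the "message" key.
    payload = {"message": json.dumps(alert)}
    return ("notice", "alerting", payload)
```

The returned tuple would be unpacked into the Message constructor, which, per the comment above, fills in node_id on its own.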

current_status
)
self._emit_event(
volumes['volume%s.name' % index],
Contributor

@anmolbabu anmolbabu Jun 7, 2017

For cases like brick, where information about both its volume and its cluster is important, use brick_status as the resource type and add a parameter under tags called plugin_instance, with its value following https://github.com/Tendrl/node-monitoring/blob/develop/tendrl/node_monitoring/plugins/tendrl_glusterfs_brick_utilization.py#L286

The reason for not adding extra fields such as vol_name or brick_path to the tags dict is that collectd does not allow custom fields in tags. For collectd-generated alerts, the tags attribute comes directly from collectd, based on how the plugin is configured (the link above is an example), and only a few reserved fields can be adjusted, which leaves plugin_instance as the best remaining choice.
Adding custom fields to our (Tendrl-generated) alerts when it is not absolutely necessary would make the alerting module's validations and alert-clearing logic less generic.
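
A hedged sketch of what a brick alert shaped this way might look like. The helper name is hypothetical, and the plugin_instance value is passed in by the caller because its exact format is defined by the linked plugin and is not reproduced here.

```python
def build_brick_alert(plugin_instance, severity):
    """Build a brick status alert that carries context only in tags."""
    return {
        "resource": "brick_status",
        "severity": severity,
        # No vol_name or brick_path keys: plugin_instance is the single
        # field used to identify the brick, mirroring collectd's tags.
        "tags": {"plugin_instance": plugin_instance},
    }
```

Keeping the tags dict to the same fields collectd produces means one validation and clearing path can serve both collectd-generated and Tendrl-generated alerts.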

@nnDarshan
Contributor Author

I am taking care of all comments in this patch: #288
So I am closing this one.
