Events don't show on Middleware Provider Timeline #15756
Comments
The comment above is a "summary" of what we saw on the test run.
@miq-bot add_label providers/hawkular
@miq-bot add_label events
@abonas on it.
@cfcosta
I would be surprised if there were an actual issue in the event catcher; it has not changed in a while. I also doubt there is a problem in the timeline, which, to my knowledge, has not changed either. If you have confirmed that the events are generated in hAlerts and tagged properly (they looked fine when we looked at this earlier this week), then the problem lies in the physical fetch from hAlerts. Either the catcher is not running, or there is a problem like the one you posted above, which does not look good. That issue is caused by many concurrent requests to Cassandra, which seems odd, because I don't see how our query would cause that problem. It almost seems as if something else is hitting Cassandra and causing load. Can you examine the Cassandra load in some way? By peeking into the Postgres DB, you could look at the ems_events table and see whether the events are there, but it seems more likely that the fetch is failing. If you have a way to look into the DB, you could eliminate the timeline as the cause.
@cfcosta @jshaughn @gbaufake
So, right now I don't know if this is an issue. I think we need more background on what the event catcher should be doing with the raised events. But right now, it would be accumulating them in the MIQ database, completely hidden from the user, and doing nothing else with them 👎
@israel-hdez oh, perfect. Of the points you found, I had only found the first one, so it's really good that you managed to find the other problems. After taking a look at the code, I agree with what you said. @abonas, what should we do next? It seems like it is broken, but it also seems like the breakage was on purpose. Not sure how to proceed on this.
@jshaughn @lucasponce could you shed some light on the above? Perhaps you have a bit more background on @israel-hdez's findings?
@jshaughn can confirm when he is back from PTO, but I remember that this was on purpose for several reasons:
But I hope Jay can confirm this. Also, regarding the BusyPoolException in the background, there is a JIRA issue, https://issues.jboss.org/browse/HWKALERTS-275, where it is being fixed. Another idea: to really confirm whether this feature is broken, I would define a MiQ Alert and indicate that we want to "raise a MiQ Event" and "show in timeline". At least, that was the proper way to show MiQ Alerts on the timeline, which I think is what the original description refers to.
Also, another side thought: from MiQ, the Hawkular Alerting definitions are defined in a simple way. This comment is not really related to the issue, but it is linked to the concern about the number of events pulled from Hawkular and stored in MiQ; this could be a way to configure that from the backend.
Although https://issues.jboss.org/browse/HWKALERTS-275 and this issue were found together, HWKALERTS-275 has been corrected, so it should not be a problem for MIQ. The test case used to find the present issue:
Considering @israel-hdez's findings, I think this needs more investigation on the MIQ side to check whether the events are stored in the MIQ database or whether some kind of filter is preventing them from showing on the timeline.
Did you check the options in the MiQ form to "raise" events or "show events" on the timeline?
Also, take into consideration that steps 9 and 10 are not direct: MiQ needs to process and filter Hawkular events according to the MiQ Alert definitions.
@lucasponce I updated some pictures in my last comment.
@gbaufake I think you also need to check "Send a Management Event" to see them in the "Management Event" filter. Can you check in the MIQ tables whether any miq_alert was generated?
@lucasponce I tried checking the boxes "Show on Timeline" and "Send a Management Event". The latter asks for a name, but the only thing it does is log events in
I've talked with @israel-hdez about this and, aside from the issue introduced recently by the use of ::Hawkular::Alerts::Alert, the event catcher should be fine. It is important to understand that events tagged with miq.event_type: 'hawkular_alert' should not be shown on the timeline; they are part of the internal implementation of "live alerting" in our provider. They are fetched by the event catcher but are then used to create actual MIQ Alerts. It is the MIQ alert that should be shown on the timeline. If I recall correctly, these are not shown in the 'Application' event group, but I could be wrong; I believe they show up under a different event group filter. Edgar is now looking to see if something broke in the code that converts Hawkular events into MIQ alerts.
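A minimal sketch of the routing described above, assuming the event arrives as JSON shaped like the sample log in the description. The method name `route_event` and the `:miq_alert`/`:timeline` results are hypothetical, and treating every other event as a plain timeline event is an assumption for illustration; this is not the actual ManageIQ event catcher code.

```ruby
require "json"

# Hypothetical router: decides what happens to a Hawkular event fetched by the
# event catcher, based only on its "miq.event_type" tag.
# Events tagged "hawkular_alert" feed MIQ "live alerting" instead of the
# timeline; the MIQ alert they produce is what should appear on the timeline.
# Assumption: anything else is treated as a plain timeline event.
def route_event(event)
  tags = event.fetch("tags", {})
  if tags["miq.event_type"] == "hawkular_alert"
    :miq_alert
  else
    :timeline
  end
end

# Trimmed-down copy of the event from the sample log.
sample = JSON.parse(<<~JSON)
  {
    "eventType": "EVENT",
    "category": "TRIGGER",
    "text": "Test-Alert1-Instance1",
    "tags": {
      "miq.event_type": "hawkular_alert",
      "miq.resource_type": "MiddlewareServer"
    }
  }
JSON

puts route_event(sample) # prints "miq_alert"
```

Under this reading, an empty timeline for such events is expected; the thing to check is whether the corresponding MIQ alert was created.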
It's broken here: manageiq/app/models/miq_alert.rb, line 569 in 53c1704,
because it's also broken here: manageiq/app/models/middleware_server.rb, line 36 in 53c1704,
because alert IDs can now be different. Aaahh! 😱 It's broken in a lot of places.
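Assuming the ID difference is the one visible in the sample log in the description (the group trigger id ends in `-alert-28`, while the member trigger created for the server ends in `-alert-28-1`), the hypothetical helper below shows why a literal ID comparison breaks and how parsing can recover the same MIQ alert id from both forms. The regex and `parse_trigger_id` are inferred from those sample IDs only, not taken from ManageIQ.

```ruby
# Hypothetical trigger-id parser, inferred only from the ids in the sample log:
#   group trigger:  MiQ-region-<uuid>-ems-<uuid>-alert-<alert_id>
#   member trigger: MiQ-region-<uuid>-ems-<uuid>-alert-<alert_id>-<n>
TRIGGER_ID = /\AMiQ-region-(?<region>[0-9a-f-]{36})-ems-(?<ems>[0-9a-f-]{36})-alert-(?<alert>\d+)(?:-(?<member>\d+))?\z/

# Returns a Hash with the parsed parts, or nil if the id does not match.
def parse_trigger_id(id)
  m = TRIGGER_ID.match(id)
  return nil unless m

  {
    region: m[:region],
    ems: m[:ems],
    alert_id: m[:alert].to_i,
    member: m[:member]&.to_i # nil for the group trigger itself
  }
end

group_id  = "MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28"
member_id = "#{group_id}-1"

# A literal comparison treats these as different alerts...
puts group_id == member_id                  # prints "false"
# ...while parsing maps both to the same MIQ alert id.
puts parse_trigger_id(group_id)[:alert_id]  # prints "28"
puts parse_trigger_id(member_id)[:alert_id] # prints "28"
```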
@miq-bot assign israel-hdez
@gbaufake now that everything is merged, you may want to test when the next build is available.
After a session with @israel-hdez, we verified the corrections and it is working under d32b663!
Description
Hello,
I've been trying to see Middleware Provider events on the Timeline, but I couldn't see any, although everything seems to be fine on the Hawkular Services side: group triggers, group members, and events are being generated fine, and the connection between MIQ and the provider seems to be fine as well.
Environment
docker:latest
hawkular-services:latest
Sample Logs
[ { "eventType": "EVENT", "tenantId": "hawkular", "id": "MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1-1502156325254-4c581377-1c55-4474-8f72-4d1f31f56d1f", "ctime": 1502156325254, "dataSource": "_none_", "dataId": "MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1", "category": "TRIGGER", "text": "Test-Alert1-Instance1", "context": { "dataId.hm.prefix": "hm_g_", "dataId.hm.type": "gauge", "miq.alert_profiles": "22", "resource_path": "/t;hawkular/f;7402c000-6df6-46ae-9e79-9b4f71aa0ce4/r;EAP7-Standalone~~" }, "tags": { "miq.event_type": "hawkular_alert", "miq.resource_type": "MiddlewareServer" }, "trigger": { "tenantId": "hawkular", "id": "MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1", "name": "Test-Alert1-Instance1 for EAP7-Standalone", "description": "Test-Alert1-Instance1", "type": "MEMBER", "eventType": "EVENT", "eventCategory": null, "eventText": null, "severity": "MEDIUM", "context": { "dataId.hm.prefix": "hm_g_", "dataId.hm.type": "gauge", "miq.alert_profiles": "22", "resource_path": "/t;hawkular/f;7402c000-6df6-46ae-9e79-9b4f71aa0ce4/r;EAP7-Standalone~~" }, "tags": { "miq.event_type": "hawkular_alert", "miq.resource_type": "MiddlewareServer" }, "autoDisable": false, "autoEnable": false, "autoResolve": false, "autoResolveAlerts": true, "autoResolveMatch": "ALL", "dataIdMap": { "WildFly Memory Metrics~Heap Max": "hm_g_MI~R~[7402c000-6df6-46ae-9e79-9b4f71aa0ce4/EAP7-Standalone~~]~MT~WildFly Memory Metrics~Heap Max", "WildFly Memory Metrics~Heap Used": "hm_g_MI~R~[7402c000-6df6-46ae-9e79-9b4f71aa0ce4/EAP7-Standalone~~]~MT~WildFly Memory Metrics~Heap Used" }, "memberOf": "MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28", "enabled": true, "firingMatch": "ANY", "source": "_none_" }, "dampening": { "tenantId": "hawkular", "triggerId": 
"MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1", "triggerMode": "FIRING", "type": "STRICT", "evalTrueSetting": 1, "evalTotalSetting": 1, "evalTimeSetting": 0, "dampeningId": "hawkular-MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1-FIRING" }, "evalSets": [ [ { "evalTimestamp": 1502156325254, "dataTimestamp": 1502156342001, "type": "COMPARE", "condition": { "tenantId": "hawkular", "triggerId": "MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1", "triggerMode": "FIRING", "type": "COMPARE", "conditionSetSize": 2, "conditionSetIndex": 1, "conditionId": "hawkular-MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1-FIRING-2-1", "dataId": "hm_g_MI~R~[7402c000-6df6-46ae-9e79-9b4f71aa0ce4/EAP7-Standalone~~]~MT~WildFly Memory Metrics~Heap Used", "operator": "GT", "data2Id": "hm_g_MI~R~[7402c000-6df6-46ae-9e79-9b4f71aa0ce4/EAP7-Standalone~~]~MT~WildFly Memory Metrics~Heap Max", "data2Multiplier": 0.2 }, "value1": 365691568, "value2": 1366294528 }, { "evalTimestamp": 1502156325254, "dataTimestamp": 1502156342001, "type": "COMPARE", "condition": { "tenantId": "hawkular", "triggerId": "MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1", "triggerMode": "FIRING", "type": "COMPARE", "conditionSetSize": 2, "conditionSetIndex": 2, "conditionId": "hawkular-MiQ-region-2ecdd959-4b31-45c1-bc5a-afc9ca8e9fcf-ems-1d05c128-af7f-409d-8855-4ec373930bc6-alert-28-1-FIRING-2-2", "dataId": "hm_g_MI~R~[7402c000-6df6-46ae-9e79-9b4f71aa0ce4/EAP7-Standalone~~]~MT~WildFly Memory Metrics~Heap Used", "operator": "LT", "data2Id": "hm_g_MI~R~[7402c000-6df6-46ae-9e79-9b4f71aa0ce4/EAP7-Standalone~~]~MT~WildFly Memory Metrics~Heap Max", "data2Multiplier": 0.15 }, "value1": 365691568, "value2": 1366294528 } ] ] } ]
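The two COMPARE conditions in this log can be checked by hand. Assuming Hawkular Alerting evaluates a COMPARE condition as `value1 <operator> (data2Multiplier * value2)` and that `"firingMatch": "ANY"` fires the trigger when at least one condition holds, the sketch below (helper names are hypothetical) shows that only the "Heap Used > 20% of Heap Max" condition is true for these values, which is enough under ANY.

```ruby
# Sketch: re-evaluate the COMPARE conditions from the sample log.
# Assumption: a COMPARE condition holds when
#   value1 <operator> (data2Multiplier * value2)
# and firingMatch "ANY" needs one true condition, "ALL" needs every one.
OPERATORS = {
  "LT"  => :<,
  "LTE" => :<=,
  "GT"  => :>,
  "GTE" => :>=
}.freeze

def compare_holds?(operator, value1, value2, multiplier)
  threshold = multiplier * value2
  value1.public_send(OPERATORS.fetch(operator), threshold)
end

def fires?(firing_match, results)
  firing_match == "ANY" ? results.any? : results.all?
end

# value1 / value2 copied from the two evalSets entries above.
heap_used = 365_691_568
heap_max  = 1_366_294_528

gt = compare_holds?("GT", heap_used, heap_max, 0.2)  # 365691568 > 273258905.6
lt = compare_holds?("LT", heap_used, heap_max, 0.15) # 365691568 < 204944179.2

puts gt                      # prints "true"
puts lt                      # prints "false"
puts fires?("ANY", [gt, lt]) # prints "true"
puts fires?("ALL", [gt, lt]) # prints "false"
```

This only confirms that the Hawkular side had a reason to fire; whether the resulting event reaches the timeline depends on the MIQ processing discussed in the comments.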
MIQ Server Provider with Data:
MIQ Provider Timeline with no events: