Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Per-provider Event Timelines With Regex Support #17295

Closed
miha-plesko opened this issue Apr 13, 2018 · 12 comments
Closed

[PROPOSAL] Per-provider Event Timelines With Regex Support #17295

miha-plesko opened this issue Apr 13, 2018 · 12 comments

Comments

@miha-plesko
Copy link
Contributor

miha-plesko commented Apr 13, 2018

So I'm playing with Timelines feature for one of the providers (Nuage). Talking about this view here:

capture

capture

And there are three problems I see:

  • Timeline shows events of all providers, not just Nuage provider.
  • Event timeline categories are mixed across providers.
  • Each event type must be manually mapped to category, no regex supported

Timeline Shows Events of All Providers, Not Just Nuage

Looking at source code I can see such where_clause:

(timestamp >= ? and timestamp <= ?) and (event_type in (?))

This means it would show all events in the Timeline, no matter the source ems.

Event Timeline Categories are Mixed Across Providers

Each provider is allowed to provide event-category mapping. Amazon for example specifies following (source code):

:ems:
  :ems_amazon:
    :event_handling:
      :event_groups:
        :addition:
          :critical:
          - AWS_EC2_Instance_CREATE
          - AWS_EC2_Instance_UPDATE
          - AWS_EC2_Instance_DELETE
          - AWS_API_CALL_TerminateInstances
          - AWS_API_CALL_RunInstances
        :power:
          :critical:
          - AWS_EC2_Instance_running
          - AWS_EC2_Instance_shutting-down
          - AWS_EC2_Instance_stopped
          - EC2_Instance_State_change_Notification_running
          - EC2_Instance_State_change_Notification_shutting_down
          - EC2_Instance_State_change_Notification_stopped
          - EC2_Instance_State_change_Notification_stopping
          - EC2_Instance_State_change_Notification_terminated
          - EC2_Instance_State_change_Notification_pending
          - AWS_API_CALL_StopInstances
          - AWS_API_CALL_StartInstances

What I'd expect is that Amazon would then categorize events based on this settings. BUT looking at source code I can see that in fact Amazon settings get merged with core settings and also with settings from all other providers. This means that what I see in each event category of Timeline is in fact a joint filter composed from all settings.yaml from all providers, which I think is wrong.

Suppose Amazon wanted to categorize Instance_CREATE event in :addition category but OpenStack in :other category. It would result in this event type appearing in both categories.

Each Event Type Must Be Manually Mapped to Category, No Regex Supported

Each provider can provide a mapping of event types to categories in its own config/settings.yaml. But it has to map each event explicitly, there is no support for regular expressions. For Nuage provider we have about 1k event types and listing them all sounds odd.

However, looking at source code I can see that backend then actually performs

and (event_type in (?))

statement which can not be easily converted to accept regex.

/cc @Fryguy , @gberginc , @pdellaert

@miha-plesko
Copy link
Contributor Author

Proposal

Timeline Shows Events of All Providers, Not Just Nuage

EmsEvent object has ems_is attribute. We could show only those events in Timeline that belong to EMS that user clicked Monitoring -> Timelines for.

Event Timeline Categories are Mixed Across Providers

What if we don't filter out events based only on event_type, but rather based on both event_type and source (e.g. NUAGE)? Or we can do it based on ems_id, but it won't work for MIQ events that have empty ems_id.

Each Event Type Must Be Manually Mapped to Category, No Regex Supported

We could simply allow SQL LIKE kind of event types in the mapping yaml and we could then append each of them to the WHERE clause like this:

WHERE ... OR event_type LIKE (?)
               OR event_type LIKE (?)
              ...

(Not fan of manually building SQL statement, but whatever)

WDYT?

@Fryguy
Copy link
Member

Fryguy commented Apr 13, 2018

Timeline Shows Events of All Providers, Not Just Nuage
EmsEvent object has ems_is attribute. We could show only those events in Timeline that belong to EMS that user clicked Monitoring -> Timelines for.

I agree that if you click Monitoring -> Timelines for a particular EMS, you'd only want to see that EMS. However, we would have to verify that there are no timeline presentation screen that do cross-provider views. If you go to the "all emses" page, can you check multiple providers and go to timeline, for example? Might need someone from UI to help verify.

Event Timeline Categories are Mixed Across Providers
What if we don't filter out events based only on event_type, but rather based on both event_type and source (e.g. NUAGE)? Or we can do it based on ems_id, but it won't work for MIQ events that have empty ems_id.

source is probably not good enough because we also create synthetic events for some providers, where the source is EVM, for example. Also, there's no strict mapping between provider and source, so all providers would have to have have some individual way to detect that. I've always disliked this column, btw, and have wanted to get rid of it, instead opting for a system column or synthetic column or something.

What we may be able to do is change the event_groups such that we get rid of the merging altogether, and then instead of hash lookups, have a method that does dynamic lookups based on the provider. Part of the reason for the caching was that in the past, this stuff was loaded from disk. Now that it's in settings it's always available in memory, so doing lookups dynamically is not an issue.

Each Event Type Must Be Manually Mapped to Category, No Regex Supported
We could simply allow SQL LIKE kind of event types in the mapping yaml and we could then append each of them to the WHERE clause like this:

WHERE ... OR event_type LIKE (?)
OR event_type LIKE (?)
...
(Not fan of manually building SQL statement, but whatever)

you can use chaining where clauses with SQL fragments:

[9] pry(main)> EmsEvent.where("event_type LIKE ?", "%foo%").or(EmsEvent.where("event_type LIKE ?", "%bar%"))
  EmsEvent Load (1.5ms)  SELECT "event_streams".* FROM "event_streams" WHERE ("event_streams"."type" IN ('EmsEvent') AND (event_type LIKE '%foo%') OR "event_streams"."type" IN ('EmsEvent') AND (event_type LIKE '%bar%'))

using where clauses in this way will also prevent SQL injection, which is important if we are going to let the user define parts of a query.

WDYT?

This all sounds good to me!

@Fryguy
Copy link
Member

Fryguy commented Apr 13, 2018

cc @agrare @blomquisg on your thoughts here.

@gberginc
Copy link
Contributor

Re SQL chaining, I wonder if there are any reasons not to user PG regexps? It looks to me that POSIX regular expressions have been around at least since PG 9.0.

Thus, Instead of chaining wheres, one could build a regexp, e.g.

EmsEvent.where('event_type ~ ?', '.*update|enterprise_create|vrs.*')

Of course, this depends on whether we'd like to support POSIX regular expressions in settings, i.e. for the above example, admin would have specify this as

:ems:
  :ems_nuage_networking:
    :event_handling:
      :event_groups:
        :addition:
          :critical:
          - .*update
          - enterprise_create
          - vrs.*

@miha-plesko
Copy link
Contributor Author

Updating code not to build SQL statement manually is then the 4th problem with timelines that we could address. This one, however, could be a bit tricky to solve because it seems to be deeply integrated in MIQ like this:

r = MiqReport.new
r.where_clause = [') and (timestamp >= ? and timestamp <= ?) and (event_type in (?))', *params]
r.to_timeline

Not sure, but I think MiqReport uses those hard-coded SQL WHERE clauses in order to be serializable (as opposed to stacking .where() statements). Furthermore, MiqReport class seems to be reused by many parts of the system, like "memory usage" graph, "top cpu consumers" grapht etc. See this example of MiqReport serialization.

In other words: I don't think we should go that deep.

EDIT: there seems to exist another approach to event filtering by leveraging report.conditions like for example here. Not sure, but perhaps we could just use this approach instead report.where_clause. Main difference is that this approach allows for ruby expressions and is executed after the SQL was performed. Performance could be an issue, though.

@tadeboro
Copy link
Contributor

tadeboro commented Aug 6, 2018

I did some digging in timeline-related parts of code and found this out.

First, events are filtered by ems_id when displaying them in provider timeline. The line responsible for this filtering is here. In order to confirm that filter is indeed applied, I manually created two dummy events: one with ems_id properly set and one without it. As expected, only the first event was shown on the timeline.

As for the regular expression support, I extended the core and ui that now allow classifying the events by matching them against a set of regular expressions. Details are described in the commit messages and pull request descriptions. For example, configuration snippet in comment from @gberginc above can be written as:

:ems:
  :ems_nuage_networking:
    :event_handling:
      :event_groups:
        :addition:
          :critical:
          - enterprise_create
          :critical_regex:
          - .*update
          - vrs.*

Regular expressions are listed under separate level in order to avoid auto-detection of event type templates.

@Fryguy Does anything from the pull requests make sense?

@miha-plesko
Copy link
Contributor Author

Awesome, we must have overlooked the line that filters by ems_id back then when doing quick research 👍 Knowing this, as long as we can bring regex support upstream I think we're good for Nuage.

@cben
Copy link
Contributor

cben commented Nov 15, 2018

Also implemented in ManageIQ/manageiq-providers-redfish#34.

@cben
Copy link
Contributor

cben commented Jan 21, 2019

Has any of this been backported to Gaprindashvili? AFAICT this is only in Hammer.
(This will help decide how to backport other fixes for #17201.)

@miha-plesko
Copy link
Contributor Author

It's Hammer only so far. Right @tadeboro ?

@agrare
Copy link
Member

agrare commented Jan 21, 2019

@cben @miha-plesko that's correct it is only in hammer-1 and will not be backported to gaprindashvilli

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants