Event handling functions (for central reporting). #2442

Open
hjoliver opened this issue Oct 9, 2017 · 14 comments
hjoliver commented Oct 9, 2017

Several sites (including mine) have a need for a central DB of routine events across all suites, to enable full-system analysis and reports without having to know where all the operational suites are.

IMO log scraping (or similarly, suite DB scraping) is not a good idea because:

  • the scraper program would need to know about all suites
  • our suite log content is not well standardised
  • continual DB reads might affect suite performance
  • suite logs get rolled over, and logs and DBs get obliterated on cold start (so events could be missed if there's a scraper outage)
  • the scraper program itself would need monitoring

We could use the existing event handlers for this, but they may be too heavy for reporting every task event [because each call executes a script in a subshell].

So, I propose we allow suite daemons to push routine event data to user-defined functions that know what to do with it (e.g. publish an event message to Kafka, or write to a central DB, or write to syslog). [by "user-defined" I mean the core functionality is that Cylc passes a defined data structure to a defined function interface - but what the function does with it is up to the user (or site) defined application - although we could supply some built-in examples, e.g. to write to syslog]

This would be easy to implement, and I think it avoids all of the above problems with log scraping.

(This is motivated by the same project as the new external event triggers, and I think this will be widely useful as well).

@cylc/core - do you agree?

@hjoliver hjoliver added this to the soon milestone Oct 9, 2017
@hjoliver hjoliver self-assigned this Oct 9, 2017

hjoliver commented Oct 9, 2017

I suppose this could be thought of as event handler functions (rather than scripts)... but the information provided could possibly go beyond events as such.

@hjoliver hjoliver changed the title Best way to implement central reporting for multiple suites? Centralized event "logging" for multiple suites? Oct 9, 2017

hjoliver commented Oct 9, 2017

Assuming we agree, thoughts on implementation:

Synchronous calls (i.e. in the main process) to a "logging function" (need a better term?) would be trivial to implement, if we can assume each call takes negligible time. But we should probably:

  • queue messages for sending in background thread or process
  • batch messages (functions could take a list of all currently queued messages).
  • limit queue size and enforce a timeout on each call, in case the external target system is down
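The bullet points above could be sketched roughly as follows. This is illustrative only, assuming a background thread and a bounded queue; names like `EventBatcher` are hypothetical, not proposed Cylc API:

```python
import queue
import threading

class EventBatcher:
    """Queue events; pass all currently queued events, as a batch,
    to a user-defined handler function in a background thread."""

    def __init__(self, handler, maxsize=1000):
        self.handler = handler  # user-defined: takes a list of events
        self.queue = queue.Queue(maxsize)
        threading.Thread(target=self._run, daemon=True).start()

    def put(self, event):
        try:
            self.queue.put_nowait(event)
        except queue.Full:
            pass  # bounded queue: drop events if the target system is down

    def _run(self):
        while True:
            batch = [self.queue.get()]  # block until an event arrives
            while True:  # then drain everything else currently queued
                try:
                    batch.append(self.queue.get_nowait())
                except queue.Empty:
                    break
            # a real version would also enforce a timeout on this call
            self.handler(batch)
```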

@matthewrmshin commented:

Can we simply add a handler to our logger using one of Python standard library logging.handlers with some filters?

See also #386.
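For example (my reading of the suggestion; a `SysLogHandler` or `SocketHandler` would be the real target in production, but a `StringIO` stream keeps this sketch self-contained, and the logger name is illustrative):

```python
import io
import logging

suite_log = logging.getLogger('suite-demo')  # illustrative logger name
suite_log.setLevel(logging.INFO)

# In practice this could be logging.handlers.SysLogHandler or
# SocketHandler; a StreamHandler into a buffer demonstrates the idea.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
# a filter (here a callable, supported since Python 3.2) selects records
handler.addFilter(lambda record: record.getMessage().startswith('EVENT'))
suite_log.addHandler(handler)

suite_log.info('EVENT task1 succeeded')
suite_log.info('unrelated debug chatter')
```

Only the filtered event line reaches the handler's target.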


hjoliver commented Oct 10, 2017

I think std lib logging is only for logging to local files, no? For the purposes of this issue I'm using the term "logging" in the loosest possible sense, as in "the kind of information that typically gets logged" - but the point is, the central "log" is likely to be a DB, and it is likely to be on the other end of a message broker (e.g. Kafka) that aggregates information from multiple suites and other sources such as a PBS log scraper. That being the case, it seems to me we need "plugin" functions that receive the data and send it wherever (Kafka in BoM case), OR we need to log (or suite-db) scrape all suites - which has all the problems I mentioned above.

@matthewrmshin commented:

No, the logging library is very extensible. Even with just the standard set of handlers, you can send logs to system log or to a socket. The logging.Handler class can also be extended to do anything, so I think it is best to exploit that instead of creating a custom protocol.
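A sketch of extending `logging.Handler` as suggested. The callback here stands in for whatever a site would supply (a Kafka producer, DB writer, etc.); the class name and usage are hypothetical:

```python
import logging

class CallbackHandler(logging.Handler):
    """Forward each formatted log record to an arbitrary site-supplied
    callback (e.g. a Kafka producer or DB writer in a real deployment)."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback

    def emit(self, record):
        try:
            self.callback(self.format(record))
        except Exception:
            self.handleError(record)  # standard logging error handling

# usage: collect records in a list for demonstration
sent = []
log = logging.getLogger('handler-demo')
log.setLevel(logging.INFO)
log.addHandler(CallbackHandler(sent.append))
log.info('task1 submitted')
```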

@hjoliver commented:

OK, that's interesting. I'll look into this later ...


hjoliver commented Oct 10, 2017

@matthewrmshin - on reflection, I'm not convinced on the logging library suggestion. I'm really proposing something simpler and more general. And it doesn't involve creating a custom protocol.

I envisage simply sending (periodically, as "loggable" events occur) a data structure of event data to a user-designated function that can do what it likes with the data. As far as Cylc is concerned that's it, job done, except for one thing: this function could be called a large number of times and we can't be sure that it will return super quickly, hence my musings about queuing calls to a background process (or pool).

It may be appropriate to use the Python logging library inside one of these functions, but that is up to the user or site. Although I suppose we could supply a built-in function for logging to syslog.

If the intent is to send data to a central reporting DB via a message broker, the "message" formulated inside the function, from the event data, will likely not even be a string (e.g. a list of DB column data).
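The shape of the proposal might look something like this. Everything here is illustrative, not an agreed interface: the function name, the event fields, and the idea that it returns DB column tuples are all assumptions for the sake of the example:

```python
# A user/site-defined function: Cylc would pass it a defined data
# structure of event data; what it does with the data is up to the site
# (publish to Kafka, insert into a central DB, write to syslog, ...).
def report_events(events):
    """Reshape a batch of event dicts into DB column tuples (example)."""
    rows = []
    for event in events:
        # note: the "message" need not be a string at all
        rows.append(
            (event['suite'], event['task'], event['event'], event['time']))
    return rows

rows = report_events([
    {'suite': 'ops1', 'task': 'model.1', 'event': 'succeeded',
     'time': '2017-10-10T03:00:00Z'},
])
```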

@matthewrmshin commented:

@hjoliver OK, I guess I was confused by the word "logging" here. I can now see that you are really talking about pushing data, on events, to a set of targets (or listeners, or observers?).

However, I am probably still missing the point here on event handlers. Can we not just have another built-in event handler that can do this sort of stuff? The email notification built-in event handler is effectively something like this - with multiple events being grouped together in a single message - the receiving end happens to be an SMTP server and the message happens to be a formatted email, but these can be anything really. (We'll probably need to refactor the event handler logic somewhat so all the different types of event handlers can have their own extension points in a plugin architecture.)


hjoliver commented Oct 11, 2017

@matthewrmshin - fair enough, I can see how the word "logging" might have led you to believe I was actually talking about logging 😀 [UPDATE: better title added to issue]

Maybe I'm wrong, but I was concerned that event handlers - being executables launched in a sub-shell - are too heavy-weight to use for every event (this is for routine events, not exceptional events).

Hence my suggestion to use functions rather than scripts. As per my comment above #2442 (comment), my proposal essentially amounts to event handler functions, presumably lighter weight than scripts: just the Python pool process, with no additional execution of a standalone script in a sub-shell.

In fact I've already added the capability to execute functions in the process pool, in #2423.
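Calling handler functions asynchronously in a process pool could look roughly like this (illustrative only; the handler function and event fields are made up, and #2423 should be consulted for the actual mechanism):

```python
import multiprocessing

def handle_event(event):
    """A user-defined handler function (illustrative)."""
    return 'handled %s/%s' % (event['task'], event['event'])

if __name__ == '__main__':
    # run the function in a worker process, not the suite's main process
    with multiprocessing.Pool(2) as pool:
        result = pool.apply_async(
            handle_event, ({'task': 'model.1', 'event': 'succeeded'},))
        print(result.get(timeout=10))  # prints: handled model.1/succeeded
```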

We could additionally allow aggregation (like the emails as you say) over some interval.

So I think we are actually in agreement now, if you agree to handler functions instead of (or as well as) scripts.

... plugin architecture

We should talk more about this via email. Here, and in #2423, I'm using the term "plugin" very loosely: you can make and activate a new plugin by simply writing a new function of the right form and putting it in the right place.

@matthewrmshin commented:

OK.

I can see that we can probably do the same for something like the GUI. A suite currently generates the suite state summary regardless of whether a GUI is connected to it or not. It would be nice to only do so when a GUI is connected. A GUI would start up a listener and ask the suite to push data to it on each event, so a quiet suite would no longer be polled continually by connected GUIs.

About plugins: I think we are on the same wavelength here. I am really talking about a common interface for a set of functional modules. I am not suggesting a system for plugin installation.

@hjoliver commented:

we can probably do the same for something like the GUI. ..

That is a good idea! I had not thought of that.

I am not suggesting a system for plugin installation.

I was just wondering if you were thinking of some kind of "registration" system, where the user determines what plugins are activated. However, I guess you weren't, and on reflection, in this context that would be pointless because "activated" just means available for use, not necessarily being used.

@hjoliver hjoliver changed the title Centralized event "logging" for multiple suites? Centralized event reporting for multiple suites? Oct 12, 2017
@hjoliver hjoliver changed the title Centralized event reporting for multiple suites? Central routine event reporting for many suites? Oct 12, 2017
@hjoliver commented:

[Description and title above updated for - hopefully - better clarity]

@hjoliver hjoliver changed the title Central routine event reporting for many suites? Event handling functions (for central reporting). Oct 23, 2017

hjoliver commented Oct 23, 2017

@matthewrmshin - this proposal basically amounts to supporting function (in addition to script) event handlers, with the functions called asynchronously in the process pool in case they're a bit slow. I am assuming this would provide a significant performance advantage under heavy use (e.g. for reporting all events routinely) even when executing these functions in the process pool - would you agree that's a valid assumption?

@matthewrmshin commented:

The assumption is most likely correct.
