Event handling functions (for central reporting). #2442

Open
hjoliver opened this issue Oct 9, 2017 · 14 comments
hjoliver commented Oct 9, 2017

Several sites (including mine) have a need for a central DB of routine events across all suites, to enable full-system analysis and reports without having to know where all the operational suites are.

IMO log scraping (or similarly, suite DB scraping) is not a good idea because:

  • the scraper program would need to know about all suites
  • our suite log content is not well standardised
  • continual DB reads might affect suite performance
  • suite logs get rolled over, and logs and DBs get obliterated on cold start (so events could be missed if there's a scraper outage)
  • the scraper program itself would need monitoring

We could use the existing event handlers for this, but they may be too heavy for reporting every task event [because each call executes a script in a subshell].

So, I propose we allow suite daemons to push routine event data to user-defined functions that know what to do with it (e.g. publish an event message to Kafka, or write to a central DB, or write to syslog). [by "user-defined" I mean the core functionality is that Cylc passes a defined data structure to a defined function interface - but what the function does with it is up to the user (or site) defined application - although we could supply some built-in examples, e.g. to write to syslog]

This would be easy to implement, and I think it avoids all of the above problems with log scraping.

(This is motivated by the same project as the new external event triggers, and I think this will be widely useful as well).

@cylc/core - do you agree?

@hjoliver hjoliver added this to the soon milestone Oct 9, 2017
@hjoliver hjoliver self-assigned this Oct 9, 2017

hjoliver commented Oct 9, 2017

I suppose this could be thought of as event handler functions (rather than scripts)... but the information provided could possibly go beyond events as such.

@hjoliver hjoliver changed the title Best way to implement central reporting for multiple suites? Centralized event "logging" for multiple suites? Oct 9, 2017

hjoliver commented Oct 9, 2017

Assuming we agree, thoughts on implementation:

Synchronous calls (i.e. in the main process) to a "logging function" (need a better term?) would be trivial to implement, if we can assume each call takes negligible time. But we should probably:

  • queue messages for sending in background thread or process
  • batch messages (functions could take a list of all currently queued messages).
  • limit queue size and enforce a timeout on each call, in case the external target system is down
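The bullet points above could be sketched roughly as follows. This is illustrative only, assuming a background thread and a bounded queue; names like `EventBatcher` are hypothetical, not proposed Cylc API:

```python
import queue
import threading

class EventBatcher:
    """Queue events; pass all currently queued events, as a batch,
    to a user-defined handler function in a background thread."""

    def __init__(self, handler, maxsize=1000):
        self.handler = handler  # user-defined: takes a list of events
        self.queue = queue.Queue(maxsize)
        threading.Thread(target=self._run, daemon=True).start()

    def put(self, event):
        try:
            self.queue.put_nowait(event)
        except queue.Full:
            pass  # bounded queue: drop events if the target system is down

    def _run(self):
        while True:
            batch = [self.queue.get()]  # block until an event arrives
            while True:  # then drain everything else currently queued
                try:
                    batch.append(self.queue.get_nowait())
                except queue.Empty:
                    break
            # a real version would also enforce a timeout on this call
            self.handler(batch)
```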

@matthewrmshin commented:

Can we simply add a handler to our logger using one of Python standard library logging.handlers with some filters?

See also #386.
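For example (my reading of the suggestion; a `SysLogHandler` or `SocketHandler` would be the real target in production, but a `StringIO` stream keeps this sketch self-contained, and the logger name is illustrative):

```python
import io
import logging

suite_log = logging.getLogger('suite-demo')  # illustrative logger name
suite_log.setLevel(logging.INFO)

# In practice this could be logging.handlers.SysLogHandler or
# SocketHandler; a StreamHandler into a buffer demonstrates the idea.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
# a filter (here a callable, supported since Python 3.2) selects records
handler.addFilter(lambda record: record.getMessage().startswith('EVENT'))
suite_log.addHandler(handler)

suite_log.info('EVENT task1 succeeded')
suite_log.info('unrelated debug chatter')
```

Only the filtered event line reaches the handler's target.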


hjoliver commented Oct 10, 2017

I think std lib logging is only for logging to local files, no? For the purposes of this issue I'm using the term "logging" in the loosest possible sense, as in "the kind of information that typically gets logged" - but the point is, the central "log" is likely to be a DB, and it is likely to be on the other end of a message broker (e.g. Kafka) that aggregates information from multiple suites and other sources such as a PBS log scraper. That being the case, it seems to me we need "plugin" functions that receive the data and send it wherever (Kafka in BoM case), OR we need to log (or suite-db) scrape all suites - which has all the problems I mentioned above.

@matthewrmshin commented:

No, the logging library is very extensible. Even with just the standard set of handlers, you can send logs to system log or to a socket. The logging.Handler class can also be extended to do anything, so I think it is best to exploit that instead of creating a custom protocol.
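A sketch of extending `logging.Handler` as suggested. The callback here stands in for whatever a site would supply (a Kafka producer, DB writer, etc.); the class name and usage are hypothetical:

```python
import logging

class CallbackHandler(logging.Handler):
    """Forward each formatted log record to an arbitrary site-supplied
    callback (e.g. a Kafka producer or DB writer in a real deployment)."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback

    def emit(self, record):
        try:
            self.callback(self.format(record))
        except Exception:
            self.handleError(record)  # standard logging error handling

# usage: collect records in a list for demonstration
sent = []
log = logging.getLogger('handler-demo')
log.setLevel(logging.INFO)
log.addHandler(CallbackHandler(sent.append))
log.info('task1 submitted')
```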

@hjoliver commented:

OK, that's interesting. I'll look into this later ...


hjoliver commented Oct 10, 2017

@matthewrmshin - on reflection, I'm not convinced on the logging library suggestion. I'm really proposing something simpler and more general. And it doesn't involve creating a custom protocol.

I envisage simply sending (periodically, as "loggable" events occur) a data structure of event data to a user-designated function that can do what it likes with the data. As far as Cylc is concerned that's it, job done, except for one thing: this function could be called a large number of times and we can't be sure that it will return super quickly, hence my musings about queuing calls to a background process (or pool).

It may be appropriate to use the Python logging library inside one of these functions, but that is up to the user or site. Although I suppose we could supply a built-in function for logging to syslog.

If the intent is to send data to a central reporting DB via a message broker, the "message" formulated inside the function, from the event data, will likely not even be a string (e.g. a list of DB column data).
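The shape of the proposal might look something like this. Everything here is illustrative, not an agreed interface: the function name, the event fields, and the idea that it returns DB column tuples are all assumptions for the sake of the example:

```python
# A user/site-defined function: Cylc would pass it a defined data
# structure of event data; what it does with the data is up to the site
# (publish to Kafka, insert into a central DB, write to syslog, ...).
def report_events(events):
    """Reshape a batch of event dicts into DB column tuples (example)."""
    rows = []
    for event in events:
        # note: the "message" need not be a string at all
        rows.append(
            (event['suite'], event['task'], event['event'], event['time']))
    return rows

rows = report_events([
    {'suite': 'ops1', 'task': 'model.1', 'event': 'succeeded',
     'time': '2017-10-10T03:00:00Z'},
])
```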

@matthewrmshin commented:

@hjoliver OK, I guess I was confused by the word "logging" here. I can now see that you are really talking about pushing data, on events, to a set of targets (or listeners, or observers?).

However, I am probably still missing the point here on event handlers. Can we not just have another built-in event handler that can do this sort of stuff? The email notification built-in event handler is effectively something like this - with multiple events being grouped together in a single message - the receiving end happens to be an SMTP server and the message happens to be a formatted email, but these can be anything really. (We'll probably need to refactor the event handler logic somewhat so all the different types of event handlers can have their own extension points in a plugin architecture.)


hjoliver commented Oct 11, 2017

@matthewrmshin - fair enough, I can see how the word "logging" might have led you to believe I was actually talking about logging 😀 [UPDATE: better title added to issue]

Maybe I'm wrong, but I was concerned that event handlers - being executables launched in a sub-shell - are too heavy-weight to use for every event (this is for routine events, not exceptional events).

Hence my suggestion to use functions rather than scripts. As per my comment above #2442 (comment), my proposal essentially amounts to event handler functions, presumably lighter weight than scripts: just the Python pool process, with no additional execution of a standalone script in a sub-shell.

In fact I've already added the capability to execute functions in the process pool, in #2423.
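Calling handler functions asynchronously in a process pool could look roughly like this (illustrative only; the handler function and event fields are made up, and #2423 should be consulted for the actual mechanism):

```python
import multiprocessing

def handle_event(event):
    """A user-defined handler function (illustrative)."""
    return 'handled %s/%s' % (event['task'], event['event'])

if __name__ == '__main__':
    # run the function in a worker process, not the suite's main process
    with multiprocessing.Pool(2) as pool:
        result = pool.apply_async(
            handle_event, ({'task': 'model.1', 'event': 'succeeded'},))
        print(result.get(timeout=10))  # prints: handled model.1/succeeded
```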

We could additionally allow aggregation (like the emails as you say) over some interval.

So I think we are actually in agreement now, if you agree to handler functions instead of (or as well as) scripts.

... plugin architecture

We should talk more about this via email. Here, and in #2423, I'm using the term "plugin" very loosely: you can make and activate a new plugin by simply writing a new function of the right form and putting it in the right place.

@matthewrmshin commented:

OK.

I can see that we can probably do the same for something like the GUI. A suite currently generates the suite state summary regardless of whether a GUI is connected to it or not. It would be nice to only do so when a GUI is connected. A GUI would start up a listener and ask the suite to push data to it on each event, so a quiet suite would no longer be polled continually by connected GUIs.

About plugins: I think we are on the same wavelength here. I am really talking about a common interface for a set of functional modules. I am not suggesting a system for plugin installation.

@hjoliver commented:

we can probably do the same for something like the GUI. ..

That is a good idea! I had not thought of that.

I am not suggesting a system for plugin installation.

I was just wondering if you were thinking of some kind of "registration" system, where the user determines what plugins are activated. However, I guess you weren't, and on reflection, in this context that would be pointless because "activated" just means available for use, not necessarily being used.

@hjoliver hjoliver changed the title Centralized event "logging" for multiple suites? Centralized event reporting for multiple suites? Oct 12, 2017
@hjoliver hjoliver changed the title Centralized event reporting for multiple suites? Central routine event reporting for many suites? Oct 12, 2017
@hjoliver commented:

[Description and title above updated for - hopefully - better clarity]

@hjoliver hjoliver changed the title Central routine event reporting for many suites? Event handling functions (for central reporting). Oct 23, 2017

hjoliver commented Oct 23, 2017

@matthewrmshin - this proposal basically amounts to supporting function (in addition to script) event handlers, with the functions called asynchronously in the process pool in case they're a bit slow. I am assuming this would provide a significant performance advantage under heavy use (e.g. for reporting all events routinely) even when executing these functions in the process pool - would you agree that's a valid assumption?

@matthewrmshin commented:

The assumption is most likely correct.
