
EventCountLogger

Václav Bartoš edited this page Feb 4, 2021 · 2 revisions

Disclaimer: This is an old proposal. The actual implementation has been done as a standalone package and is a little different, see: https://github.com/CESNET/EventCountLogger


EventCountLogger is a module for counting the number of events within defined time intervals in a distributed environment.

It consists of several parts:

  • Redis database - counters of events are stored in Redis database
  • EventCountLogger - main logging code, every process which needs to log events runs one instance
  • EventCountLoggerMaster - stores and resets counters at regular intervals; a simple script that runs as a single instance only
  • a configuration file

How it works

An application defines a set of event groups. Each group contains one or more event IDs (counters) and one or more time intervals at which the counters are reset. Any process can call a method (log_event) to record that an event of a given group and ID has just happened, which increments the corresponding counter. All counters are reset periodically, at the intervals defined in the configuration. A group may define more than one interval; in that case the implementation keeps several counters per event ID, all incremented simultaneously but reset at different intervals. Each counter has a current value, which is incremented on every call of log_event, and a "last" value, which stores the final value of the last completed interval (that is, at the end of an interval, the values in current are moved to last and current is reset). Anyone who wants to read statistics can read the value in last until it is overwritten by the new value at the end of the next interval.

DB scheme

Each combination of group, interval and event_id has a pair of entries in Redis: cur for the value in the current (incomplete) interval and last for the value of the last completed interval. The keys are named <group_name>:<interval>:<cur/last>:<event_id>.

Intervals are specified as an integer followed by a one-character suffix: s (seconds), m (minutes), h (hours) or d (days).
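The interval notation can be turned into a number of seconds with a few lines of Python. This is only a sketch: the function name parse_interval and the choice to raise ValueError on malformed input are assumptions, not part of the proposal.

```python
import re

# Seconds per unit for the one-character suffixes described above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(spec: str) -> int:
    """Convert an interval string such as '5m' into a number of seconds.

    Hypothetical helper: rejects anything that is not
    '<integer><s|m|h|d>' with a ValueError.
    """
    match = re.fullmatch(r"(\d+)([smhd])", spec)
    if match is None:
        raise ValueError(f"invalid interval: {spec!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```

For example, parse_interval("5m") gives 300 and parse_interval("1h") gives 3600.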

The UNIX timestamp of the beginning of the interval (in UTC) is stored under the key <group_name>:<interval>:<cur/last>:@ts.

For example, consider a group myGroup with two event IDs, eventX and eventY, and two intervals, 5 minutes (5m) and 1 hour (1h). There will be the following keys stored in Redis database:

myGroup:5m:cur:@ts
myGroup:5m:cur:eventX
myGroup:5m:cur:eventY
myGroup:5m:last:@ts
myGroup:5m:last:eventX
myGroup:5m:last:eventY
myGroup:1h:cur:@ts
myGroup:1h:cur:eventX
myGroup:1h:cur:eventY
myGroup:1h:last:@ts
myGroup:1h:last:eventX
myGroup:1h:last:eventY

When log_event("myGroup", "eventX") is called, both myGroup:5m:cur:eventX and myGroup:1h:cur:eventX are incremented by 1. At 5 minute intervals, all myGroup:5m:cur:* are moved to myGroup:5m:last:* (similarly for 1 hour intervals and 1h).
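The key naming and the effect of log_event can be illustrated without a Redis server. In the sketch below a collections.Counter stands in for Redis, and the helper counter_key and the explicit store and intervals parameters are assumptions made for the example; in the proposal, the intervals come from the group's configuration.

```python
from collections import Counter


def counter_key(group, interval, slot, event_id):
    """Build a key of the form <group_name>:<interval>:<cur/last>:<event_id>."""
    return f"{group}:{interval}:{slot}:{event_id}"


def log_event(store, group, intervals, event_id, n=1):
    """Increment the 'cur' counter of event_id for every interval of the group."""
    for interval in intervals:
        store[counter_key(group, interval, "cur", event_id)] += n


# A Counter stands in for the Redis database in this sketch.
store = Counter()
log_event(store, "myGroup", ["5m", "1h"], "eventX")
log_event(store, "myGroup", ["5m", "1h"], "eventX", n=2)
```

After these two calls both myGroup:5m:cur:eventX and myGroup:1h:cur:eventX hold 3, while the eventY counters are untouched.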

Multiprocessing

The system is distributed, i.e. logging may happen in multiple processes in parallel, and the resetting of counters is done by a separate process. There are two modes of synchronization.

  • Direct - all counter increments are done directly in Redis. This ensures Redis always contains the most recent state, but incurs more overhead when there are many events.
  • Local cache - each instance of EventCountLogger has its own copy of the counters, which are incremented locally and only synchronized with Redis once in a while. The synchronization is done either at regular time intervals or when a counter reaches a defined value. During synchronization, the counters in Redis are incremented by the values of the local counters and the local counters are set to the total values from Redis. This is more efficient, but the data in Redis may be missing the newest events.
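The local cache mode might look roughly like the following sketch. It uses a plain dict in place of Redis and covers only the limit-based trigger, not the time-based one. The class name LocalCounter and its attributes are hypothetical, and the sketch assumes that the limit applies to the number of not-yet-synchronized increments, which the text above leaves open.

```python
class LocalCounter:
    """One locally cached counter (hypothetical helper, not the real API)."""

    def __init__(self, shared, key, sync_limit=None):
        self.shared = shared          # dict standing in for Redis
        self.key = key
        self.sync_limit = sync_limit  # assumed to limit unsynced increments
        self.value = 0                # local view of the total
        self.pending = 0              # increments not yet pushed to the store

    def increment(self, n=1):
        self.value += n
        self.pending += n
        if self.sync_limit is not None and self.pending >= self.sync_limit:
            self.sync()

    def sync(self):
        # Add the local increments to the shared counter, then adopt the
        # shared total as the new local value.
        total = self.shared.get(self.key, 0) + self.pending
        self.shared[self.key] = total
        self.value = total
        self.pending = 0


shared = {}
a = LocalCounter(shared, "myGroup:5m:cur:eventX", sync_limit=3)
b = LocalCounter(shared, "myGroup:5m:cur:eventX")
a.increment()
a.increment()
b.increment(5)
seen_before_sync = shared.get("myGroup:5m:cur:eventX", 0)  # nothing pushed yet
b.sync()       # shared total becomes 5
a.increment()  # third pending increment reaches sync_limit and triggers a sync
```

With a real Redis server, the read-add-write in sync would typically be a single INCRBY, which is atomic and returns the new total.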

Configuration

Configuration is specified in a YAML-formatted file (/etc/nerd/eventcountlogger.yml by default). It contains specification of groups and their parameters:

redis:
    hostname: localhost  # optional, default: localhost
    port: 6379  # optional, default: 6379
    db: 1  # Redis DB used for statistics; optional, default: 1

groups:
    group_name:
        eventids: ["event1", "event2", ...]
        intervals: ["10s", "5m", "1h", "24h"]
        #actions: [ ... ]   # In the future, some actions can be done at the end of each interval, like write data to file or run external script
        sync_interval: 2  # synchronize local counters with redis after this number of seconds (may be float) (default: none)
        sync_limit: 100 # synchronize local counters with redis when any counter in the group reaches this value (default: none)
    another_group_name:
        ...

The redis part sets up the connection to the Redis database to be used.

A group is defined by adding an item under groups whose key is the name of the group. Each group must define a list of counter-reset intervals containing at least one interval.

The list of events in a group may be given statically in the configuration (eventids), or may be defined dynamically at run time. In the latter case, eventids is set to an empty list and the event IDs must be declared from EventCountLogger instances using the declare_event_id method.

sync_interval and sync_limit specify when the local copy of the counters should be synchronized with Redis: after a number of seconds since the last synchronization, or when the value of any counter in the group reaches the limit (whichever happens first, when both are configured). If neither of these parameters is set, local counters are disabled and all counter operations are done directly in Redis.
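The rules above for a single group can be checked once the YAML file has been loaded into a dict. The sketch below assumes that loading has already happened. The function name check_group and the shape of its return value are hypothetical.

```python
def check_group(name, cfg):
    """Validate one entry of 'groups' and derive its effective settings.

    Hypothetical helper: 'cfg' is the already parsed mapping for one group.
    """
    intervals = cfg.get("intervals") or []
    if not intervals:
        raise ValueError(f"group {name!r} must define at least one interval")
    return {
        # An empty list means event IDs are declared at run time.
        "eventids": list(cfg.get("eventids") or []),
        "intervals": list(intervals),
        # Local counters are used only if at least one sync parameter is set.
        "local_counters": (
            cfg.get("sync_interval") is not None
            or cfg.get("sync_limit") is not None
        ),
    }
```

A group with only eventids and intervals therefore runs in direct mode, while adding sync_interval or sync_limit switches it to local counters.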

EventCountLogger

A Python module loaded in every process which needs to log event counts. It provides the following classes, functions and methods:

  • class EventGroup()

    • Singleton class representing an event group and its configuration
  • get_group(name)

    • Create an EventGroup representing the given event group, loading its parameters from the configuration file, or return a reference to the existing one.
  • EventGroup.log_event(event_id, n=1)

    • Increment counter event_id in given group by n.
  • EventGroup.get_count(event_id)

    • Return the current state of counter event_id (the local one is used when local counters are enabled).
  • EventGroup.sync()

    • Force synchronization of counters in this group (do nothing when local counters are not enabled).
  • EventGroup.declare_event_id(event_id)

    • Create counter for event_id if it doesn't exist yet. Should be equivalent to listing the event ID in configuration file.
  • EventGroup.declare_event_ids([event_id, ...])

    • Same as declare_event_id, just for multiple event IDs at once.
  • For convenience, all EventGroup methods should also have a module-level function with the same name, taking group name as the first parameter. For example log_event(group, event_id, n=1).
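The shape of this interface can be sketched in memory, with no Redis connection and no configuration loading. Only the names listed above come from the proposal. The _groups registry, the counts dict and the decision to count events that were never declared are assumptions made for the example.

```python
_groups = {}  # hypothetical registry: one EventGroup per group name


class EventGroup:
    """In-memory stand-in for the proposed class (no Redis, no config)."""

    def __init__(self, name):
        self.name = name
        self.counts = {}

    def declare_event_id(self, event_id):
        # Create the counter only if it does not exist yet.
        self.counts.setdefault(event_id, 0)

    def declare_event_ids(self, event_ids):
        for event_id in event_ids:
            self.declare_event_id(event_id)

    def log_event(self, event_id, n=1):
        # Assumption: undeclared event IDs are counted as well.
        self.counts[event_id] = self.counts.get(event_id, 0) + n

    def get_count(self, event_id):
        return self.counts.get(event_id, 0)


def get_group(name):
    """Return the existing EventGroup for 'name', creating it on first use."""
    if name not in _groups:
        _groups[name] = EventGroup(name)
    return _groups[name]


def log_event(group, event_id, n=1):
    """Module-level convenience wrapper taking the group name first."""
    get_group(group).log_event(event_id, n)
```

Calling get_group("myGroup") twice returns the same object, so counters logged through the module-level log_event are visible through the group's get_count.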

EventCountLoggerMaster

A standalone script which must always run alongside the components that use EventCountLogger. At regular intervals, given by the configuration of each group, it moves the current values of the "cur" counters to the "last" counters. If event IDs are given in the configuration, it sets the cur counters of all those event IDs to 0. It also sets the key <group_name>:<interval>:cur:@ts to the UNIX timestamp (in UTC) of the new interval. All these operations on a single group must be done atomically (i.e. using a Redis transaction).
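One rotation step of the master could look like the sketch below, again with a dict standing in for Redis. The function name rotate and its parameters are hypothetical. The sketch also assumes that stale last entries are removed before the move, so that a dynamically declared event which did not occur in the interval simply has no last key.

```python
import time


def rotate(store, group, interval, event_ids=(), now=None):
    """Move all 'cur' entries of one group/interval to 'last' and reset 'cur'."""
    now = int(time.time()) if now is None else int(now)
    cur_prefix = f"{group}:{interval}:cur:"
    last_prefix = f"{group}:{interval}:last:"

    # Assumption: drop the previous 'last' values so nothing stale survives.
    for key in [k for k in store if k.startswith(last_prefix)]:
        del store[key]

    # Move every 'cur' entry, including @ts, to the matching 'last' key.
    for key in [k for k in store if k.startswith(cur_prefix)]:
        store[last_prefix + key[len(cur_prefix):]] = store.pop(key)

    # Statically configured event IDs start the new interval at zero.
    for event_id in event_ids:
        store[cur_prefix + event_id] = 0
    store[cur_prefix + "@ts"] = now
```

With a real Redis server these steps would have to be wrapped in one transaction to meet the atomicity requirement. The dict version is only safe because it runs in a single thread.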

In the future, it should also provide a possibility to define some actions at the end of each interval, but this is not defined yet.

Reading data

Any script/process can read data of the last finished interval of any group by reading Redis keys named as <group_name>:<interval>:last:<event_id>. A reader can also check the <group_name>:<interval>:last:@ts to see if the data are being updated correctly.

If some event_id is missing in Redis, the reader should assume its value is zero.
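A reader following these rules might be sketched as below. The store argument is any mapping that holds the Redis keys, and the function names read_last and is_fresh are hypothetical. The freshness threshold of two interval lengths is an assumption: last:@ts marks the start of the last completed interval, so it should be between one and two interval lengths old while the master is running.

```python
def read_last(store, group, interval, event_id):
    """Return the count of the last completed interval, or 0 if missing."""
    value = store.get(f"{group}:{interval}:last:{event_id}")
    return 0 if value is None else int(value)


def is_fresh(store, group, interval, interval_seconds, now):
    """Check that last:@ts is recent enough to trust the 'last' values.

    Assumption: data older than two interval lengths counts as stale.
    """
    ts = store.get(f"{group}:{interval}:last:@ts")
    if ts is None:
        return False
    return now - int(ts) <= 2 * interval_seconds
```

The int() conversion is there because values read from Redis usually arrive as strings or bytes rather than integers.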
