litemetric guide
Litemetric, a part of Litestack, is a low overhead, simple and generic telemetry tool that collects runtime data about Ruby and Rails applications.
Litemetric, like the other components in Litestack, is built on top of SQLite. It uses the embedded database engine to store and query telemetry data. As a result, Litemetric is a very low maintenance system: there is no need to set up, maintain, or monitor any service aside from the application that integrates Litemetric.
Litemetric follows a simple approach: it tries to provide APIs that are easy to use and cover most of the common cases for event acquisition and measurement. It does not attempt to be an elaborate performance monitoring system. Still, it can be sufficient for many applications, and zero administrative overhead is a plus.
Litestack components (e.g. Litejob, Litecache, Litecable) can optionally use Litemetric to report on usage and performance.
- Capture single/multishot events
- Measure single/multishot events
- Snapshot information capturing
- In memory aggregation
- Background aggregator
- Background garbage collector
- Thread safety
- Async/Fiber Scheduler integration
- Graceful shutdown
- Fork resilience
- Polyphony integration
- Web reporting interface
To collect metrics for a class, include the Litemetric::Measurable module in it. You can then set a unique identifier for the class by overriding the #metrics_identifier method.
Capturing and measuring events can then happen whenever required in the object methods.
# note that we only need to require litestack
# you could still do require 'litestack/litemetric'
require 'litestack'

class ImportantClass
  include Litemetric::Measurable

  # override the default identifier
  def metrics_identifier
    self.class.name
  end

  # the captured action will only be counted
  # the database will have a count of times the event was captured
  def simple
    # do something
    capture("simple")
  end

  # the measured action will also capture the runtime of the action
  # the database will have a count of times the event was measured and the total time measured
  def complex
    measure("complex") do
      # do something
    end
  end
end
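To make the difference between capturing and measuring concrete, here is a minimal, self-contained sketch. It does not use Litemetric at all: `SketchMeasurable`, `STORE`, and `Importer` are hypothetical stand-ins that only mimic the behaviour described above (a captured event is counted, a measured event is counted and its runtime is added to a total), and an in-memory Hash is assumed in place of the SQLite database.

```ruby
# Hypothetical stand-in for Litemetric::Measurable, for illustration only.
module SketchMeasurable
  # topic => event => { count:, total: }
  STORE = Hash.new do |topics, topic|
    topics[topic] = Hash.new { |events, event| events[event] = { count: 0, total: 0.0 } }
  end

  def metrics_identifier
    self.class.name
  end

  # Count one occurrence of the event.
  def capture(event)
    STORE[metrics_identifier][event][:count] += 1
  end

  # Count the event and add the block's wall-clock runtime to its total.
  def measure(event)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    entry = STORE[metrics_identifier][event]
    entry[:count] += 1
    entry[:total] += elapsed
    result
  end
end

class Importer
  include SketchMeasurable

  def simple
    capture("simple")
  end

  def complex
    measure("complex") { sleep 0.01 }
  end
end

importer = Importer.new
2.times { importer.simple }
importer.complex
```

After this runs, `SketchMeasurable::STORE["Importer"]` holds a count for "simple" and both a count and a total runtime for "complex".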
Events can optionally have keys to differentiate them. Here is an example that uses both an event name and a key when reporting a metric:
class Ticker
  include Litemetric::Measurable

  def change(symbol, value)
    capture("change", symbol, value)
  end
end
This results in each symbol being unique in the metrics database, so that it can be reported on alone or in aggregation with other symbols under the same "change" action.
Sometimes an action needs to be reported under multiple keys at the same time, for example when you need to report the job insertion rate for each named queue and for all the queues at once. Litemetric provides a simple way to achieve this:
# capture multiple events in one shot
def enqueue(queue_name, job)
  # do the action
  capture("enqueue", ["all", queue_name])
end

# also with measurement
def perform(queue_name, job)
  measure("perform", ["all", queue_name]) do
    # do the action
  end
end
The above results in two entries being captured/measured, one for the specific queue and one that aggregates over all queues.
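The fan-out described above can be sketched in a few lines of plain Ruby. This is not Litemetric's implementation: `capture_sketch` and `COUNTS` are hypothetical names, and the sketch only assumes that a single key and an array of keys are treated uniformly.

```ruby
# Hypothetical in-memory counter keyed by [event, key]; illustration only.
COUNTS = Hash.new(0)

# Record one occurrence of the event under every given key.
# Array() lets callers pass either a single key or a list of keys.
def capture_sketch(event, keys)
  Array(keys).each { |key| COUNTS[[event, key]] += 1 }
end

capture_sketch("enqueue", ["all", "mailers"])
capture_sketch("enqueue", ["all", "reports"])
capture_sketch("enqueue", "reports")
```

Each call with `["all", queue_name]` increments two counters, so the "all" entry always equals the sum of the calls that included it, while each queue keeps its own count.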
Litemetric looks for a litemetric.yml file in its working directory. The syntax and defaults for the file are as follows:
path: path/to/your/db/file
flush_interval: 10 # how long are events buffered before flushing to db
summarize_interval: 30 # delay between data summarizer runs
snapshot_interval: 600 # 10*60, how often to take snapshots from client libraries
The db path should preferably be outside of your application folder, in order to prevent the file from being accidentally overwritten during deployment.
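As an illustration of how settings from such a file could be layered over defaults, here is a minimal sketch using Ruby's standard YAML library. It is not Litemetric's own loading code: `DEFAULTS`, `load_settings`, and the placeholder path are assumptions made for this example, and only two of the keys are shown.

```ruby
require "yaml"

# Hypothetical defaults for the sketch; the path is a made-up placeholder.
DEFAULTS = {
  "path" => "/var/lib/myapp/metrics.sqlite3",
  "flush_interval" => 10
}.freeze

# Merge the values found in a YAML document over the defaults.
# An empty document parses to nil or false, so fall back to an empty Hash.
def load_settings(yaml_text)
  overrides = YAML.safe_load(yaml_text) || {}
  DEFAULTS.merge(overrides)
end

settings = load_settings("flush_interval: 5\n")
untouched = load_settings("")
```

Here `settings` keeps the default path but uses the overridden flush interval, and `untouched` is identical to the defaults.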
To have a Litestack component (e.g. Litejob, Litecache, Litecable) report to Litemetric, add this directive to that component's configuration file:
metrics: true # default is false
Events, keys and values captured for the different Litestack components are interpreted differently. Here is a quick list:
Litedb

event | key | value |
---|---|---|
Read | SQL text of read queries | time taken to run the query |
Write | SQL text of write queries | time taken to run the query |
Schema change | SQL text of the DDL query | time taken to run the query |
Pragma | Pragma statements run by SQLite | time taken to run the statements |

Litejob

event | key | value |
---|---|---|
enqueue | name of the queue that received the job | none |
dequeue | name of the queue that delivered the job | none |
perform | name of the queue that had the job | time taken to run the job |

Litecache

event | key | value |
---|---|---|
get | the key of the cache object | the hit rate |
set | the key of the cache object | none |
Litestack comes with a simple web interface to report on the data collected by Litemetric. Start the reporting tool by running the liteboard command in the console. You will need to provide Liteboard with the location of your metrics database file. For the exact syntax, run:
liteboard -h
Once started, liteboard shows a simple breakdown of the events that were collected by Litemetric. It consists of 3 pages:
The first page shows the list of topics for which events were captured in the selected time range. For each topic you see the count of captured events and a historical trend of event counts over time.
The second page shows the data collected for a specific topic: the different event names and their counts and, where events had values, the average, total, min and max values. It also shows trend lines for counts and average values over time.
Optionally, a topic can publish snapshots of its state to Litemetric; if a snapshot exists, it is displayed on this page.
The third page shows data for a specific event type, listing keys with their counts and values (avg, total, min and max), along with the same trend lines as on the topic page.
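The figures shown on these pages (count, total, average, min and max) can all be derived from a small running aggregate, without keeping every raw value. The sketch below illustrates the idea; `Aggregate` is a hypothetical struct for this example and is not Litemetric's storage format.

```ruby
# Hypothetical running aggregate for one event key; illustration only.
Aggregate = Struct.new(:count, :total, :min, :max) do
  # Fold one measured value into the aggregate.
  def add(value)
    self.count += 1
    self.total += value
    self.min = value if min.nil? || value < min
    self.max = value if max.nil? || value > max
    self
  end

  # The average is derived on demand from the count and total.
  def average
    count.zero? ? 0.0 : total.to_f / count
  end
end

agg = Aggregate.new(0, 0.0, nil, nil)
[0.2, 0.5, 0.3].each { |value| agg.add(value) }
```

Because only four numbers are stored per key, two aggregates for adjacent time periods can also be combined cheaply by adding counts and totals and taking the smaller min and larger max.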
Litemetric strives to be simple and lightweight, so it does not try to keep data at the highest resolution. Instead, it aggregates and summarizes data aggressively and only keeps data at a finer resolution while it is very fresh. These are the general breakdowns of data granularity:
data resolution | data kept for |
---|---|
every 5 minutes | 60 minutes |
every 1 hour | 24 hours |
every 24 hours | 7 days |
every week | 52 weeks |
This means that if you are looking at data for the last 7 days, it will only be available at day resolution. Beyond 52 weeks (1 year), the data is still stored at the same resolution (a data point for every week per event key), but it is not currently viewable in the dashboard (this should be fixed in a later release).
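The table above can be read as a lookup from the age of a data point to the finest resolution still available for it. The following sketch encodes that reading; the constant and method names are hypothetical, durations are expressed in seconds for the purpose of the example, and buckets are aligned to the Unix epoch as a simplification.

```ruby
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR
WEEK = 7 * DAY

# [bucket size, how long data is kept at that size], finest first.
RESOLUTIONS = [
  [5 * MINUTE, 60 * MINUTE],
  [HOUR, 24 * HOUR],
  [DAY, 7 * DAY],
  [WEEK, 52 * WEEK]
].freeze

# Finest bucket size (in seconds) available for data of the given age.
# Data older than 52 weeks stays at weekly resolution.
def resolution_for(age_seconds)
  pair = RESOLUTIONS.find { |_bucket, kept_for| age_seconds <= kept_for }
  pair ? pair.first : WEEK
end

# Start of the bucket that a Unix timestamp falls into.
def bucket_start(timestamp, bucket)
  timestamp - (timestamp % bucket)
end
```

For example, `resolution_for(3 * DAY)` returns `DAY`, matching the statement that data from the last 7 days is available at day resolution.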