Blueflood is a multi-tenant distributed metric processing system created by engineers at Rackspace. It is used in production by the Rackspace Monitoring team to process metrics generated by their monitoring systems. Blueflood is capable of ingesting, rolling up and serving metrics at a massive scale.
Simply put, Blueflood is a big, fast database for your metrics. Data from Blueflood can be used to construct dashboards, generate reports and graphs, or for any other use involving time-series data. It focuses on near-real-time data, which is queryable mere milliseconds after ingestion. Data is stored in Cassandra, which makes Blueflood fault-tolerant and highly available.
In contrast to forebears such as CarbonDB or RRDTool, your Blueflood cluster can expand as your metrics needs grow: simply add more Cassandra nodes.
You need a Cassandra instance and an Elasticsearch instance for Blueflood to connect to. I recommend following the 10-Minute Guide on the Blueflood wiki for a quick startup. After that, there are two main ways to run this image, which I'll call quick mode and full mode.
Quick mode auto-inits Cassandra and Elasticsearch at startup and lets you configure Blueflood via environment variables.
Full mode lets you specify a custom Blueflood config file and a log4j config file instead of keeping everything in the environment. Auto-initializing Cassandra and Elasticsearch can be disabled via environment variables. Blueflood properties set in the environment still override what's in the config file.
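The override rule can be pictured with a small shell sketch. It is an illustration only, not Blueflood's own loader: the `resolve_setting` helper and the `sample-blueflood.conf` file are hypothetical, and the file is assumed to hold plain `KEY=VALUE` lines.

```shell
#!/bin/sh
# Illustration of the documented precedence, not Blueflood's actual code.
# resolve_setting NAME FILE prints the environment value of NAME if it is
# set, otherwise the value from the NAME=... line in FILE.
resolve_setting() {
  if value=$(printenv "$1"); then
    printf '%s\n' "$value"
  else
    sed -n "s/^$1=//p" "$2" | tail -n 1
  fi
}

# Hypothetical config file with two settings.
cat > sample-blueflood.conf <<'EOF'
MAX_CASSANDRA_CONNECTIONS=100
MAX_ROLLUP_READ_THREADS=20
EOF

# Only one of the two settings is also present in the environment.
export MAX_CASSANDRA_CONNECTIONS=70
unset MAX_ROLLUP_READ_THREADS

resolve_setting MAX_CASSANDRA_CONNECTIONS sample-blueflood.conf  # prints 70
resolve_setting MAX_ROLLUP_READ_THREADS sample-blueflood.conf    # prints 20
```

The first lookup returns the environment value even though the file says 100; the second falls back to the file because nothing in the environment overrides it.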
In either case, here are the primary environment variables to consider when starting a Blueflood container:
Variable | Description | Default |
---|---|---|
DEBUG_JAVA | Whether to enable remote JVM debugging on port 5005 | false |
DEBUG_JAVA_SUSPEND | Whether to suspend JVM startup until a debugger attaches to the debug port | false |
INIT_CASSANDRA | Whether to run the init script to create the Cassandra schema at startup | true |
INIT_ELASTICSEARCH | Whether to run the init script to create the Elasticsearch indexes at startup | true |
BLUEFLOOD_CONF_LOCATION | Path to a Blueflood configuration file inside the container; if not set, all env variables are used as Blueflood properties | - |
BLUEFLOOD_LOG4J_CONF_LOCATION | Path to a log4j configuration file inside the container; if not set, a default config file is created to log to stdout | - |
BLUEFLOOD_LOG_LEVEL | Log level for all Blueflood classes; only applies when BLUEFLOOD_LOG4J_CONF_LOCATION is not set | INFO |
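As a rough sketch of how the debugging variables in the table above might be combined, the following builds a `docker run` command that publishes the debug port and waits for a debugger. The image name and the two IP addresses are placeholders, not values from this image's documentation, and the command is only printed unless you uncomment the last line.

```shell
#!/bin/sh
# Placeholders: substitute your real image name and host addresses.
IMAGE="example/blueflood"   # hypothetical image name
CASSANDRA_IP="192.0.2.10"   # documentation-range address, not a real host
ELASTICSEARCH_IP="192.0.2.20"

# -p 5005:5005 publishes the JVM debug port listed in the table.
# DEBUG_JAVA_SUSPEND=true holds JVM startup until a debugger attaches.
DEBUG_CMD="docker run --rm -p 5005:5005 \
-e CASSANDRA_HOST=$CASSANDRA_IP \
-e ELASTICSEARCH_HOST=$ELASTICSEARCH_IP \
-e DEBUG_JAVA=true \
-e DEBUG_JAVA_SUSPEND=true \
-e BLUEFLOOD_LOG_LEVEL=DEBUG \
$IMAGE"

echo "$DEBUG_CMD"     # dry run: print the command
# eval "$DEBUG_CMD"   # uncomment to start the container
```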
To use "full mode", set environment variables such as:
INIT_CASSANDRA=false
INIT_ELASTICSEARCH=false
BLUEFLOOD_CONF_LOCATION=<path to mounted conf file>
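One way to wire these three variables into a container start is to put them in an environment file and mount the config file at the path they name. This is a minimal sketch under assumptions: the image name, the host path, and the mount target are all hypothetical. The command is printed rather than executed.

```shell
#!/bin/sh
# Placeholders, not taken from the image documentation:
IMAGE="example/blueflood"                          # hypothetical image name
CONF_ON_HOST="$PWD/blueflood.conf"                 # hypothetical host path
CONF_IN_CONTAINER="/etc/blueflood/blueflood.conf"  # hypothetical mount target

# Full-mode environment: skip both init scripts and point at the mounted file.
cat > blueflood-full.env <<EOF
INIT_CASSANDRA=false
INIT_ELASTICSEARCH=false
BLUEFLOOD_CONF_LOCATION=$CONF_IN_CONTAINER
EOF

# Mount the config file read-only and pass the environment file.
FULL_CMD="docker run -d --env-file blueflood-full.env \
-v $CONF_ON_HOST:$CONF_IN_CONTAINER:ro $IMAGE"

echo "$FULL_CMD"     # dry run: print the command
# eval "$FULL_CMD"   # uncomment to start the container
```

A log4j config file could be mounted the same way and referenced through BLUEFLOOD_LOG4J_CONF_LOCATION.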
Blueflood is configured by its configuration file and environment variables. A given config setting in the environment overrides the same setting in the config file. Here are some additional settings to help you get started.
Variable | Description | Default |
---|---|---|
CASSANDRA_HOST | IP address of Cassandra seed. (Required) | null |
ELASTICSEARCH_HOST | IP address of Elasticsearch node. (Required) | null |
MAX_ROLLUP_READ_THREADS | Maximum number of read threads participating in rolling up the metrics | 20 |
MAX_ROLLUP_WRITE_THREADS | Maximum number of write threads participating in rolling up the metrics | 5 |
MAX_CASSANDRA_CONNECTIONS | Maximum number of connections with each Cassandra node | 70 |
INGEST_MODE | Whether to start the Ingest service | true |
ROLLUP_MODE | Whether to start the Rollup service | true |
QUERY_MODE | Whether to start the Query service | true |
MIN_HEAP_SIZE | Initial size of the heap allocated to the Blueflood process | 1G |
MAX_HEAP_SIZE | Maximum size of the heap allocated to the Blueflood process | 1G |
GRAPHITE_HOST | IP address of the Graphite host to monitor your container | " " |
GRAPHITE_PORT | Port of the Graphite line (plaintext) receiver on that host | 2003 |
GRAPHITE_PREFIX | Prefix for Graphite metrics | Host name of the container |
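The three `*_MODE` switches in the table above suggest that each service can run in its own container. The sketch below prints one `docker run` command per role. The image name, container names, IP addresses, and the 2G heap size are illustrative assumptions, and `role_flags` is a hypothetical helper, not part of the image.

```shell
#!/bin/sh
# Placeholders: substitute your real image name and host addresses.
IMAGE="example/blueflood"   # hypothetical image name
CASSANDRA_IP="192.0.2.10"   # documentation-range address, not a real host
ELASTICSEARCH_IP="192.0.2.20"

# role_flags ROLE prints the -e flags that enable only that one service.
role_flags() {
  case "$1" in
    ingest) echo "-e INGEST_MODE=true -e ROLLUP_MODE=false -e QUERY_MODE=false" ;;
    rollup) echo "-e INGEST_MODE=false -e ROLLUP_MODE=true -e QUERY_MODE=false" ;;
    query)  echo "-e INGEST_MODE=false -e ROLLUP_MODE=false -e QUERY_MODE=true" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Dry run: print one command per role; drop "echo" to start the containers.
for role in ingest rollup query; do
  echo docker run -d --name "blueflood-$role" \
    -e CASSANDRA_HOST="$CASSANDRA_IP" \
    -e ELASTICSEARCH_HOST="$ELASTICSEARCH_IP" \
    -e MIN_HEAP_SIZE=2G -e MAX_HEAP_SIZE=2G \
    $(role_flags "$role") "$IMAGE"
done
```

Setting the minimum and maximum heap to the same value mirrors the defaults in the table, which are both 1G.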
If you want to play with these variables at a pro level, you can find all possible settings in the following config classes: