From 32eff3efb0abbb4cade9c576751c40de18a17262 Mon Sep 17 00:00:00 2001 From: John Floren Date: Fri, 12 Jul 2024 12:21:36 -0700 Subject: [PATCH] Fix up Kafka ingester docs They had fallen out of date with the actual code. Addresses gravwell/wiki#1047 --- ingesters/http.md | 16 ------- ingesters/kafka.md | 104 ++++++++++++++++++++++++--------------------- 2 files changed, 56 insertions(+), 64 deletions(-) diff --git a/ingesters/http.md b/ingesters/http.md index 081f1641..c1b44dff 100644 --- a/ingesters/http.md +++ b/ingesters/http.md @@ -70,22 +70,6 @@ Multiple "Listener" definitions can be defined allowing specific URLs to send en TokenValue=Secret ``` -## Installation - -If you're using the Gravwell Debian repository, installation is just a single apt command: - -``` -apt-get install gravwell-http-ingester -``` - -Otherwise, download the installer from the [Downloads page](/quickstart/downloads). Using a terminal on the Gravwell server, issue the following command as a superuser (e.g. via the `sudo` command) to install the ingester: - -```console -root@gravserver ~ # bash gravwell_http_ingester_installer_3.0.0.sh -``` - -If the Gravwell services are present on the same machine, the installation script will automatically extract and configure the `Ingest-Auth` parameter and set it appropriately. However, if your ingester is not resident on the same machine as a pre-existing Gravwell backend, the installer will prompt for the authentication token and the IP address of the Gravwell indexer. You can set these values during installation or leave them blank and modify the configuration file in `/opt/gravwell/etc/gravwell_http_ingester.conf` manually. - ## Configuring HTTPS By default the HTTP Ingester runs a cleartext HTTP server, but it can be configured to run an HTTPS server using x509 TLS certificates. To configure the HTTP Ingester as an HTTPS server provide a certificate and key PEM files in the Global configuration space using the `TLS-Certificate-File` and `TLS-Key-File` parameters. diff --git a/ingesters/kafka.md b/ingesters/kafka.md index 868bc3d2..f243bcb1 100644 --- a/ingesters/kafka.md +++ b/ingesters/kafka.md @@ -7,24 +7,67 @@ myst: --- # Kafka -The Kafka ingester designed to act as a consumer for [Apache Kafka](https://kafka.apache.org/) so that data Gravwell can attach to a Kafka cluster and consume data. Kafka can act as a high availability [data broker](https://kafka.apache.org/uses#uses_logs) to Gravwell. Kafka can take on some of the roles provided by the Gravwell Federator, or ease the burden of integrating Gravwell into an existing data flow. If your data is already flowing to Kafka, integrating Gravwell is just an `apt-get` away. +The Kafka ingester is designed to act as a consumer for [Apache Kafka](https://kafka.apache.org/) so that Gravwell can attach to a Kafka cluster and consume data. Kafka can act as a high-availability [data broker](https://kafka.apache.org/uses#uses_logs) to Gravwell. Kafka can take on some of the roles provided by the Gravwell Federator, or ease the burden of integrating Gravwell into an existing data flow. If your data is already flowing to Kafka, integrating Gravwell is just an `apt-get` away. -The Gravwell Kafka ingester is best suited as a co-located ingest point for a single indexer. If you are operating a Kafka cluster and a Gravwell cluster, it is best not to duplicate the load balancing characteristics of Kafka at the Gravwell ingest layer. 
Install the Kafka ingester on the same machine as the Gravwell indexer and use the Unix named pipe connection. Each indexer should be configured with its own Kafka ingester, this way the Kafka cluster can manage load balancing.
+The Gravwell Kafka ingester is best suited as a co-located ingest point for a single indexer. If you are operating a Kafka cluster and a Gravwell cluster, it is best not to duplicate the load balancing characteristics of Kafka at the Gravwell ingest layer. Each indexer should be configured with its own Kafka ingester, allowing the Kafka cluster to manage load balancing; install the Kafka ingester on the same machine as the Gravwell indexer and use the Unix named pipe connection for communication with the indexer.
 
-Most Kafka configurations enforce a data durability guarantee, which means data is stored in non-volatile storage when consumers are not available to consume it. As a result we do not recommend that the Gravwell ingest cache be enabled on Kafka ingester, instead let Kafka provide the data durability.
+Most Kafka configurations enforce a data durability guarantee, which means data is stored in non-volatile storage when consumers are not available to consume it. As a result, we do not recommend enabling the Gravwell ingest cache on the Kafka ingester; instead, let Kafka provide the data durability.
 
 ## Installation
 
 ```{include} installation_instructions_template
 ```
 
-## Basic Configuration
+## Configuration
 
 The Kafka ingester uses the unified global configuration block described in the [ingester section](ingesters_global_configuration_parameters). Like most other Gravwell ingesters, the Kafka Ingester supports multiple upstream indexers, TLS, cleartext, and named pipe connections, a local cache, and local logging.
 
 The configuration file is at `/opt/gravwell/etc/kafka.conf`. The ingester will also read configuration snippets from its [configuration overlay directory](configuration_overlays) (`/opt/gravwell/etc/kafka.conf.d`).
 
-## Consumer Examples
+### Consumer Configurations
+
+The Gravwell Kafka ingester can subscribe to multiple topics and even multiple Kafka clusters. Each consumer is defined in its own `Consumer` block with a few key configuration values.
+
+The following parameters configure the connection to the Kafka cluster; a brief example follows the table:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| Leader | host:port | The Kafka cluster leader/broker. This should be an IP address or hostname; if no port is specified, the default port of 9092 is appended. | YES |
+| Topic | string | The Kafka topic this consumer will read from. | YES |
+| Consumer-Group | string | The Kafka consumer group this ingester is a member of; default is `gravwell`. | NO |
+| Rebalance-Strategy | string | The re-balancing strategy to use when reading from Kafka. Options are `roundrobin` (default), `sticky`, and `range`. | NO |
+| Auth-Type | string | Enable SASL authentication and specify the mechanism. | NO |
+| Username | string | Specify username for SASL authentication. | NO |
+| Password | string | Specify password for SASL authentication. | NO |
+| Use-TLS | boolean | If set, the ingester will connect to the Kafka cluster using TLS. | NO |
+| Insecure-Skip-TLS-Verify | boolean | If TLS is in use, setting this parameter will make the ingester ignore invalid TLS certificates. | NO |
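+
+As an illustration, a consumer block that connects to a TLS-enabled broker using SASL authentication might look like the following sketch. The broker address, topic, credentials, and mechanism name are all placeholders; consult your Kafka cluster's configuration for the SASL mechanisms it actually supports.
+
+```
+[Consumer "secure"]
+	Leader="kafka.example.com:9093"	#TLS listeners are commonly served on port 9093
+	Topic=weblogs
+	Consumer-Group=gravwell
+	Default-Tag=kafka	#required; described in the next table
+	Use-TLS=true
+	Auth-Type=plain	#placeholder SASL mechanism
+	Username=gravwell
+	Password=SuperSecretPassword
+```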
+
+These parameters configure how the ingester handles incoming data from Kafka:
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| Default-Tag | string | Entries which do not receive a tag from the `Tag-Header` will be assigned this default tag. | YES |
+| Tag-Header | string | If set, the ingester will look at the specified header to determine into which tag the entry should be ingested. If the header is not set on the message, the `Default-Tag` will be used. By default, `Tag-Header` is set to "TAG". | NO |
+| Tags | string | Specifies a list of allowable tags (or wildcard patterns) for the `Tag-Header`, e.g. `Tags=gravwell,foo,b*r`. Any entry with a tag which does not match one of the patterns will instead be assigned the `Default-Tag`. | NO |
+| Source-Header | string | Gravwell producers will often put the data source address in a message header; if set, the ingester will attempt to interpret the given header as a Source address. If the header is not valid, the ingester will apply the source override (if set) or the default source. | NO |
+| Source-As-Binary | boolean | If set, the ingester will assume that the contents of the `Source-Header` are in binary format, rather than a string. | NO |
+| Synchronous | boolean | If set, the ingester will perform a sync on the ingest connection every time a Kafka batch is written. | NO |
+| Batch-Size | integer | The number of entries to read from Kafka before forcing a write to the ingest connection; the default is 512. | NO |
+
+These parameters give some standard Gravwell ingester configuration options related to timestamps, timezones, and the source field; a combined example follows the table. See the [general ingester configuration page](/ingesters/ingesters) for more information about these parameters.
+
+| Parameter | Type | Description | Required |
+|-----------|------|-------------|----------|
+| Source-Override | IPv4 or IPv6 | An IP address to use as the SRC for all entries. | NO |
+| Ignore-Timestamps | boolean | If set, the ingester will apply the current timestamp to all received entries, ignoring Kafka timestamps. | NO |
+| Extract-Timestamps | boolean | If set, the ingester will ignore the Kafka timestamps and attempt to extract a timestamp from the entry's contents. | NO |
+| Assume-Local-Timezone | boolean | If set, timestamps extracted from entries which do not explicitly specify a timezone will be assumed to be in the local timezone. | NO |
+| Timezone-Override | string | If set, timestamps will be parsed in the given timezone, e.g. "America/New_York". | NO |
+| Timestamp-Format-Override | string | Specifies a timestamp format, e.g. "RFC822", to use when parsing timestamps. | NO |
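+
+As a sketch of how these parameters combine, the following hypothetical consumer routes entries by the Kafka "TAG" header, restricts the allowable tags, and extracts timestamps from entry contents. The topic and tag names are invented for illustration.
+
+```
+[Consumer "routed"]
+	Leader="127.0.0.1:9092"
+	Topic=applogs
+	Tag-Header=TAG	#read the destination tag from the "TAG" message header
+	Tags=syslog,apache,b*r	#allowable tags and wildcard patterns
+	Default-Tag=kafka	#entries with no TAG header or an unlisted tag land here
+	Extract-Timestamps=true	#ignore Kafka timestamps and parse them from the entry data
+	Timezone-Override="America/New_York"	#parse extracted timestamps in this timezone
+```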
+
+As with most ingesters, each consumer may also specify [preprocessors](/ingesters/preprocessors/preprocessors) if needed.
+
+### Consumer Examples
 
 ```
 [Consumer "default"]
@@ -35,55 +78,20 @@ The configuration file is at `/opt/gravwell/etc/kafka.conf`. The ingester will a
 	Tag-Header=TAG	#look for the tag in the Kafka TAG header
 	Source-Header=SRC	#look for the source in the Kafka SRC header
+# This consumer does not specify a Tags parameter, so all entries will get the Default-Tag
 [Consumer "test"]
 	Leader="127.0.0.1:9092"
-	Tag-Name=test
+	Default-Tag=test
 	Topic=test
 	Consumer-Group=mygroup
 	Synchronous=true
-	Key-As-Source=true	#A custom feeder is putting its source IP in the message key value
-	Header-As-Source="TS"	#look for a header key named TS and treat that as a source
-	Source-As-Text=true	#the source value is going to come in as a text representation
+	Source-Header=SRC	#A custom feeder is putting its source IP in the header named "SRC"
 	Batch-Size=256	#get up to 256 messages before consuming and pushing
-	Rebalance-Strategy=roundrobin
+	Rebalance-Strategy=sticky
 ```
 
-## Installation
-
-The Kafka ingester is available in the Gravwell Debian repository as a Debian package as well as a shell installer on our [Downloads page](/quickstart/downloads). Installation via the repository is performed using `apt`:
-
-```
-apt-get install gravwell-kafka
-```
-
-The shell installer provides support for any non-Debian system that uses systemd, including Arch, Redhat, Gentoo, and Fedora.
-
-```console
-root@gravserver ~ # bash gravwell_kafka_installer.sh
-```
-
-## Configuration
-
-The Gravwell Kafka ingester can subscribe to multiple topics and even multiple Kafka clusters. Each consumer defines a consumer block with a few key configuration values.
-
-
-| Parameter | Type | Descriptions | Required |
-|-----------|------|--------------| -------- |
-| Tag-Name | string | The Gravwell tag that data should be sent to. | YES |
-| Leader | host:port | The Kafka cluster leader/broker. This should be an IP or hostname, if no port is specified the default port of 9092 is appended | YES |
-| Topic | string | The Kafka topic this consumer will read from | YES |
-| Consumer-Group | string | The Kafka consumer group this ingester is a member of | NO - default is `gravwell` |
-| Source-Override | IPv4 or IPv6 | An IP address to use as the SRC for all entries | NO |
-| Rebalance-Strategy | string | The re-balancing strategy to use when reading from Kafka | NO - default is `roundrobin`. `sticky`, and `range` are also options |
-| Key-As-Source | boolean | Gravwell producers will often put the data source address in a message key, if set the ingester will attempt to interpret the message key as a Source address. If the key structure is not correct the ingester will apply the override (if set) or the default source. | NO - default is false |
-| Synchronous | boolean | The ingester will perform a sync on the ingest connection every time a Kafka batch is written. | NO - default is false |
-| Batch-Size | integer | The number of entries to read from Kafka before forcing a write to the ingest connection | NO - default is 512 |
-| Auth-Type | string | Enable SASL authentiation and specify mechanism |
-| Username | string | Specify username for SASL authentication |
-| Password | string | Specify password for SASL authentication |
-
 ```{warning}
-Setting any consumer as synchronous causes that consumer to continually Sync the ingest pipeline. It will have significant performance implications for ALL consumers.
+Setting any consumer as synchronous causes that consumer to continually sync the ingest pipeline. It will have significant performance implications for ALL consumers.
``` ```{note} @@ -125,16 +133,16 @@ Log-File=/opt/gravwell/log/kafka.log [Consumer "default"] Leader="tasks.kafka.internal" - Tag-Name=default + Default-Tag=default + Tags=* Topic=default Consumer-Group=gravwell1 - Key-As-Source=true Batch-Size=256 [Consumer "test"] Leader="tasks.testcluster.internal:9092" - Tag-Name=test + Default-Tag=test Topic=test Consumer-Group=testgroup Source-Override="192.168.1.1"