Merge pull request #1011 from john-floren-gravwell/phrasing
Eliminate a lot of awkward phrasings.
ashnwade authored Jun 7, 2024
2 parents 0bb4b7a + 77d65c2 commit 9e47995
Showing 41 changed files with 60 additions and 62 deletions.
4 changes: 2 additions & 2 deletions architecture/architecture.md
@@ -72,7 +72,7 @@ The base installation shown above uses a single ingester that is distributing en

Ingesters come in all shapes and sizes, and the ingest API is open source. Gravwell has open sourced several of the base ingesters, and as the community grows, more and more ingesters are popping up to get data from a very wide variety of sources; and because Gravwell supports binary ingest and processing, there is no limit to what you can ingest. As long as your data can be encapsulated into an atomic item with an associated timestamp, Gravwell will consume and search it.

- A very flexible ingest framework allows for extremely complex and secure topologies. Customers often wish to segment their data analysis platform from the rest of the infrastructure. The segmentation means that allowing all the workstations in a network to directly talk to the indexers is not desired. Here, Gravwell supports arbitrarily deep ingester federation, meaning that if you have multiple tiers of network classification you can dual home Gravwell ingesters to safely relay data from public networks to a private analysis network. As an example, the assumption is that a modest Gravwell topology is gathering data from a more complex enterprise. This enterprise has public facing webservers, private file servers, domain controllers, firewalls, workstations, and private switching gear; and the network engineers and I.T. security folks are responsible for it all.
+ A very flexible ingest framework enables extremely complex and secure topologies. Customers often wish to segment their data analysis platform from the rest of the infrastructure. The segmentation means that allowing all the workstations in a network to directly talk to the indexers is not desired. Here, Gravwell supports arbitrarily deep ingester federation, meaning that if you have multiple tiers of network classification you can dual home Gravwell ingesters to safely relay data from public networks to a private analysis network. As an example, the assumption is that a modest Gravwell topology is gathering data from a more complex enterprise. This enterprise has public facing webservers, private file servers, domain controllers, firewalls, workstations, and private switching gear; and the network engineers and I.T. security folks are responsible for it all.

The engineers and I.T. people of this well-thought-out and secure enterprise have segmented resources and isolated areas of the business. Public facing webservers are on different network segments than the Windows workstations. Switching gear has private management LANs and each segment has a stateful firewall ensuring no one slips through. This topology will not allow all data sources to directly talk to the Gravwell cluster. Instead, we deploy ingester relays that can be dual-homed and heavily fortified to relay from untrusted networks such as the public webserver to more trusted networks like the Gravwell analytics cluster. The Windows domain machines are all pushing their logs into the domain controller, and the domain controller is pushing everything into Gravwell. Switches are pushing port activity logs and sflow records, firewalls are pushing alerts, and the fileserver is pushing file access logs.
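To make the relay tier concrete, here is a minimal sketch of how a dual-homed relay such as the Gravwell Federator might be configured, with one listener facing the untrusted public segment and an upstream connection to the indexers on the private analysis network. Treat the whole snippet as illustrative: the addresses, secrets, listener name, and tags are assumptions, not a verbatim excerpt from a shipping `federator.conf`.

```
[Global]
	Ingest-Secret = UpstreamIndexerSecret        # secret shared with the trusted indexers (illustrative)
	Cleartext-Backend-Target = 10.0.0.5:4023     # indexer on the private analysis network (illustrative)

# Listener exposed to the untrusted public web segment (name and tags illustrative)
[IngestListener "public-web"]
	Ingest-Secret = PublicSegmentSecret
	Cleartext-Bind = 192.168.50.10:4023
	Tags = apache
	Tags = zeek
```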

@@ -87,7 +87,7 @@ Gravwell embraces the theme of concurrency throughout the entire stack, includin

![An example of how Gravwell storage may be organized](ExampleStorage.png)

- The storage array concurrency and configurability of Gravwell allows for exceptionally high throughput. A high end NVME drive can sustain upwards of 2GB/s read rates and striping across a few of these with a single well means that Gravwell can store and read at many gigabytes per second. High end storage might just shift the bottleneck from storage speed to memory bandwidth. Some of the test instances on semi-modern hardware with quad-channel memory and two NVME based SSD storage arrays have seen search speeds in excess of 3GB/s per node. A moderately sized cluster with AMD Epic or Intel E5 v4 CPUs could easily see 10GB/s per node with well crafted queries.
+ The storage array concurrency and configurability of Gravwell allows exceptionally high throughput. A high-end NVMe drive can sustain upwards of 2GB/s read rates, and striping across a few of these within a single well means that Gravwell can store and read at many gigabytes per second. High-end storage might just shift the bottleneck from storage speed to memory bandwidth. Some of the test instances on semi-modern hardware with quad-channel memory and two NVMe-based SSD storage arrays have seen search speeds in excess of 3GB/s per node. A moderately sized cluster with AMD Epyc or Intel E5 v4 CPUs could easily see 10GB/s per node with well-crafted queries.

## Ingester Topology

10 changes: 5 additions & 5 deletions configuration/accelerators.md
@@ -10,7 +10,7 @@ Gravwell accelerators use a filtering technique that works best when data is rel

Tags are always included in the acceleration, regardless of the extraction module in use. Even when the query does not specify inline filters, the acceleration system can help narrow down and accelerate queries when there are multiple tags in a single well.

- Most acceleration modules incur about a 1-1.5% storage overhead when using the bloom engine, but extremely low-throughput wells may consume more storage. If a well typically sees about 1-10 entries per second, acceleration may incur a 5-10% storage penalty, where a well with 10-15 thousand entries per second may see as little as 0.5% storage overhead. Gravwell accelerators also allow for user specified collision rate adjustments. If you can spare the storage, a lower collision rate may increase accuracy and speed up queries while increasing storage overhead. Reducing the accuracy reduces the storage penalty but decreases accuracy and reduces the effectiveness of the accelerator. The index engine will consume significantly more space depending on the number of fields extracted and the variability of the extracted data. For example, full text indexing may cause the accelerator files to consume as much space as the stored data files.
+ Most acceleration modules incur about a 1-1.5% storage overhead when using the bloom engine, but extremely low-throughput wells may consume more storage. If a well typically sees about 1-10 entries per second, acceleration may incur a 5-10% storage penalty, whereas a well with 10-15 thousand entries per second may see as little as 0.5% storage overhead. Gravwell accelerators also allow user-specified collision rate adjustments. If you can spare the storage, a lower collision rate may increase accuracy and speed up queries while increasing storage overhead. Raising the collision rate reduces the storage penalty but decreases accuracy and the effectiveness of the accelerator. The index engine will consume significantly more space depending on the number of fields extracted and the variability of the extracted data. For example, full text indexing may cause the accelerator files to consume as much space as the stored data files.
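For context, acceleration is configured per well in `gravwell.conf`. A minimal sketch of a well using the json accelerator under the default bloom engine might look like the following; the well name, tag, and extracted field names are illustrative assumptions:

```
[Storage-Well "applogs"]
	Location=/opt/gravwell/storage/applogs      # storage path is illustrative
	Tags=app                                    # tag name is illustrative
	Accelerator-Name="json"
	Accelerator-Args="username hostname"        # fields to extract and accelerate (illustrative)
```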

Accelerators must operate on the direct data portion of an entry (with the exception of the src accelerator which directly operates on the SRC field).

@@ -113,9 +113,9 @@ The json search module will transparently invoke the acceleration framework and
(accelerating_specific_tags)=
### Accelerating Specific Tags

- The acceleration system allows for acceleration at the well or tag levels, this allows you to specify a basic acceleration scheme on a well then specify specific accelerator configurations for specific tags or groups of tags.
+ The acceleration system allows acceleration at the well or tag level. This allows you to specify a basic acceleration scheme on a well, then dedicated accelerator configurations for specific tags or groups of tags.

- Per tag acceleration is enabled by including one or more `Tag-Accelerator-Definitions` in the `[global]` configuration block in your `gravwell.conf`. The `Tag-Accelerator-Definitions` configuration parameter should point to a file containing `Tag-Accelerator` blocks. The `Tag-Accelerator` blocks allow for specifying a set of tags and an accelerator configuration for those specific tags.
+ Per tag acceleration is enabled by including one or more `Tag-Accelerator-Definitions` in the `[global]` configuration block in your `gravwell.conf`. The `Tag-Accelerator-Definitions` configuration parameter should point to a file containing `Tag-Accelerator` blocks. The `Tag-Accelerator` blocks are used to specify a set of tags and an accelerator configuration for those specific tags.

For example, let's look at a definition where a well has a default acceleration schema (or none at all) and several tags are singled out. In this example we are going to define two wells in addition to the default well. We will then include an accelerator definition file that specifies accelerators for particular tags.
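As a rough sketch, a `Tag-Accelerator` block pairing a set of tags with an accelerator configuration might look like this; the block name, tags, and accelerator choice are illustrative assumptions, so check the full definition-file example in this document for the authoritative syntax:

```
[Tag-Accelerator "weblogs"]
	Tags=nginx                      # tags covered by this definition (illustrative)
	Tags=apache
	Accelerator-Name="fulltext"
	Accelerator-Args="-ignoreTS"    # skip indexing timestamps (illustrative)
```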

@@ -259,7 +259,7 @@ While the fulltext accelerator may be the most flexible, it is also the most cos

### Fulltext Arguments

- The fulltext accelerator supports a few options which allow for refining the types of data that is indexed and removing fields that incur significant storage overhead but may not help much at query time.
+ The fulltext accelerator supports a few options for refining the types of data that are indexed and removing fields that incur significant storage overhead but may not help much at query time.

| Argument | Description | Example | Default State |
|----------|-------------|---------|---------------|
@@ -444,7 +444,7 @@ The winlog accelerator is permissive ('-or' flag is implied). So specify any fi

## Netflow

- The [netflow](/search/netflow/netflow) module allows for accelerating on netflow V5 fields and speeding up queries on large amounts of netflow data. While the netflow module is very fast and the data is extremely compact, it can still be beneficial to engage acceleration if you have very large netflow data volumes. The netflow module can use any of the direct netflow fields, but cannot use the pivot helper fields. This means that you must specify `Src` or `Dst` and not `IP`. The `IP` and `Port` fields cannot be specified in the acceleration arguments.
+ The [netflow](/search/netflow/netflow) module accelerates on Netflow V5 fields and speeds up queries on large amounts of Netflow data. While the `netflow` module is very fast and the data is extremely compact, it can still be beneficial to engage acceleration if you have very large Netflow data volumes. The `netflow` module can use any of the direct Netflow fields, but cannot use the pivot helper fields. This means that you must specify `Src` or `Dst` and not `IP`. The `IP` and `Port` fields cannot be specified in the acceleration arguments.

```{note}
The helper extractions `Timestamp` and `Duration` cannot be used in accelerators.
```
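As an illustration of the constraints above, a well accelerating Netflow data on direct fields only might be configured as follows; the field list is an assumption based on the direct fields described here (avoiding the `IP` and `Port` helpers), and the well name and paths are illustrative:

```
[Storage-Well "netflow"]
	Location=/opt/gravwell/storage/netflow       # storage path is illustrative
	Tags=netflow
	Accelerator-Name="netflow"
	Accelerator-Args="Src Dst SrcPort DstPort"   # direct fields only; no IP/Port pivot helpers
```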
2 changes: 1 addition & 1 deletion configuration/compression.md
@@ -8,7 +8,7 @@ A notable exception is data that will not compress much (if at all). In this sit

Gravwell supports two types of compression: default and transparent compression. Default compression uses the [snappy](https://en.wikipedia.org/wiki/Snappy_%28compression%29) compression system to perform compression and decompression in userspace. The default compression system is compatible with all filesystems. The transparent compression system uses the underlying filesystem to provide transparent block level compression.

- Transparent compression allows for offloading compression/decompression work to the host kernel while maintaining an uncompressed page cache. Transparent compression can allow for very fast and efficient compression/decompression but requires that the underlying filesystem support transparent compression. Currently the [BTRFS](https://btrfs.wiki.kernel.org/index.php/Main_Page) and [ZFS](https://wiki.archlinux.org/index.php/ZFS) filesystem are supported.
+ Transparent compression offloads compression/decompression work to the host kernel while maintaining an uncompressed page cache. Transparent compression can provide very fast and efficient compression/decompression but requires that the underlying filesystem support transparent compression. Currently the [BTRFS](https://btrfs.wiki.kernel.org/index.php/Main_Page) and [ZFS](https://wiki.archlinux.org/index.php/ZFS) filesystems are supported.

```{attention}
Transparent compression has important implications for ageout rules involving total storage. Please refer to the [ageout documentation](ageout) for more information.
```
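Since transparent compression relies on the underlying filesystem, the storage path must sit on a filesystem with compression enabled. A rough sketch for BTRFS follows; the device and mount point are illustrative assumptions:

```
# Mount a BTRFS volume with zstd transparent compression (device/path illustrative)
mount -o compress=zstd /dev/nvme0n1 /opt/gravwell/storage

# Or enable compression for future writes on an existing BTRFS path
btrfs property set /opt/gravwell/storage compression zstd
```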
16 changes: 8 additions & 8 deletions configuration/parameters.md
@@ -88,7 +88,7 @@ Description: Sets the path to the gravwell license file, the path must be
Applies to: Indexer and Webserver
Default Value: `/opt/gravwell/etc`
Example: `Config-Location=/tmp/path/to/etc`
- Description: The config location allows for specifying an alternate location for housing all other configuration parameters. Specifying an alternate Config-Location allows for setting a single parameter without requiring that all other parameters be specified with the alternate path.
+ Description: The config location specifies an alternate location for housing all other configuration parameters, rather than requiring that all other parameters be specified with the alternate path.

### **Web-Port**
Applies to: Webserver
@@ -142,7 +142,7 @@ Description: The Datastore-Port parameter selects the port on which the datasto
Applies to: Webserver
Default Value:
Example: `Datastore=10.0.0.1:9405`
- Description: The Datastore parameter specifies that the webserver should connect to a datastore to synchronize its dashboards, resources, user preferences, and search history. This allows for [distributed webservers](/distributed/frontend) but should only be set if needed. By default, webservers do not connect to a datastore.
+ Description: The Datastore parameter specifies that the webserver should connect to a datastore to synchronize its dashboards, resources, user preferences, and search history. This enables [distributed webservers](/distributed/frontend) but should only be set if needed. By default, webservers do not connect to a datastore.

### **Datastore-Update-Interval**
Applies to: Webserver
@@ -340,7 +340,7 @@ Description: The Saved-Store parameter specifies where saved searches wil
Applies to: Indexer and Webserver
Default Value: `2`
Example: `Search-Pipeline-Buffer-Size=8`
- Description: The Search-Pipeline-Buffer-Size specifies how many blocks can be in transit between each module during a search. Larger sizes allow for better buffering and potentially higher throughput searches at the expense of resident memory usage. Indexers are more sensitive to the pipeline size, but also use a shared memory technique whereby the system can evict and re-instantiate memory at will; the webserver typically keeps all entries resident when moving through the pipeline and relies on condensing modules to reduce the memory load. If your system uses higher latency storage systems like spinning disks, it can be advantageous to increase this buffer size.
+ Description: The Search-Pipeline-Buffer-Size specifies how many blocks can be in transit between each module during a search. Larger sizes enable better buffering and potentially higher throughput searches at the expense of resident memory usage. Indexers are more sensitive to the pipeline size, but also use a shared memory technique whereby the system can evict and re-instantiate memory at will; the webserver typically keeps all entries resident when moving through the pipeline and relies on condensing modules to reduce the memory load. If your system uses higher latency storage systems like spinning disks, it can be advantageous to increase this buffer size.
Increasing this parameter may make searches perform better, but it will directly impact the number of running searches the system can handle at once! If you know you are storing extremely large entries like video frames, PE executables, or audio files you may need to reduce the buffer size to limit resident memory usage. If you see your host kernel's Out Of Memory (OOM) killer firing and killing the Gravwell process, this is the first knob to turn.

### **Search-Relay-Buffer-Size**
@@ -365,13 +365,13 @@ Description: Some email providers have a maximum email size limit and will behav
Applies to: Webserver
Default Value: 0 (Disabled)
Example: 16
- Description: Some HTTP proxies and/or load-balancers may fail when very large JSON payloads are relayed through them. This setting allows for setting an upper bound on the size of a single JSON payload when responding to an HTTP REST request. You probably do not want this and should not set it.
+ Description: Some HTTP proxies and/or load-balancers may fail when very large JSON payloads are relayed through them. This setting defines an upper bound on the size of a single JSON payload when responding to an HTTP REST request. You probably do not want this and should not set it.

### **Webserver-Access-Control-Allow-Origin**
Applies to: Webserver
Default Value: (empty)
Example: `google.com`
- Description: This setting allows for manually specifying Allow-Origin HTTP headers on web requests. This can be useful if you are storing the Gravwell web application on a CDN and want to allow some files to be served by a different domain.
+ Description: This setting can be used to manually specify `Allow-Origin` HTTP headers on web requests. This can be useful if you are storing the Gravwell web application on a CDN and want to allow some files to be served by a different domain.

### **Enable-CBAC**
Applies to: Webserver
@@ -389,13 +389,13 @@ Description: The Gravwell indexers track each ingester that has successfully con
Applies to: Webserver
Default Value: false
Example: true
- Description: This setting allows for maintaining the file access log but not ingesting it into the `gravwell` tag; this parameter is useful when you may want to keep an access log but do not want to clutter the `gravwell` tag with HTTP access logs.
+ Description: If set, the Gravwell webserver will maintain the access log (if enabled) but will not ingest the logs into the `gravwell` tag. This parameter is useful when you want to keep an access log but do not want to clutter the `gravwell` tag.

### **Tag-Accelerator-Definitions**
Applies to: Webserver
Default Value: (empty)
Example: `/opt/gravwell/etc/default.defs`
- Description: Gravwell allows for creating per-tag accelerator definitions so that you can finely tune your acceleration behavior when ingesting extremely large data sets. This configuration parameter can be specified multiple times so that you can load accelerator definitions from multiple files. See the [Accelerators](/configuration/accelerators) section for more information.
+ Description: This parameter specifies a file containing per-tag accelerator definitions, so that you can fine-tune your acceleration behavior when ingesting extremely large data sets. This configuration parameter can be specified multiple times so that you can load accelerator definitions from multiple files. See the [Accelerators](/configuration/accelerators) section for more information.
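As a sketch, loading accelerator definitions from two files might look like this in the `[global]` block; the second file path is an illustrative assumption:

```
[global]
	Tag-Accelerator-Definitions=/opt/gravwell/etc/default.defs
	Tag-Accelerator-Definitions=/opt/gravwell/etc/security.defs
```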

### **Max-Search-History**
Applies to: Webserver
@@ -643,7 +643,7 @@ Description: Gravwell can automatically lock user accounts on multiple successiv
Applies to: Webserver
Default Value: https://kits.gravwell.io/kits
Example: `Gravwell-Kit-Server=http://internal.mycompany.io/gravwell/kits`
- Description: Allows for overriding the Gravwell kitserver host, this can be useful in airgapped or segmented deployments where you host a mirror of the Gravwell kitserver. Set this value to an empty string to completely disable access to the remote kitserver.
+ Description: Overrides the Gravwell kitserver host. This can be useful in airgapped or segmented deployments where you host a mirror of the Gravwell kitserver. Set this value to an empty string to completely disable access to the remote kitserver.
Example:
```
Gravwell-Kit-Server="" #disable remote access to gravwell kitserver
```