Add Grove daemon mode (groved) #67

Open
wants to merge 18 commits into
base: main
32 changes: 32 additions & 0 deletions .github/workflows/release.yml
@@ -16,6 +16,8 @@ concurrency:
permissions:
pages: write
contents: read
packages: write
attestations: write
id-token: write

jobs:
@@ -43,6 +45,36 @@ jobs:
with:
password: ${{ secrets.PYPI_API_TOKEN }}

- name: Authenticate with the container registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Extract container metadata (tags, labels)
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

- name: Build and push container image
id: push
uses: docker/build-push-action@v3
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
build-args: |
VERSION=${{ github.ref_name }}

- name: Generate artifact attestation
uses: actions/attest-build-provenance@v1
with:
subject-name: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
subject-digest: ${{ steps.push.outputs.digest }}
push-to-registry: true

# Finally, generate and publish documentation after a successful release.
documentation:
needs: release
2 changes: 0 additions & 2 deletions .vscode/settings.json
@@ -1,6 +1,4 @@
{
"python.linting.enabled": true,
"python.formatting.provider": "none",
"[python]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
9 changes: 9 additions & 0 deletions Dockerfile
@@ -3,6 +3,15 @@

FROM python:3.12-slim

# Allow build-time specification of version.
ARG VERSION

# Keep things friendly.
LABEL org.opencontainers.image.title="Grove"
LABEL org.opencontainers.image.description="A Software as a Service (SaaS) log collection framework."
LABEL org.opencontainers.image.url="https://github.com/hashicorp-forge/grove"
LABEL org.opencontainers.image.version=$VERSION

# Copy in Grove ready for installation.
WORKDIR /tmp/grove
COPY grove grove/
1 change: 1 addition & 0 deletions README.md
@@ -44,6 +44,7 @@ Currently the following log sources are supported by Grove out of the box. If a
isn't listed here, support can be added by creating a custom connector!

* Atlassian audit events (e.g. Confluence, Jira)
* FleetDM host logs
* GitHub audit logs
* GSuite alerts
* GSuite activity logs
23 changes: 23 additions & 0 deletions docs/configuration.rst
@@ -102,6 +102,29 @@ used to assist with encoding (:code:`encoding`), disabling a connector

Please see the :meth:`grove.models.ConnectorConfig` implementation for more details.

Optional Fields
^^^^^^^^^^^^^^^

As specified in :meth:`grove.models.ConnectorConfig`, the following fields are optional.
This list is not exhaustive; it covers only the most noteworthy fields.

* :code:`frequency`

* The frequency on which a connector should be executed, in seconds.
* If not specified, this defaults to 600 seconds.

* :code:`processors`

* Defines a list of processors which should be run.
* Processors enable transformation of collected log records prior to output.
* See the :doc:`processors` section of the documentation for more information.

.. Warning::
:code:`frequency` is still adhered to when running in scheduled mode. If Grove is
executed more frequently than a connector's :code:`frequency`, that connector will not
run until enough time has passed since its last execution. This ensures consistent
behavior between daemon mode and scheduled mode.
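
The check described in the warning above can be sketched as follows. This is a minimal
illustration only, not Grove's implementation: the :code:`is_due` function and its
parameters are hypothetical names, and only the 600 second default comes from this
documentation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The documented default frequency, in seconds.
DEFAULT_FREQUENCY = 600


def is_due(
    last_run: Optional[datetime],
    frequency: int = DEFAULT_FREQUENCY,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if at least `frequency` seconds have passed since `last_run`.

    Hypothetical helper: a connector which has never run is always due.
    """
    if last_run is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_run >= timedelta(seconds=frequency)
```

Under these assumptions, an external scheduler which invokes :code:`grove` every five
minutes would only trigger a connector with the default frequency on every second
invocation.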

.. _secrets:

Secrets
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -83,7 +83,9 @@ isn't listed here, support can be added by creating a custom connector!
:caption: Getting Started

quickstart
scheduling
configuration
processors
examples
faq

185 changes: 185 additions & 0 deletions docs/processors.rst
@@ -0,0 +1,185 @@
Processors
===========

Processors optionally transform collected log records before output. Processors are
defined as part of a connector configuration document, and can be chained together to
perform a particular set of operations in sequence.

A full list of available processors can be found in the submodules section of the
:meth:`grove.processors` documentation.

.. note::
Custom processors can be created in the same way as plugins. This can assist when
performing specific processing of log records not already supported by the built-in
Grove processors. For more information, please see the :doc:`internals` section of
this documentation.

Configuration
^^^^^^^^^^^^^

Processors are configured in the :code:`processors` list inside a connector
configuration document. This list should contain each processor which is required to be
run, in the desired order.
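
To illustrate why ordering matters, chaining can be sketched as below. This is an
assumption about the general shape of a processor chain rather than Grove's internal
API: :code:`Record`, :code:`Processor`, and :code:`run_chain` are hypothetical names,
and each processor is modelled as a function which takes one record and returns zero or
more records.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical types: a record is a dictionary, and a processor maps one record
# to a list of records (so it may split, modify, or drop a record).
Record = Dict[str, Any]
Processor = Callable[[Record], List[Record]]


def run_chain(records: Iterable[Record], processors: List[Processor]) -> List[Record]:
    """Apply each processor in order, feeding its output into the next one."""
    current = list(records)
    for processor in processors:
        produced: List[Record] = []
        for record in current:
            produced.extend(processor(record))
        current = produced
    return current
```

Because the output of one processor is the input of the next, a processor which splits
records must appear before any processor which expects the split structure.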

Each processor requires, at a minimum, that a :code:`name` and a :code:`processor` field
are defined. However, each processor has its own set of configuration fields which
are used to define how the processor should operate on a log record.

To understand exactly which processor requires which fields, please refer to the
relevant :meth:`grove.processors` documentation.


Example
^^^^^^^

As an example of using processors together, the following configuration flattens Google
Workspace activity logs, and ensures that there is only one event per log record:

.. code-block:: json

"processors": [
{
"name": "One event per log entry",
"processor": "split_path",
"source": "events"
},
{
"name": "Flatten and zip event parameters",
"processor": "zip_paths",
"source": "events.parameters",
"key": "name",
"values": [
"value",
"intValue",
"boolValue",
"multiValue",
"multiIntValue",
"multiBoolValue"
]
}
]

In this example, two processors are in use: :code:`split_path` and :code:`zip_paths`.

In order to demonstrate the effect that these processors have on a log record, the
following sections provide sample log records before and after processing by a given
processor.

split_path
~~~~~~~~~~

Split path is useful for upstream services which aggregate multiple events into a single
log record. In these cases, a single log record returned by a service may have multiple
events within it, rather than one event per log record. This can result in complexity
when attempting to parse and index these records in downstream log platforms.

In order to handle this, the :code:`split_path` processor generates new log records for
each event, cloning the rest of the log record. As an example, the :code:`split_path`
processor configuration defined in the section above when working on the following log
record:

.. code-block:: json

{
"id": "00001",
"events": [
{
"operation": "create",
"parameters": [
{"name": "username", "value": "example"},
{"name": "ip", "value": "192.0.2.1"}
]
},
{
"operation": "update",
"parameters": [
{"name": "username", "value": "example"},
{"name": "ip", "value": "192.0.2.1"}
]
}
]
}

Would instead be output as two log records with the following structure:

.. code-block:: json

{
"id": "00001",
"events": {
"operation": "create",
"parameters": [
{"name": "username", "value": "example"},
{"name": "ip", "value": "192.0.2.1"}
]
}
},
{
"id": "00001",
"events": {
"operation": "update",
"parameters": [
{"name": "username", "value": "example"},
{"name": "ip", "value": "192.0.2.1"}
]
}
}
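
The transformation shown above can be approximated in a few lines of Python. This is a
sketch of the behaviour described in this section, not the code shipped in
:meth:`grove.processors`. The function signature is hypothetical, and it assumes that
:code:`source` is a dot-separated path and that records with nothing to split are passed
through unchanged.

```python
import copy
from typing import Any, Dict, List


def split_path(record: Dict[str, Any], source: str) -> List[Dict[str, Any]]:
    """Emit one clone of `record` per element of the list found at `source`."""
    keys = source.split(".")

    # Walk down to the dictionary which holds the final key in the path.
    parent: Any = record
    for key in keys[:-1]:
        parent = parent.get(key) if isinstance(parent, dict) else None
    values = parent.get(keys[-1]) if isinstance(parent, dict) else None

    # Assumption: if there is nothing to split, pass the record through as-is.
    if not isinstance(values, list) or not values:
        return [record]

    results = []
    for index in range(len(values)):
        # Clone the whole record, then replace the list with a single element.
        clone = copy.deepcopy(record)
        target = clone
        for key in keys[:-1]:
            target = target[key]
        target[keys[-1]] = target[keys[-1]][index]
        results.append(clone)
    return results
```

Each clone is a deep copy, so later processors can modify one output record without
affecting the others.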

zip_paths
~~~~~~~~~

Continuing from the example configuration and log record above, Zip Paths can be used to
extract "generic" key / value pairs back into fields with their respective names.

As an example, the :code:`zip_paths` processor configuration defined in the section
above when working on the log records output from the :code:`split_path` example above:

.. code-block:: json

{
"id": "00001",
"events": {
"operation": "create",
"parameters": [
{"name": "username", "value": "example"},
{"name": "ip", "value": "192.0.2.1"}
]
}
},
{
"id": "00001",
"events": {
"operation": "update",
"parameters": [
{"name": "username", "value": "example"},
{"name": "ip", "value": "192.0.2.1"}
]
}
}

Would output the following log records:

.. code-block:: json

{
"id": "00001",
"events": {
"operation": "create",
"parameters": {
"username": "example",
"ip": "192.0.2.1"
}
}
},
{
"id": "00001",
"events": {
"operation": "update",
"parameters": {
"username": "example",
"ip": "192.0.2.1"
}
}
}
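
A rough Python equivalent of this operation is sketched below. As with the previous
sketch, this is an illustration under stated assumptions rather than the implementation
in :meth:`grove.processors`: the function signature is hypothetical, :code:`source` is
assumed to be a dot-separated path, the first field from :code:`values` present in an
entry is assumed to win, and the record is modified in place.

```python
from typing import Any, Dict, List


def zip_paths(
    record: Dict[str, Any], source: str, key: str, values: List[str]
) -> Dict[str, Any]:
    """Replace the list of name / value entries at `source` with a dictionary."""
    keys = source.split(".")

    # Walk down to the dictionary which holds the final key in the path.
    parent: Any = record
    for part in keys[:-1]:
        parent = parent.get(part) if isinstance(parent, dict) else None
        if not isinstance(parent, dict):
            return record

    entries = parent.get(keys[-1])
    if not isinstance(entries, list):
        return record

    zipped: Dict[str, Any] = {}
    for entry in entries:
        if not isinstance(entry, dict) or key not in entry:
            continue
        # Assumption: use the first candidate value field found in the entry.
        for candidate in values:
            if candidate in entry:
                zipped[entry[key]] = entry[candidate]
                break

    parent[keys[-1]] = zipped
    return record
```

Listing several candidate fields in :code:`values` allows entries which carry their
value under different names, such as :code:`value` or :code:`intValue`, to be handled by
a single processor.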
56 changes: 56 additions & 0 deletions docs/scheduling.rst
@@ -0,0 +1,56 @@
Scheduling
==========

In most cases, for Grove to be effective, it must be configured to run on a particular
interval to ensure that new logs from configured sources are collected. This periodic
collection is enabled by one of Grove's runtime modes. The two modes currently provided
by Grove are:

1. Scheduled mode.
* This is run by using the :code:`grove` command.
* This executes all configured connectors once, and then exits.
* This mode is intended to be used in conjunction with an external scheduler, or to
allow a single point-in-time collection of logs for investigation and incident
response.
2. Daemon mode.
* This is run using the :code:`groved` command.
* This is a long running process which periodically executes all configured
connectors at their configured :code:`frequency`.
* This mode is intended to be run as a system service, or in a container runtime.

Scheduled mode
--------------

"Scheduled" mode is executed using the :code:`grove` command.

Scheduled mode has no mode-specific configuration options.

Daemon mode
-----------

Daemon mode is executed using the :code:`groved` command - rather than :code:`grove`.

In Daemon mode, Grove runs as a long-running process which executes connectors on their
configured frequency. This enables connectors to run until completion with no deadlines,
and allows each connector to be executed at a different frequency - which may be
important for certain types of connector which need to collect data more frequently than
others.

In daemon mode, Grove has one important mode-specific configuration option. As usual,
this is configured using an environment variable of the same name.

* :code:`GROVE_CONFIG_REFRESH`
* This option controls how frequently, in seconds, Grove will refresh connector
configuration documents from the configured backend.
* Grove keeps a copy of all connector configuration documents in memory to prevent
querying the configuration backend constantly in the event loop.
* This allows connector configuration documents to be added, removed, and modified
without the need to restart Grove.
* This option defaults to 300 seconds.

.. Note::
While connector configuration documents are kept in memory and periodically
refreshed, secrets are fetched every time a connector is executed, if a secrets
backend is also in use. This is done to enable the use of dynamic secrets engines, if
supported by the configured secrets backend, and to allow for secrets to be rotated
without Grove needing to be notified or updated.
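
The caching behaviour controlled by :code:`GROVE_CONFIG_REFRESH` can be sketched as
follows. This is a simplified illustration and not the :code:`groved` implementation:
the :code:`ConfigCache` class, its :code:`loader` callable, and the injectable
:code:`clock` are hypothetical, and only the 300 second default comes from this
documentation.

```python
import time
from typing import Any, Callable, Dict, Optional

# The documented default refresh interval, in seconds.
DEFAULT_REFRESH = 300


class ConfigCache:
    """Keep connector configuration in memory, reloading it once it is stale."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, Any]],
        refresh: int = DEFAULT_REFRESH,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # Hypothetical: fetches documents from the backend.
        self._refresh = refresh
        self._clock = clock
        self._configs: Dict[str, Any] = {}
        self._loaded_at: Optional[float] = None

    def get(self) -> Dict[str, Any]:
        """Return cached configuration, querying the backend only when stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._refresh:
            self._configs = self._loader()
            self._loaded_at = now
        return self._configs
```

In this sketch the event loop can call :code:`get()` on every iteration, and the backend
is only queried once per refresh interval. Secrets would deliberately bypass a cache
like this, as described in the note above.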
4 changes: 2 additions & 2 deletions grove/__about__.py
@@ -1,6 +1,6 @@
"""Grove metadata."""

-__version__ = "1.6.0"
+__version__ = "2.0.0"
__title__ = "grove"
__license__ = "Mozilla Public License 2.0"
-__copyright__ = "Copyright 2023 HashiCorp, Inc."
+__copyright__ = "Copyright 2025 HashiCorp, Inc."