add File-Filters language to sqs/sns listeners as well. Slight reorganization #884

Merged
merged 1 commit into from
Jan 19, 2024
45 changes: 23 additions & 22 deletions ingesters/s3.md
@@ -59,28 +59,6 @@ The `Bucket-ARN` configuration parameter wants a fully qualified ARN value, not

![](ARN.png)

### Object Match Globs

The S3 ingester will attempt to ingest all objects in an S3 bucket recursively unless one or more `File-Filters` patterns are used to establish which objects should be consumed. The `File-Filters` patterns support the standard globbing patterns for filenames and "double-star" patterns for recursive directory matches.

For example, if we specify a single `File-Filters` pattern of `*.log` then the ingester will consume all objects that have the file extension `.log` at the first directory level only. An object named `foo.log` will be consumed but an object named `stuff/foo.log` will not.

Multiple `File-Filters` can be specified to create an OR match pattern; for example, the following set of patterns would match all objects that have an extension of `.log` and are located in any of the `this`, `that`, or `theother` directories:

```
File-Filters=this/*.log
File-Filters=that/*.log
File-Filters=theother/*.log
```

Arbitrary directory specifications can be achieved using a "double-star" globbing pattern. For example `File-Filters="AWSLogs/**/*.json.gz"` will match all objects with a file extension of `.json.gz` located in any sub-directory within the `AWSLogs` top level directory. All of the following objects would be matched by this filter:

```
AWSLogs/475058115300/CloudTrail/us-west-2/2022/11/28/475058115300_CloudTrail_us-west-2_20221128T2320Z_gvNAnhYNeqzmI2bH.json.gz
AWSLogs/475058115300/CloudTrail/us-east-1/2022/11/28/475058115300_CloudTrail_us-east-1_20221128T2320Z_lY7sQmelLGP14BrY.json.gz
AWSLogs/summary/CloudTrail_us-east-1.json.gz
```

### Bucket Data Formats

By default the S3 ingester processes objects using a line reader, which expects line-delimited data. However, the ingester can also natively consume AWS CloudTrail event records in JSON format. If no `Reader` is specified for a bucket, `line` is assumed. The following options are available for the `Reader` configuration parameter:
@@ -107,6 +85,29 @@ The S3 Ingester can also pull S3 objects that are referenced in SQS queue messag
| Timezone-Override | NO | | Force a specific timezone when interpreting timestamps in data |
| Timestamp-Format-Override | NO | | Force the ingester to look for a specific timestamp format |
| Credentials-Type | NO | static | Sets the type of authentication credentials used for accessing the SQS and S3 services. |
| File-Filters | NO | | Specify one or more glob patterns for use when matching object names (Example: `AWSLogs/**/*.json.gz`) |

### Object Match Globs

The S3 ingester will attempt to ingest every object in an S3 bucket, or every object referenced in an SQS/SNS message, unless one or more `File-Filters` patterns restrict which objects are consumed. `File-Filters` patterns support standard filename globbing as well as "double-star" patterns for recursive directory matches.

For example, a single `File-Filters` pattern of `*.log` causes the ingester to consume only objects that have the file extension `.log` at the top directory level. An object named `foo.log` will be consumed, but an object named `stuff/foo.log` will not.

Multiple `File-Filters` can be specified to create an OR match pattern; for example, the following set of patterns would match all objects that have an extension of `.log` and are located in any of the `this`, `that`, or `theother` directories:

```
File-Filters=this/*.log
File-Filters=that/*.log
File-Filters=theother/*.log
```

Arbitrary directory depths can be matched using a "double-star" globbing pattern. For example, `File-Filters="AWSLogs/**/*.json.gz"` will match all objects with a file extension of `.json.gz` located in any sub-directory within the `AWSLogs` top-level directory. All of the following objects would be matched by this filter:

```
AWSLogs/475058115300/CloudTrail/us-west-2/2022/11/28/475058115300_CloudTrail_us-west-2_20221128T2320Z_gvNAnhYNeqzmI2bH.json.gz
AWSLogs/475058115300/CloudTrail/us-east-1/2022/11/28/475058115300_CloudTrail_us-east-1_20221128T2320Z_lY7sQmelLGP14BrY.json.gz
AWSLogs/summary/CloudTrail_us-east-1.json.gz
```
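
The matching rules described above can be modeled with a short sketch. This is only an illustration of the documented semantics (a single `*` stays within one path segment, `**/` spans directories, multiple filters are OR'd, and no filters means everything is accepted); it is not the ingester's actual implementation, and the helper names `glob_to_regex` and `object_matches` are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex (illustrative model only).

    '*'   matches any run of characters within one path segment
    '?'   matches one character within a path segment
    '**/' matches zero or more whole directories
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def object_matches(key: str, filters: list[str]) -> bool:
    """Return True if the object key should be consumed.

    An empty filter list accepts every object; otherwise the key
    must match at least one pattern (OR semantics).
    """
    if not filters:
        return True
    return any(glob_to_regex(f).match(key) for f in filters)


if __name__ == "__main__":
    print(object_matches("foo.log", ["*.log"]))
    print(object_matches("stuff/foo.log", ["*.log"]))
    print(object_matches("AWSLogs/summary/CloudTrail_us-east-1.json.gz",
                         ["AWSLogs/**/*.json.gz"]))
```

Under this model, `foo.log` is accepted by `*.log` while `stuff/foo.log` is rejected, which mirrors the behavior described for a single top-level pattern.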

## Credentials-Type Authentication Options
