Allow configuration to specify the host name of the log. #141

Merged 2 commits, Feb 18, 2024

Changes from all commits
73 changes: 49 additions & 24 deletions README.md
@@ -1,11 +1,9 @@
# log4j-s3-search

## IMPORTANT NOTE on log4j vulnerability: https://www.cisa.gov/news/2021/12/11/statement-cisa-director-easterly-log4j-vulnerability

* Since release **3.6.0**, log4j-s3-search is built with **log4j2 2.17.1**, addressing recent vulnerabilities (see
  above). You are **strongly advised** to also switch to Log4j2 2.17.1 (**or
  [higher](https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-core)**, since I'm tired of updating
  this) for your applications.
* Log4j 1.x is no longer supported. As of release **4.0.0**, **appender-log4j** has been removed from this repo. If
  you're still using Log4j 1.x, **PLEASE consider upgrading to Log4j 2.x**. Log4j 1.x is deprecated, and _there are
  vulnerabilities with it that nobody will fix_.
* If you REALLY need to continue using Log4j, you may use release **3.7.0**. But really: upgrade to Log4j2 for your
  own sake.

![image](https://user-images.githubusercontent.com/1897208/155896919-552ab47e-98c9-4d54-9878-d0e145bb7153.png)

@@ -77,14 +75,26 @@ programs using this library for both Log4j and Log4j2.

## Configuration
### General
In addition to the typical appender configuration (such as layout, Threshold, etc.), these common properties control
the appender in general:
* **stagingBufferSize** -- the number of entries to collect for a batch before publishing (default is 2000).
* **stagingBufferAge** -- (_optional_) if specified, the number of *minutes* to wait before publishing a batch. If
  used, this parameter will override the condition set by *stagingBufferSize*. The value must be >= 1.
* **hostName** -- (_optional_) a string to use to indicate where this log comes from. If this is not configured, the
  host name of the machine running the logger is used by default. When set, this cannot be a blank string, or it will
  be ignored.
* **tags** -- (_optional_) comma-separated tokens to associate with the log entries (used mainly for search
  filtering). Examples:
  * `production,webserver`
  * `qa,database`

DO NOT specify both **stagingBufferSize** and **stagingBufferAge**. Choose the option that works best for you. Because
there is some overhead in preparing and uploading logs, if you specify too small a value for these parameters, the
logger may not have enough time to do its work and may eventually cause your process to fail.

How small is too small? It really depends on how often your program logs. In general, I would suggest a **minimum** of
500 for **stagingBufferSize** and 1 minute for **stagingBufferAge**.

A sample snippet from a `log4j2.xml` to publish whenever 10 events are collected:
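The original snippet is not shown here, so the following is a minimal sketch rather than a copy: the `Log4j2Appender`
element, the `packages` attribute, and all values are illustrative assumptions.
```
<Configuration status="INFO" packages="com.van.logging.log4j2">
    <Appenders>
        <Log4j2Appender name="Log4j2Appender">
            <PatternLayout pattern="%d{HH:mm:ss,SSS} [%t] %-5p %c{36} - %m%n"/>
            <!-- Publish a batch whenever 10 events are collected -->
            <stagingBufferSize>10</stagingBufferSize>
            <!-- Optional: override the host name attached to published batches -->
            <hostName>webserver-01</hostName>
            <tags>production,webserver</tags>
        </Log4j2Appender>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Log4j2Appender"/>
        </Root>
    </Loggers>
</Configuration>
```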
@@ -134,20 +144,23 @@ Use either:

but not all three simultaneously. You will get an error from AWS if you use all three.

* **s3PathStyleAccess** -- "true" to use the older Path Style Access/URL when contacting S3 (see
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access)

AWS credentials are required to interact with S3. **NOTE** that the recommended way of configuring
the credentials is:
1) using roles assigned to instance profiles (when working with EC2 instances) or
2) creating a credentials file on the computer running the program as
`%USERPROFILE%\.aws\credentials` (Windows) or `~/.aws/credentials` (see
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/credentials.html#credentials-file-format)

If the above methods are not possible for your situation, these properties can also be overridden in
the optional Log4j configuration:
* **s3AwsKey** and **s3AwsSecret** -- access and secret keys.
* **s3AwsSessionToken** -- session token for short-lived credentials.

When these properties are present in the configuration, they *take precedence over* the default sources in the
credential chain as described earlier.

A sample snippet (with the optional s3AwsKey and s3AwsSecret properties set):
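A minimal sketch of the appender element (the surrounding configuration is as in the earlier example; `s3Path` and
all values are illustrative assumptions, and the credential chain is preferable to hard-coding keys):
```
<Log4j2Appender name="Log4j2Appender">
    <s3Bucket>acme-logs</s3Bucket>
    <s3Path>logs/myApplication/</s3Path>
    <!-- Placeholder values; prefer instance profiles or the credentials file -->
    <s3AwsKey>AKIAXXXXXXXXXXXXXXXX</s3AwsKey>
    <s3AwsSecret>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</s3AwsSecret>
</Log4j2Appender>
```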
@@ -174,17 +187,22 @@ logs/myApplication/20150327081000_localhost_6187f4043f2449ccb4cbd3a7930d1130

Content configurations
* **s3Compression** -- if set to "true," then contents will be GZIP'ed before publishing into S3
* **s3KeyGzSuffixEnabled** -- if set to "true," then the S3 key will have a `.gz` suffix when `s3Compression` is
  enabled. (If `s3Compression` is not "true," this is ignored.)
* **s3SseKeyType** -- if set to "SSE_S3," then contents published will be flagged to use SSE-S3 encryption (see
  https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html)
* **s3StorageClass** -- the S3 storage class associated with sent objects (e.g. "standard", "glacier"); if not set,
  the "standard" storage class will be used as the default (see
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-class-intro.html)

### Azure Blob
These properties (**please use your own values**) control how the logs will be stored in Azure Blob Storage:
* **azureBlobContainer** -- the storage container name.
* **azureBlobNamePrefix** -- the prefix for the blob name.
* **azureBlobCompressionEnabled** -- if set to "true," then contents will be GZIP'ed before publishing.
* **azureStorageConnectionString** -- optional value for the connection string for connecting to Azure. See note below.
* **azureBlobNameGzSuffixEnabled** -- if set to "true," then the blob name will have a `.gz` suffix when
  `azureBlobCompressionEnabled` is enabled. (If `azureBlobCompressionEnabled` is not "true," this is ignored.)
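A minimal sketch of an appender configured for Azure Blob Storage using the properties above; it would sit inside the
same `<Appenders>` section as the earlier example, and the `Log4j2Appender` element and all values are illustrative
assumptions:
```
<Log4j2Appender name="Log4j2Appender">
    <PatternLayout pattern="%d{HH:mm:ss,SSS} [%t] %-5p %c{36} - %m%n"/>
    <azureBlobContainer>logs</azureBlobContainer>
    <azureBlobNamePrefix>myApplication/</azureBlobNamePrefix>
    <azureBlobCompressionEnabled>true</azureBlobCompressionEnabled>
    <azureBlobNameGzSuffixEnabled>true</azureBlobNameGzSuffixEnabled>
</Log4j2Appender>
```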

@@ -209,7 +227,8 @@ logs/myApplication/20150327081000_localhost_6187f4043f2449ccb4cbd3a7930d1130

Notes:
* See https://docs.microsoft.com/en-us/rest/api/storageservices/Naming-and-Referencing-Containers--Blobs--and-Metadata
  for rules on names.
* From various examples online, the preferred way to establish the Azure connection string is to set the environment
variable `AZURE_STORAGE_CONNECTION_STRING` on the hosts running your code.
However, you can also set the `azureStorageConnectionString` property for local testing.
@@ -224,7 +243,8 @@ These properties (**please use your own values**) control how the logs will be stored in Google Cloud Storage:
* **gcpStorageBlobNamePrefix** -- the prefix for the blob name.
* **gcpStorageCompressionEnabled** -- if set to "true," then contents will be GZIP'ed before publishing.
The default is "false."
* **gcpStorageBlobNameGzSuffixEnabled** -- if set to "true," then the blob name will have a `.gz` suffix when
  `gcpStorageCompressionEnabled` is enabled. (If `gcpStorageCompressionEnabled` is not "true," this is ignored.)


Just as in the case with AWS S3, there is an [extensive authentication process](https://github.com/googleapis/google-cloud-java#authentication) and list of options.
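A minimal sketch of the appender element using the properties above; the bullet naming the bucket property is not
shown here, so `gcpStorageBucketName` and all values are illustrative assumptions:
```
<Log4j2Appender name="Log4j2Appender">
    <!-- NOTE: the bucket property name below is an assumption for illustration -->
    <gcpStorageBucketName>acme-logs</gcpStorageBucketName>
    <gcpStorageBlobNamePrefix>myApplication/</gcpStorageBlobNamePrefix>
    <gcpStorageCompressionEnabled>true</gcpStorageCompressionEnabled>
    <gcpStorageBlobNameGzSuffixEnabled>true</gcpStorageBlobNameGzSuffixEnabled>
</Log4j2Appender>
```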
@@ -311,7 +331,8 @@ There are four properties for Elasticsearch, all but one are optional:

The scheme/protocol is `http://` by default, but you can override this by
explicitly including it in the value (e.g. `https://localhost:9200`).
* **elasticSearchPublishHelperClass** -- optional fully-qualified name of the class (on the runtime classpath, of
  course) implementing `IElasticsearchPublishHelper` that will perform publishing to Elasticsearch
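Of the four Elasticsearch properties, only `elasticSearchPublishHelperClass` is named above; a minimal sketch of the
appender element follows, in which the host property name, the helper class, and the values are illustrative
assumptions:
```
<Log4j2Appender name="Log4j2Appender">
    <!-- NOTE: the host property name is an assumption for illustration -->
    <elasticsearchHosts>https://localhost:9200</elasticsearchHosts>
    <elasticSearchPublishHelperClass>com.example.logging.MyPublishHelper</elasticSearchPublishHelperClass>
</Log4j2Appender>
```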

@@ -325,7 +346,9 @@

## Solr Integration
A new core should be created for the log events. Setting up Apache Solr and creating a core are outside the scope of
this file. However, a sample template for a `schema.xml` that can be used is included in this repo as
`/misc/solr/schema.xml`.

Each log event will be indexed as a Solr document. The "id" property for each document
will follow the format:
@@ -346,9 +369,11 @@ String s3Key = id.substring(0, id.indexOf("-"));
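As the context line above shows, the portion of a document's "id" before the first dash is the S3 key of the batch
that contained the event. A minimal sketch (the id value is hypothetical, combining the key format shown earlier with
a sequence number):
```
// Recover the S3 key of the batch that a Solr document's event came from.
// The id value is hypothetical, for illustration only.
String id = "20150327081000_localhost_6187f4043f2449ccb4cbd3a7930d1130-42";
String s3Key = id.substring(0, id.indexOf("-"));
// s3Key is now "20150327081000_localhost_6187f4043f2449ccb4cbd3a7930d1130"
```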

## Elasticsearch Integration
A new index should be created for the log events. Setting up Elasticsearch and the index are outside the scope of
this file. However, a sample template for the index schema that can be used is included in this repo as
`/misc/elasticsearch/logindex.json`.
This schema should be installed before any log entries are added. A typical PUT to `/<elasticsearch host>:9200/<index>`
with the JSON as the body should be sufficient.
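A minimal sketch of that PUT using Java 11's built-in HTTP client (the host, index name, and schema file path are
illustrative assumptions):
```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class InstallLogIndex {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // PUT the index schema JSON to /<elasticsearch host>:9200/<index>
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/logindex"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofFile(Path.of("misc/elasticsearch/logindex.json")))
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + ": " + response.body());
    }
}
```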

Each log event will be indexed as a document. The "id" property for each document
will follow the format:
Log4j2AppenderBuilder.java
@@ -34,11 +34,14 @@ public class Log4j2AppenderBuilder
    private String tags;

    @PluginBuilderAttribute
    private int stagingBufferSize = 2000;

    @PluginBuilderAttribute
    private int stagingBufferAge = 0;

    @PluginBuilderAttribute
    private String hostName;

    // S3 properties
    @PluginBuilderAttribute
    private String s3Bucket;
@@ -262,10 +265,13 @@ Optional<ElasticsearchConfiguration> getElasticsearchConfigIfEnabled() {
    }

    IBufferPublisher createCachePublisher() throws UnknownHostException {
        // Use the configured host name if any; otherwise fall back to the local machine's host name.
        String hostNameForPublisher = this.hostName;
        if (!StringUtils.isTruthy(hostNameForPublisher)) {
            java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
            hostNameForPublisher = addr.getHostName();
        }
        BufferPublisher publisher = new BufferPublisher(hostNameForPublisher, parseTags(tags));
        PatternedPathAdjuster pathAdjuster = new PatternedPathAdjuster();

        getS3ConfigIfEnabled().ifPresent(config -> {
Log4j2AppenderBuilderTest.java
@@ -1,12 +1,14 @@
package com.van.logging.log4j2;

import com.van.logging.IBufferPublisher;
import com.van.logging.PublishContext;
import com.van.logging.aws.S3Configuration;
import junit.framework.TestCase;

import java.lang.reflect.Field;
import java.net.UnknownHostException;
import java.util.Optional;

import static org.junit.Assert.assertTrue;

public class Log4j2AppenderBuilderTest extends TestCase {

@@ -58,4 +60,47 @@ public void testS3CompressionSettingDisabled() {
            assertFalse(s3Configuration.isCompressionEnabled());
        }));
    }

    public void testCreatePublisherWithHostNameConfig() {
        Log4j2AppenderBuilder builder = new Log4j2AppenderBuilder();
        Class<? extends Log4j2AppenderBuilder> clz = builder.getClass();
        String testHostName = "TESTHOSTNAME";

        try {
            Field field = clz.getDeclaredField("hostName");
            field.setAccessible(true);
            field.set(builder, testHostName);
        } catch (ReflectiveOperationException e) {
            fail(e.getMessage());
        }
        PublishContext context = null;
        try {
            IBufferPublisher publisher = builder.createCachePublisher();
            context = publisher.startPublish("CACHENAME");
        } catch (UnknownHostException e) {
            fail(e.getMessage());
        }
        assertEquals(testHostName, context.getHostName());
    }

    public void testCreatePublisherWithDefaultHostName() {
        Log4j2AppenderBuilder builder = new Log4j2AppenderBuilder();
        PublishContext context = null;

        java.net.InetAddress addr = null;
        try {
            addr = java.net.InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            fail(e.getMessage());
        }
        String hostName = addr.getHostName();

        try {
            IBufferPublisher publisher = builder.createCachePublisher();
            context = publisher.startPublish("CACHENAME");
        } catch (UnknownHostException e) {
            fail(e.getMessage());
        }
        assertEquals(hostName, context.getHostName());
    }
}