
Amazon Security Lake integration - Logstash #135

Closed
Tracked by #128
AlexRuiz7 opened this issue Jan 19, 2024 · 4 comments · Fixed by #143

Comments

@AlexRuiz7
Member

Description

Wazuh's Amazon Security Lake integration, as a source, will use Logstash as a data forwarder. The data has to be forwarded from Wazuh's indices to an Amazon S3 bucket. Logstash provides input and output plugins that allow us to do that.
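
In outline, the pipeline pairs the logstash-input-opensearch input plugin with the logstash-output-s3 output plugin. A minimal skeleton (connection and bucket options omitted here; a fuller example is worked out in the comments below):

input {
  opensearch {
    # Read events from the Wazuh indexer indices.
  }
}

output {
  s3 {
    # Write the events to an Amazon S3 bucket.
  }
}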

Tasks

  • Implement a Logstash pipeline to send events from an index to an S3 bucket.
  • Document the process for reproducibility.
@AlexRuiz7 added the level/task (Task issue) and type/research (Research issue) labels on Jan 19, 2024
@AlexRuiz7 self-assigned this on Jan 19, 2024
@wazuhci moved this to In progress in Release 4.9.0 on Jan 19, 2024
@AlexRuiz7
Member Author

AlexRuiz7 commented Jan 19, 2024

Follow the Wazuh indexer integration using Logstash guide to install Logstash and the logstash-input-opensearch plugin.

RPM: https://www.elastic.co/guide/en/logstash/current/installing-logstash.html#_yum

# Install Logstash
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo tee /etc/yum.repos.d/logstash.repo > /dev/null << 'EOF'
[logstash-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
sudo yum install logstash

# Install plugins (logstash-output-s3 is already installed)
sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-opensearch # logstash-output-s3

# Copy certificates
sudo mkdir -p /etc/logstash/wi-certs/
sudo cp /etc/wazuh-indexer/certs/root-ca.pem /etc/logstash/wi-certs/root-ca.pem
sudo chown logstash:logstash /etc/logstash/wi-certs/root-ca.pem

# Configuring new indexes
SKIP

# Configuring a pipeline

# Keystore
## Prepare keystore
set +o history
echo 'LOGSTASH_KEYSTORE_PASS="123456"' | sudo tee /etc/sysconfig/logstash
export LOGSTASH_KEYSTORE_PASS=123456
set -o history
sudo chown root /etc/sysconfig/logstash
sudo chmod 600 /etc/sysconfig/logstash
sudo systemctl start logstash

## Create keystore
sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash create

## Store Wazuh indexer credentials (admin user)
sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add WAZUH_INDEXER_USERNAME
sudo -E /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add WAZUH_INDEXER_PASSWORD
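## These keys can then be referenced from the pipeline configuration as ${WAZUH_INDEXER_USERNAME} and ${WAZUH_INDEXER_PASSWORD}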

# Pipeline
sudo touch /etc/logstash/conf.d/wazuh-s3.conf
# Replace with cp /vagrant/wazuh-s3.conf /etc/logstash/conf.d/wazuh-s3.conf
sudo systemctl stop logstash
sudo -E /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/wazuh-s3.conf --path.settings /etc/logstash/
    |- Success: `[INFO ][logstash.agent           ] Pipelines running ...`

# Start Logstash
sudo systemctl enable logstash
sudo systemctl start logstash
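
For reference, a minimal sketch of what /etc/logstash/conf.d/wazuh-s3.conf could contain, wired to the certificate and keystore entries prepared above. The indexer host, index pattern, query, bucket and region are illustrative placeholders, not the exact configuration used in this test:

input {
  opensearch {
    hosts => ["https://127.0.0.1:9200"]                  # placeholder Wazuh indexer address
    user => "${WAZUH_INDEXER_USERNAME}"                  # read from the Logstash keystore
    password => "${WAZUH_INDEXER_PASSWORD}"
    ssl => true
    ca_file => "/etc/logstash/wi-certs/root-ca.pem"
    index => "wazuh-alerts-4.x-*"                        # placeholder index pattern
    query => '{ "query": { "range": { "@timestamp": { "gt": "now-1m" } } } }'
    schedule => "* * * * *"                              # poll the index every minute
  }
}

output {
  s3 {
    region => "us-east-1"                                # placeholder region
    bucket => "wazuh-security-lake-poc"                  # placeholder bucket name
    codec => "json_lines"                                # final encoding is still under discussion
  }
}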

Output

[root@rhel7 vagrant]# sudo -E /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/wazuh-s3.conf --path.settings /etc/logstash/
Using bundled JDK: /usr/share/logstash/jdk
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/executor/java_thread_pool_executor.rb:13: warning: method redefined; discarding old to_int
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/concurrent-ruby-1.1.9/lib/concurrent-ruby/concurrent/executor/java_thread_pool_executor.rb:13: warning: method redefined; discarding old to_f
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
[2024-01-25T16:45:01,461][INFO ][logstash.runner          ] Log4j configuration path used is: /etc/logstash/log4j2.properties
[2024-01-25T16:45:01,462][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"8.12.0", "jruby.version"=>"jruby 9.4.5.0 (3.1.4) 2023-11-02 1abae2700f OpenJDK 64-Bit Server VM 17.0.9+9 on 17.0.9+9 +indy +jit [x86_64-linux]"}
[2024-01-25T16:45:01,464][INFO ][logstash.runner          ] JVM bootstrap flags: [-Xms1g, -Xmx1g, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djruby.compile.invokedynamic=true, -XX:+HeapDumpOnOutOfMemoryError, -Djava.security.egd=file:/dev/urandom, -Dlog4j2.isThreadContextMapInheritable=true, -Dlogstash.jackson.stream-read-constraints.max-string-length=200000000, -Dlogstash.jackson.stream-read-constraints.max-number-length=10000, -Djruby.regexp.interruptible=true, -Djdk.io.File.enableADS=true, --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED, --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED, --add-opens=java.base/java.security=ALL-UNNAMED, --add-opens=java.base/java.io=ALL-UNNAMED, --add-opens=java.base/java.nio.channels=ALL-UNNAMED, --add-opens=java.base/sun.nio.ch=ALL-UNNAMED, --add-opens=java.management/sun.management=ALL-UNNAMED]
[2024-01-25T16:45:01,465][INFO ][logstash.runner          ] Jackson default value override `logstash.jackson.stream-read-constraints.max-string-length` configured to `200000000`
[2024-01-25T16:45:01,465][INFO ][logstash.runner          ] Jackson default value override `logstash.jackson.stream-read-constraints.max-number-length` configured to `10000`
[2024-01-25T16:45:01,611][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2024-01-25T16:45:02,107][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
[2024-01-25T16:45:02,535][INFO ][org.reflections.Reflections] Reflections took 114 ms to scan 1 urls, producing 132 keys and 468 values
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/amazing_print-1.5.0/lib/amazing_print/formatter.rb:37: warning: previous definition of cast was here
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/nokogiri-1.15.5-java/lib/nokogiri/xml/node.rb:1007: warning: method redefined; discarding old attr
[2024-01-25T16:45:03,881][INFO ][logstash.codecs.json     ] ECS compatibility is enabled but `target` option was not specified. This may cause fields to be set at the top-level of the event where they are likely to clash with the Elastic Common Schema. It is recommended to set the `target` option to avoid potential schema conflicts (if your data is ECS compliant or non-conflicting, feel free to ignore this message)
[2024-01-25T16:45:03,907][INFO ][logstash.javapipeline    ] Pipeline `main` is configured with `pipeline.ecs_compatibility: v8` setting. All plugins in this pipeline will default to `ecs_compatibility => v8` unless explicitly configured otherwise.
[2024-01-25T16:45:26,616][INFO ][logstash.javapipeline    ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>500, "pipeline.sources"=>["/etc/logstash/conf.d/wazuh-s3.conf"], :thread=>"#<Thread:0x33eda5f2 /usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}
[2024-01-25T16:45:27,017][INFO ][logstash.javapipeline    ][main] Pipeline Java execution initialization time {"seconds"=>0.4}
[2024-01-25T16:45:27,426][INFO ][logstash.inputs.opensearch][main] ECS compatibility is enabled but `target` option was not specified. This may cause fields to be set at the top-level of the event where they are likely to clash with the Elastic Common Schema. It is recommended to set the `target` option to avoid potential schema conflicts (if your data is ECS compliant or non-conflicting, feel free to ignore this message)
[2024-01-25T16:45:27,427][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2024-01-25T16:45:27,439][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}



@AlexRuiz7
Member Author

@wazuhci moved this from In progress to On hold in Release 4.9.0 on Jan 26, 2024
@wazuhci moved this from On hold to In progress in Release 4.9.0 on Jan 30, 2024
@f-galland
Member

We analyzed the option of writing a custom Ruby-based filter for Logstash that would transcode the events to Parquet, but Logstash's S3 output plugin doesn't support Parquet's binary file format.

Just for reference, in order to run Parquet encoding in Ruby, some dependencies are needed under Ubuntu/Debian:

sudo apt update
sudo apt install -y -V ca-certificates lsb-release wget ruby-dev build-essential
wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt update
sudo apt install -y -V libarrow-dev # For C++
gem install red-arrow
gem install red-parquet

Parquet output can be generated from a JSON file as follows:

#!/usr/bin/env ruby

require 'arrow'
require 'parquet'

# Load the JSON file into an Arrow table and save it in Parquet format.
table = Arrow::Table.load("test.json", format: :json)

table.save("output.parquet")

@AlexRuiz7
Member Author

Conclusions

We've got a base for the Logstash pipeline and have verified that it works. We'll evolve the pipeline depending on the proposal chosen for transforming the data.

Check #145

@wazuhci moved this from On hold to Done in Release 4.9.0 on Jan 31, 2024