Amazon Security Lake integration - Architecture and requirements #113
Comments
@AlexRuiz7 did you consider using a Kinesis Firehose with the Lambda as its data transformation? This would let you skip the raw events S3 bucket and have Firehose write them directly to the Security Lake custom source bucket.
Hi @kclinden. Not really; I'm no expert in AWS, so I went for the easiest path. I remember reading about it briefly, but IIRC it would have increased the maintenance costs. Maybe I'm wrong. How would it work in that case? Does the data flow through Kinesis Firehose straight into the Security Lake bucket? How do you define the OCSF class of the events in that case?
Firehose would send the data to the same Lambda function that you have already put together. The benefit is that it would let you skip the intermediate S3 bucket and have Logstash write directly to Firehose. Firehose performs the data transformation by sending each record to the Lambda, and then Firehose, rather than the Lambda, writes the result to the bucket. For the Data Prepper solution I would probably try to accomplish it all in the pipeline definition, similar to this -
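As a rough sketch of what such a Firehose data transformation could look like (assuming the alerts arrive as JSON, and using an illustrative rather than the actual OCSF field mapping), the Lambda would follow the standard Firehose transformation contract: decode each record, transform it, and return it base64-encoded under the same recordId. Converting the output to Parquet would still have to happen afterwards, for example via Firehose record format conversion.

```python
# Hedged sketch of a Kinesis Data Firehose transformation Lambda.
# Assumes raw Wazuh alerts arrive as JSON; the OCSF mapping below is
# illustrative only, not the actual Wazuh -> OCSF mapping.
import base64
import json


def map_to_ocsf(alert: dict) -> dict:
    """Map a raw alert to a minimal, hypothetical OCSF Security Finding."""
    return {
        "class_uid": 2001,                                   # Security Finding
        "time": alert.get("timestamp"),
        "severity_id": alert.get("rule", {}).get("level"),
        "message": alert.get("rule", {}).get("description"),
        "unmapped": alert,                                   # keep the original event
    }


def lambda_handler(event, context):
    """Standard Firehose transformation contract: one output per input recordId."""
    output = []
    for record in event["records"]:
        alert = json.loads(base64.b64decode(record["data"]))
        transformed = json.dumps(map_to_ocsf(alert)) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```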
Description
Related issue: #113
In order to develop an integration as a source for Amazon Security Lake, it is necessary to investigate and understand the architecture and requirements that the integration must meet. Therefore, this issue aims to answer the questions of what the integration will look like and how it will be carried out.
Requirements and good practices
Source: https://docs.aws.amazon.com/security-lake/latest/userguide/custom-sources.html
Architecture
Overview of Security Lake
Source: https://docs.aws.amazon.com/security-lake/latest/userguide/what-is-security-lake.html
Looking at the conceptual diagram of Amazon Security Lake above, it becomes clear that our integration as a source has to be done through an Amazon S3 bucket. In particular, we are interested in the relation between Amazon S3 and "Data from SaaS application, partner solutions, cloud providers and your customer data converted to OCSF".
In order to push the data from wazuh-indexer (OpenSearch) to Amazon S3, we can either use Logstash or Data Prepper. Both tools have the input and output plugins required to read data from OpenSearch and send it to an Amazon S3 bucket (a conceptual sketch of this step is shown after the list below).
Logstash vs Data Prepper
Both tools provide:
Elasticsearch / OpenSearch input plugin.
Amazon S3 output plugin.
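For illustration only, the following Python sketch shows what such a pipeline does conceptually: read documents from the wazuh-indexer alerts index and stage them in an S3 bucket. The host, credentials, index pattern and bucket name are hypothetical placeholders; the actual integration would rely on the Logstash or Data Prepper plugins listed above rather than a custom script.

```python
# Conceptual Python equivalent of the Logstash / Data Prepper pipeline:
# read alerts from wazuh-indexer (OpenSearch) and stage them in S3.
# Host, credentials, index pattern and bucket name are placeholders.
import json
from datetime import datetime, timezone

import boto3
from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": "wazuh-indexer.example", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
)
s3 = boto3.client("s3")

BATCH_SIZE = 500
batch = []


def flush(docs: list) -> None:
    """Write one NDJSON object per batch to the staging bucket."""
    key = "raw/" + datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f") + ".ndjson"
    body = "\n".join(json.dumps(doc) for doc in docs)
    s3.put_object(Bucket="wazuh-raw-events", Key=key, Body=body.encode("utf-8"))


# Stream every document from the alerts index (scroll API under the hood).
for hit in helpers.scan(client, index="wazuh-alerts-*", query={"query": {"match_all": {}}}):
    batch.append(hit["_source"])
    if len(batch) >= BATCH_SIZE:
        flush(batch)
        batch.clear()

if batch:
    flush(batch)
```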
By comparing both tools, it soon becomes obvious that Logstash is a better choice, for the following reasons:
OCSF compliant data as Apache Parquet
As Amazon Security Lake requires the data to use the OCSF schema and the Apache Parquet encoding, we need to find a way to transform our data before delivering it.
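As a rough illustration of that transformation step, assuming the events have already been mapped to OCSF dictionaries (the function, bucket and key names are hypothetical, and the compression choice is illustrative), a Lambda or pipeline stage could encode a batch as Parquet with pyarrow:

```python
# Sketch of the OCSF -> Parquet step, assuming the events are already
# mapped to OCSF as Python dicts. Bucket and key names are placeholders.
import io

import boto3
import pyarrow as pa
import pyarrow.parquet as pq


def write_ocsf_parquet(ocsf_events: list, bucket: str, key: str) -> None:
    """Encode a batch of OCSF events as Parquet and upload it to S3."""
    table = pa.Table.from_pylist(ocsf_events)      # schema inferred from the dicts
    buffer = io.BytesIO()
    pq.write_table(table, buffer, compression="zstd")
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())
```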
Several proposals have been generated:
These proposals have their advantages and disadvantages.
While proposal nr.1 is the most feasible, it is also the most expensive. On the other hand, proposal nr.3 is the least feasible, due to our limited knowledge of Ruby and of Logstash's plugin ecosystem, but it is the cheapest for the end user. Proposal nr.2 is a middle ground between the two.
We will explore proposals nr.1 and nr.2, with plans to explore proposal nr.3 in the future, depending on our success with the other two.
Conclusions
Resources and bibliography
logstash-input-opensearch plugin for OpenSearch