S3 Output #2551

Closed
wants to merge 2 commits into from

Conversation

@fritzhardy commented Sep 14, 2016

Implements an S3 output. It marries well with the Logstash S3 input, using S3 as a queuing/archival mechanism.

Lines are staged to a local file and uploaded once a configurable number of bytes has accumulated (based on the fileout output). The uploaded object path/name currently resembles CloudTrail logs: /somebucket/YYYY/MM/DD/hostname_ISO8601DATE.gz, in UTC.

Example configuration:

output:
  s3:
    enabled: true
    path: "/var/log/s3"
    filename: s3
    upload_every_kb: 3
    #number_of_files: 2
    bucket: somebucket
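
For illustration only (this snippet is not part of the PR), a minimal Go sketch of how an object key of the form YYYY/MM/DD/hostname_ISO8601DATE.gz could be derived; the helper name and exact timestamp layout are assumptions:

package main

import (
	"fmt"
	"os"
	"time"
)

// buildObjectKey is a hypothetical helper (not from the PR) showing how an
// object key of the form YYYY/MM/DD/hostname_ISO8601DATE.gz could be derived,
// with the timestamp taken in UTC.
func buildObjectKey(now time.Time) (string, error) {
	hostname, err := os.Hostname()
	if err != nil {
		return "", err
	}
	now = now.UTC()
	return fmt.Sprintf("%04d/%02d/%02d/%s_%s.gz",
		now.Year(), int(now.Month()), now.Day(),
		hostname, now.Format("2006-01-02T15:04:05Z")), nil
}

func main() {
	key, err := buildObjectKey(time.Now())
	if err != nil {
		panic(err)
	}
	fmt.Println(key) // e.g. 2016/09/14/myhost_2016-09-14T12:34:56Z.gz
}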

@karmi commented Sep 14, 2016

Hi @fritzhardy, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails to your GitHub profile (they can be hidden), so we can match your e-mails to your GitHub profile?

@elasticsearch-release

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run.

@ruflin (Contributor) commented Sep 16, 2016

@fritzhardy Thanks a lot for all your work on the S3 output and for providing a PR directly. Unfortunately, we do not currently plan to add further outputs, as each output adds additional work on the maintenance and support side. We are a small team and are currently focused on other enhancements on the Beats side.

For additional outputs we recommend keeping them in a separate repository, without maintaining a full fork, and adding them at compile time; see the sketch below. For further details see the comment here: #1525 (comment). This way, outputs work almost like plugins. Perhaps one day Golang will support plugins natively, which would make adding external outputs even simpler.
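
For illustration (not part of this PR), a rough sketch of that compile-time approach: the output lives in its own repository and is pulled into a custom Filebeat build via a blank import, so its init() can register the output with libbeat. The import paths and entry point below are assumptions and vary across Beats versions.

// Hypothetical custom Filebeat main with an externally maintained S3 output
// compiled in. Both the external package path and the stock entry-point
// import are assumptions; the real paths vary across Beats versions.
package main

import (
	"os"

	"github.com/elastic/beats/filebeat/cmd" // assumed stock Filebeat entry point

	// Blank import: building this binary runs the package's init(), which is
	// where the external output would register its factory with libbeat.
	_ "github.com/example/filebeat-output-s3" // hypothetical external output
)

func main() {
	if err := cmd.RootCmd.Execute(); err != nil {
		os.Exit(1)
	}
}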

We are always happy to help with questions or reviewing code. Feel free to ping us on discuss at any time: https://discuss.elastic.co/c/beats

@ruflin (Contributor) commented Sep 16, 2016

Based on the comment above, I'm closing this PR. @fritzhardy Feel free to use this PR for further discussions.

@ruflin closed this Sep 16, 2016
@fritzhardy (Author)

I think it would be a boon to the project to have this as an official addition at some point. We find this approach gives us "networkless" log forwarding at little expense, and on the logstash side, all the configurability necessary to deal with a bucket of logs. However, I am aware that additional output plugins have been a source of discussion. In the meantime, I will review #1525 (comment). I was looking at some way to do that externally in the first place, akin to the external beats. Thanks for your time and feedback.

@trixpan commented Oct 4, 2016

Got to this while reading #1525...

@fritzhardy the straight-to-persistence output is indeed a clever approach, and one that is not so widely used.

However, from what I gather (@ruflin can correct me if I am wrong), the Elastic community is trying to avoid including too many additional transport features, as they end up resulting in a code base that is hard to manage in the long run. To be honest, I think their approach is a fair call.

Having said that, projects like Apache MiNiFi allow you to do this by simply adding existing processors (PutS3 in this case) to the MiNiFi install, but at the cost of running a lightweight JVM at the edge.

Disclaimer: I am a NiFi committer.

@ruflin (Contributor) commented Oct 4, 2016

@trixpan We are actually hoping that Golang plugins become a reality in the near future, which would directly solve the above issue.

About the approach you mentioned: Does that mean in the end 2 processes are running? MiNiFi + PutS3?

@trixpan commented Oct 4, 2016

@ruflin yes, Golang plugins will be handy. I will be happy to have a go at adding NiFi site-to-site support to Filebeat once that feature is stable.

Regarding MiNiFi + PutS3: just a single JVM is executed, MiNiFi (the framework). Within MiNiFi we run the PutS3 processor (think of a processor as an input/codec/filter/output in Logstash terms).

To a certain extent, MiNiFi is a stripped-down version of the overall framework (NiFi) and as such is able to reuse processors from the main code base, as illustrated here:

https://github.com/apache/nifi-minifi/blob/master/minifi-nar-bundles/minifi-standard-nar/pom.xml#L42

To cater for the S3 use case, a user would have to add the NiFi NARs (nifi-aws-bundle) to the MiNiFi install package (i.e. copy them into the appropriate folder) and then use the installed processors (PutS3) as they normally would on the main NiFi platform. Ironically enough, if I recall correctly, outputting to a central NiFi would be optional.

In ELK terms, this would be akin to running a minimalist version of Logstash at the producer level, reusing some of the JRuby code that powers Logstash, instead of developing a Golang producer (i.e. lumberjack-forwarder / *beats) from scratch. There are pros and cons to both approaches, and it is no wonder we are also building minifi-cpp. 😃

@ktham commented Oct 22, 2018

@ruflin We're looking at dumping our logs into S3 and would like to see S3 as an output for Filebeat. It looks like there was a previous effort in this PR to add it. What is your recommendation here? Is it possible to support an S3 output?

@fritzhardy (Author)

We have been using this fork in production for two years. It needs some polish, but gets the job done.

@ktham commented Oct 23, 2018

I see. Given that it's been working without issue for the past two years, I'd like to see if it's possible to include this output in the upstream Filebeat project, or if there's a way to factor it into a plug-in so as to avoid using a fork of Filebeat.

@ktham commented Nov 19, 2019

@ruflin/@fritzhardy Can we revisit the topic of adding an S3 output? We are willing to help with adding the code if necessary.
