
Add ability to queue/spool to disk #575

Closed
djschny opened this issue Dec 21, 2015 · 14 comments

Comments

djschny commented Dec 21, 2015

Currently it appears that with Packetbeat, if the output destination (Logstash, Elasticsearch, etc.) is unavailable, we will retry but will not queue/store the data locally so that recovery can happen in the case of a network or service outage.

Currently there is a max_retries setting that appears to be standard across the output plugins. My suggestion would be to add functionality and appropriate settings for local queueing. For example:

  • provide options for whether an in-memory queue is used and its size. If enabled, events would be queued in memory once max_retries is exhausted.
  • provide options for whether on-disk storage is used once max_retries is exhausted and the optionally configured memory buffer is full. The user can configure the maximum amount of disk space that will be used before the oldest events are dropped; perhaps default to 100MB, for example. (A rough sketch of such settings follows below.)

This functionality is crucial for shipping and really makes for a flexible deployment topology.
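
Purely to make the proposal concrete, here is a hypothetical Go struct restating those settings. None of these option names or defaults exist in Beats today; they only mirror the bullets above:

```go
package main

import "time"

// QueueConfig is a hypothetical shape for the proposed local queue settings;
// the field names and defaults below are illustrative only.
type QueueConfig struct {
	// In-memory buffer used once max_retries is exhausted.
	MemoryEnabled bool
	MemoryEvents  int // maximum number of events held in memory

	// On-disk spool used when the memory buffer is full.
	SpoolEnabled  bool
	SpoolPath     string        // directory holding spool files
	SpoolMaxBytes int64         // oldest events are dropped once this is exceeded
	FlushTimeout  time.Duration // how long to buffer before forcing a flush
}

// defaultQueueConfig mirrors the suggestion above, including the 100MB disk cap.
func defaultQueueConfig() QueueConfig {
	return QueueConfig{
		MemoryEnabled: true,
		MemoryEvents:  4096,
		SpoolEnabled:  true,
		SpoolPath:     "/var/lib/beat/spool",
		SpoolMaxBytes: 100 * 1024 * 1024,
		FlushTimeout:  time.Second,
	}
}

func main() { _ = defaultQueueConfig() }
```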

McStork commented Dec 21, 2015

It is great that you created this ticket. I was planning to do so very soon myself.
I think this feature is very important as many companies will want to monitor critical network operations.

It will also be useful in the case of a DoS/DDoS attack: during an attack, Packetbeat might stop outputting to avoid adding to the traffic congestion, while still capturing. Once the attack is over, Packetbeat would output the recorded data to LS or ES.

If we can define this feature more precisely, I would gladly work on it (maybe with someone else, if that's achievable?). In any case, I won't be able to begin working on this potential feature right now, as I have DNS over TCP to finish first and other projects to work on.

tsg commented Dec 22, 2015

+1 for this feature, I think it makes a lot of sense to have it for all Beats. It also came up in a discussion about building a libbeat-based Docker log driver that doesn't lose lines: https://github.com/elastic/libbeat/issues/37

elvarb commented Apr 26, 2016

This is one of the best features nxlog has. It gives operations a lot of flexibility regarding the availability of the central infrastructure. +1 for Beats getting an internal queue.

blubbi321 commented May 12, 2016

+1 for the feature. @McStork did you have a chance to look into it yet?

Wondering if I could help... how would you check if Logstash is available? The only thing I found in this regard is https://discuss.elastic.co/t/what-is-a-recommended-healthcheck-to-use-for-logstash/27691

McStork commented May 12, 2016

@blubbi321 Hi. Well, I looked at ways to implement it.

  • Using a library

There are Go libraries that are backed by queue providers (Redis, ...), but that doesn't suit Beats' lightweight expectations.
I couldn't find any lightweight persistent-queue Go library. Writing one would be cool and would have a maintenance advantage, though it might not be easy, especially when it comes to optimizing IOPS or covering different usages (see the sketch at the end of this comment).

  • Less is more

Instead of going through the hassle of writing a library, some have chosen to implement it directly in the event-processing pipeline. That's what the developers of Heka, another data collector/shipper written in Go, did using Protobuf:
https://github.com/mozilla-services/heka/tree/dev/pipeline

So those are the two main ways to go about it.
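
For illustration only, here is a toy sketch of the first option: a single-file persistent FIFO with length-prefixed records. A real implementation would also persist the read offset, rotate segment files, handle corruption, and fsync according to a policy; this is not a proposal for the actual format.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// diskQueue is a toy persistent FIFO: events are appended to a single file
// as length-prefixed records; the reader keeps an in-memory offset.
type diskQueue struct {
	f      *os.File
	offset int64 // read position; a real queue would persist this too
}

func openQueue(path string) (*diskQueue, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	return &diskQueue{f: f}, nil
}

// Put appends one record: a 4-byte big-endian length followed by the payload.
func (q *diskQueue) Put(p []byte) error {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(p)))
	if _, err := q.f.Seek(0, io.SeekEnd); err != nil {
		return err
	}
	if _, err := q.f.Write(hdr[:]); err != nil {
		return err
	}
	_, err := q.f.Write(p)
	return err
}

// Get reads the next record at the current read offset.
func (q *diskQueue) Get() ([]byte, error) {
	var hdr [4]byte
	if _, err := q.f.ReadAt(hdr[:], q.offset); err != nil {
		return nil, err // io.EOF means the queue is drained
	}
	n := binary.BigEndian.Uint32(hdr[:])
	p := make([]byte, n)
	if _, err := q.f.ReadAt(p, q.offset+4); err != nil {
		return nil, err
	}
	q.offset += 4 + int64(n)
	return p, nil
}

func main() {
	q, err := openQueue("/tmp/spool.dat")
	if err != nil {
		panic(err)
	}
	defer q.f.Close()

	q.Put([]byte(`{"event":1}`))
	q.Put([]byte(`{"event":2}`))
	for {
		rec, err := q.Get()
		if err == io.EOF {
			break // drained
		}
		if err != nil {
			panic(err)
		}
		fmt.Println(string(rec))
	}
}
```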

medcl commented May 25, 2016

++ this feature.
Also, this project is really interesting: https://github.com/kkdai/pd. We could do something like that.
With a disk queue we don't need to worry about output failures or memory usage, and a pub/sub approach lets each output track its own offset, so multiple outputs become easier to handle and one broken output won't affect the others.
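
A tiny illustration of the per-output offset idea (in-memory only and with made-up names; a real spool would persist both the log and the per-output offsets):

```go
package main

import "fmt"

// spool is a shared, append-only log of events. Each output keeps only its
// own read offset, so a slow or broken output never blocks the others; on
// recovery it simply resumes from its last position.
type spool struct {
	log     [][]byte
	offsets map[string]int // output name -> next index to read
}

func (s *spool) publish(event []byte) { s.log = append(s.log, event) }

// consume returns the pending events for one output and advances its offset.
func (s *spool) consume(output string) [][]byte {
	start := s.offsets[output]
	pending := s.log[start:]
	s.offsets[output] = len(s.log)
	return pending
}

func main() {
	s := &spool{offsets: map[string]int{}}
	s.publish([]byte("e1"))
	s.publish([]byte("e2"))

	fmt.Println(len(s.consume("elasticsearch"))) // 2: ES catches up
	s.publish([]byte("e3"))
	fmt.Println(len(s.consume("elasticsearch"))) // 1: only the new event
	fmt.Println(len(s.consume("logstash")))      // 3: LS was down, resumes from its own offset
}
```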

andrewkroh changed the title from "add ability to queue for packetbeat" to "Add ability to queue/spool to disk" on Jan 25, 2017
medcl commented Feb 6, 2017

NSQ's disk queue seems like a good implementation; I'd like to use it directly:
https://github.com/nsqio/go-diskqueue
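
For reference, using go-diskqueue looks roughly like the sketch below. The constructor signature and behavior are from my reading of the library and may differ between versions; the path, sizes, and queue name are made up:

```go
package main

import (
	"fmt"
	"time"

	diskqueue "github.com/nsqio/go-diskqueue"
)

func main() {
	// go-diskqueue wants a logging callback; here we just print everything.
	logf := func(lvl diskqueue.LogLevel, f string, args ...interface{}) {
		fmt.Printf(f+"\n", args...)
	}

	dq := diskqueue.New(
		"events",      // queue name, used for segment file names
		"/tmp/spool",  // directory for the queue's files
		64*1024*1024,  // maxBytesPerFile: rotate segments at 64MB
		0, 1<<20,      // min/max accepted message size
		2500,          // syncEvery: fsync after this many writes...
		2*time.Second, // ...or after this much time (syncTimeout)
		logf,
	)
	defer dq.Close()

	// Put appends an event to the on-disk queue; it survives restarts.
	if err := dq.Put([]byte(`{"event":"example"}`)); err != nil {
		fmt.Println("spool write failed:", err)
		return
	}

	// ReadChan yields records in FIFO order; a consumer goroutine would read
	// from here and forward records to the output.
	fmt.Printf("dequeued %d bytes, %d still queued\n", len(<-dq.ReadChan()), dq.Depth())
}
```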

elvarb commented Feb 6, 2017

Using a local nsqd service could also be an option. Then having a native NSQ output in all Beats would be the only thing needed.

medcl commented Feb 6, 2017

@elvarb NSQ could be a dedicated output option, just like the Kafka output we have right now, but here the disk queue would be used internally as a safe local buffer; directly using a local nsqd would be too heavy, I think.

elvarb commented Feb 6, 2017

@medcl I have used nsqd as a local queue in metric collection with good success. It uses very few resources, gives me the option of encrypting transfers, and lets me use one of the many NSQ utilities (nsq to nsq, nsq to file, nsq to http, for example).

Though it does depend on the volume of data the host is gathering.

Regarding the NSQ go-diskqueue package, I'm glad to see that it is in its own repo now; there were a few requests to separate it from the main program because of its usefulness.

Other interesting solutions I have found that implement a local disk queue in Go:

@brandonmensing

For some use cases we should also consider user modification of spooled data as a potentially bad thing. Controlling it might not be possible, but perhaps we can at least monitor for potential modification and report back a chain of custody with the data. We would need to be exactly right when we conclude whether data was modified or not. We could certainly have an 'unsure' state for the many situations where the Beat was off and we can't be sure what happened.
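
Not a full chain-of-custody design, but as a minimal sketch of the detection part: record an HMAC when a spool segment is sealed and verify it before shipping, flagging mismatches (or 'unsure' when no MAC was recorded). The key handling and file layout here are made up:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"os"
)

// segmentMAC computes an HMAC-SHA256 over one spool segment file. Recording
// this when the segment is sealed, and checking it before shipping, lets the
// Beat report whether the spooled data was modified while it sat on disk.
// The key must live outside the spool (ideally outside the host user's reach).
func segmentMAC(key []byte, path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(data)
	return mac.Sum(nil), nil
}

// verifySegment reports whether the stored MAC still matches the file contents.
func verifySegment(key []byte, path string, stored []byte) (bool, error) {
	current, err := segmentMAC(key, path)
	if err != nil {
		return false, err
	}
	return hmac.Equal(current, stored), nil
}

func main() {
	key := []byte("example key, kept outside the spool") // hypothetical key source
	sealed, err := segmentMAC(key, "/var/lib/beat/spool/segment-0001.dat")
	if err != nil {
		fmt.Println("cannot seal segment:", err)
		return
	}
	ok, _ := verifySegment(key, "/var/lib/beat/spool/segment-0001.dat", sealed)
	fmt.Println("segment unmodified:", ok)
}
```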

@bfgoodrich

This feature would be really nice for those who send log or event data directly to Beats and would like the service to be more resilient. With an on-disk queue, it would be possible to flush memory to disk and restart even when back-end services are unavailable or running too slowly (much like rsyslog's memory and disk queue mechanism).

robinatw commented Nov 21, 2017

@bfgoodrich,
yeah, I totally agree with your point of view. It would be great if Beats could have more queue types, like rsyslog does:

http://www.rsyslog.com/doc/v8-stable/concepts/queues.html

urso commented Dec 6, 2019

I'm closing this issue in favor of the ongoing meta issue. All Beats have support for a configurable queue; for example, see the Filebeat docs: https://www.elastic.co/guide/en/beats/filebeat/current/configuring-internal-queue.html

Spooling to disk meta issue: #6859
