
Add ability to queue/spool to disk #575

Closed
djschny opened this issue Dec 21, 2015 · 14 comments

Comments

djschny commented Dec 21, 2015

Currently it appears that with Packetbeat, if the output destination (Logstash, Elasticsearch, etc.) is unavailable, we will retry but will not queue/store the data locally so that recovery can happen in the case of a network or service outage.

Currently there is a max_retries setting that appears to be standard across the output plugins. My suggestion would be to add functionality and appropriate settings for local queueing. For example:

  • provide options for whether an in-memory queue is used and its size. If enabled, events would be queued in memory once max_retries is exhausted.
  • provide options for whether on-disk storage is used once max_retries is exhausted and the optionally configured memory buffer is full. The user can configure the maximum amount of disk space that will be used before the oldest events are dropped; perhaps default to 100MB, for example. (A rough sketch of such settings follows below.)

This functionality is crucial for shipping and really makes for a flexible deployment topology.
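
Purely to make the proposal concrete, here is a hypothetical Go struct restating those settings. None of these option names or defaults exist in Beats today; they only mirror the bullets above:

```go
package main

import "time"

// QueueConfig is a hypothetical shape for the proposed local queue settings;
// the field names and defaults below are illustrative only.
type QueueConfig struct {
	// In-memory buffer used once max_retries is exhausted.
	MemoryEnabled bool
	MemoryEvents  int // maximum number of events held in memory

	// On-disk spool used when the memory buffer is full.
	SpoolEnabled  bool
	SpoolPath     string        // directory holding spool files
	SpoolMaxBytes int64         // oldest events are dropped once this is exceeded
	FlushTimeout  time.Duration // how long to buffer before forcing a flush
}

// defaultQueueConfig mirrors the suggestion above, including the 100MB disk cap.
func defaultQueueConfig() QueueConfig {
	return QueueConfig{
		MemoryEnabled: true,
		MemoryEvents:  4096,
		SpoolEnabled:  true,
		SpoolPath:     "/var/lib/beat/spool",
		SpoolMaxBytes: 100 * 1024 * 1024,
		FlushTimeout:  time.Second,
	}
}

func main() { _ = defaultQueueConfig() }
```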

McStork commented Dec 21, 2015

It is great that you created this ticket. I was planning to do so very soon myself.
I think this feature is very important as many companies will want to monitor critical network operations.

It will also be useful in the case of a DoS/DDoS attack: during an attack, Packetbeat might stop outputting to avoid adding to the traffic congestion, while still capturing. Once the attack is over, Packetbeat would output the recorded data to LS or ES.

If we can define this feature more precisely, I would gladly work on it (maybe with someone else, if that's achievable?). In any case, I won't be able to begin working on this potential feature right now, as I have DNS over TCP to finish first and other projects to work on.

tsg commented Dec 22, 2015

+1 for this feature, I think it makes a lot of sense to have it for all Beats. It also came up in a discussion about building a libbeat-based Docker log driver that doesn't lose lines: https://github.com/elastic/libbeat/issues/37

elvarb commented Apr 26, 2016

This is one of the best features nxlog has. It gives operations a lot of flexibility regarding the availability of the central infrastructure. +1 for Beats getting an internal queue.

blubbi321 commented May 12, 2016

+1 for the feature. @McStork did you have a chance to look into it yet?

Wondering if I could help... how would you check if Logstash is available? The only thing I found in this regard is https://discuss.elastic.co/t/what-is-a-recommended-healthcheck-to-use-for-logstash/27691

McStork commented May 12, 2016

@blubbi321 Hi. Well, I looked at ways to implement it.

  • Using a library

There are Go libraries that are backed by queue providers (Redis, ...), but that doesn't suit Beats' lightweight expectations.
I couldn't find any lightweight persistent-queue Go library. Writing one would be cool and would have a maintenance advantage, though it might not be easy, especially when it comes to optimizing IOPS or covering different usages (see the sketch at the end of this comment).

  • Less is more

Instead of going through the hassle of writing a library, some have chosen to implement it directly in the event-processing pipeline. That's what the developers of Heka, another data collector/shipper written in Go, did using Protobuf:
https://github.com/mozilla-services/heka/tree/dev/pipeline

So those are the two main ways to go about it.
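
For illustration only, here is a toy sketch of the first option: a single-file persistent FIFO with length-prefixed records. A real implementation would also persist the read offset, rotate segment files, handle corruption, and fsync according to a policy; this is not a proposal for the actual format.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// diskQueue is a toy persistent FIFO: events are appended to a single file
// as length-prefixed records; the reader keeps an in-memory offset.
type diskQueue struct {
	f      *os.File
	offset int64 // read position; a real queue would persist this too
}

func openQueue(path string) (*diskQueue, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	return &diskQueue{f: f}, nil
}

// Put appends one record: a 4-byte big-endian length followed by the payload.
func (q *diskQueue) Put(p []byte) error {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(p)))
	if _, err := q.f.Seek(0, io.SeekEnd); err != nil {
		return err
	}
	if _, err := q.f.Write(hdr[:]); err != nil {
		return err
	}
	_, err := q.f.Write(p)
	return err
}

// Get reads the next record at the current read offset.
func (q *diskQueue) Get() ([]byte, error) {
	var hdr [4]byte
	if _, err := q.f.ReadAt(hdr[:], q.offset); err != nil {
		return nil, err // io.EOF means the queue is drained
	}
	n := binary.BigEndian.Uint32(hdr[:])
	p := make([]byte, n)
	if _, err := q.f.ReadAt(p, q.offset+4); err != nil {
		return nil, err
	}
	q.offset += 4 + int64(n)
	return p, nil
}

func main() {
	q, err := openQueue("/tmp/spool.dat")
	if err != nil {
		panic(err)
	}
	defer q.f.Close()

	q.Put([]byte(`{"event":1}`))
	q.Put([]byte(`{"event":2}`))
	for {
		rec, err := q.Get()
		if err == io.EOF {
			break // drained
		}
		if err != nil {
			panic(err)
		}
		fmt.Println(string(rec))
	}
}
```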

medcl commented May 25, 2016

++ this feature.
Also, this project is really interesting: https://github.com/kkdai/pd. We could do something like that.
With a disk queue we don't need to worry about output failures or memory usage, and a pub/sub approach lets each output track its own offset, so multiple outputs become easier to handle and one broken output won't affect the others.
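
A tiny illustration of the per-output offset idea (in-memory only and with made-up names; a real spool would persist both the log and the per-output offsets):

```go
package main

import "fmt"

// spool is a shared, append-only log of events. Each output keeps only its
// own read offset, so a slow or broken output never blocks the others; on
// recovery it simply resumes from its last position.
type spool struct {
	log     [][]byte
	offsets map[string]int // output name -> next index to read
}

func (s *spool) publish(event []byte) { s.log = append(s.log, event) }

// consume returns the pending events for one output and advances its offset.
func (s *spool) consume(output string) [][]byte {
	start := s.offsets[output]
	pending := s.log[start:]
	s.offsets[output] = len(s.log)
	return pending
}

func main() {
	s := &spool{offsets: map[string]int{}}
	s.publish([]byte("e1"))
	s.publish([]byte("e2"))

	fmt.Println(len(s.consume("elasticsearch"))) // 2: ES catches up
	s.publish([]byte("e3"))
	fmt.Println(len(s.consume("elasticsearch"))) // 1: only the new event
	fmt.Println(len(s.consume("logstash")))      // 3: LS was down, resumes from its own offset
}
```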

andrewkroh changed the title from "add ability to queue for packetbeat" to "Add ability to queue/spool to disk" on Jan 25, 2017
medcl commented Feb 6, 2017

NSQ's disk queue seems like a good implementation; I'd like to use it directly:
https://github.com/nsqio/go-diskqueue
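
For reference, using go-diskqueue looks roughly like the sketch below. The constructor signature and behavior are from my reading of the library and may differ between versions; the path, sizes, and queue name are made up:

```go
package main

import (
	"fmt"
	"time"

	diskqueue "github.com/nsqio/go-diskqueue"
)

func main() {
	// go-diskqueue wants a logging callback; here we just print everything.
	logf := func(lvl diskqueue.LogLevel, f string, args ...interface{}) {
		fmt.Printf(f+"\n", args...)
	}

	dq := diskqueue.New(
		"events",      // queue name, used for segment file names
		"/tmp/spool",  // directory for the queue's files
		64*1024*1024,  // maxBytesPerFile: rotate segments at 64MB
		0, 1<<20,      // min/max accepted message size
		2500,          // syncEvery: fsync after this many writes...
		2*time.Second, // ...or after this much time (syncTimeout)
		logf,
	)
	defer dq.Close()

	// Put appends an event to the on-disk queue; it survives restarts.
	if err := dq.Put([]byte(`{"event":"example"}`)); err != nil {
		fmt.Println("spool write failed:", err)
		return
	}

	// ReadChan yields records in FIFO order; a consumer goroutine would read
	// from here and forward records to the output.
	fmt.Printf("dequeued %d bytes, %d still queued\n", len(<-dq.ReadChan()), dq.Depth())
}
```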

elvarb commented Feb 6, 2017

Using a local nsqd service could also be an option. Then having a native NSQ output in all Beats would be the only thing needed.

medcl commented Feb 6, 2017

@elvarb NSQ could be a dedicated output option, just like the Kafka output we have right now, but here the disk queue would be used internally as a safe local buffer; directly using a local nsqd would be too heavy, I think.

elvarb commented Feb 6, 2017

@medcl I have used nsqd as a local queue in metric collection with good success. It uses very few resources, gives me the option of encrypting transfers, and lets me use one of the many NSQ utilities (nsq to nsq, nsq to file, nsq to http, for example).

Though it does depend on the volume of data the host is gathering.

Regarding the NSQ go-diskqueue package, I'm glad to see that it is in its own repo now; there were a few requests to separate it from the main program because of its usefulness.

Other interesting solutions I have found that implement a local disk queue in Go:

@brandonmensing

For some use cases we should also consider user modification of spooled data as a potentially bad thing. Controlling it might not be possible, but perhaps we can at least monitor for potential modification and report back a chain of custody with the data. We would need to be exactly right when we conclude whether data was modified or not. We could certainly have an 'unsure' state for the many situations where the Beat was off and we can't be sure what happened.
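
Not a full chain-of-custody design, but as a minimal sketch of the detection part: record an HMAC when a spool segment is sealed and verify it before shipping, flagging mismatches (or 'unsure' when no MAC was recorded). The key handling and file layout here are made up:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"os"
)

// segmentMAC computes an HMAC-SHA256 over one spool segment file. Recording
// this when the segment is sealed, and checking it before shipping, lets the
// Beat report whether the spooled data was modified while it sat on disk.
// The key must live outside the spool (ideally outside the host user's reach).
func segmentMAC(key []byte, path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(data)
	return mac.Sum(nil), nil
}

// verifySegment reports whether the stored MAC still matches the file contents.
func verifySegment(key []byte, path string, stored []byte) (bool, error) {
	current, err := segmentMAC(key, path)
	if err != nil {
		return false, err
	}
	return hmac.Equal(current, stored), nil
}

func main() {
	key := []byte("example key, kept outside the spool") // hypothetical key source
	sealed, err := segmentMAC(key, "/var/lib/beat/spool/segment-0001.dat")
	if err != nil {
		fmt.Println("cannot seal segment:", err)
		return
	}
	ok, _ := verifySegment(key, "/var/lib/beat/spool/segment-0001.dat", sealed)
	fmt.Println("segment unmodified:", ok)
}
```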

@bfgoodrich

This feature would be really nice for those who send log or event data directly to Beats and would like the service to be more resilient. With an on-disk queue, it would be possible to flush memory to disk and restart even when back-end services are unavailable or running too slowly (much like rsyslog's memory and disk queue mechanism).

robinatw commented Nov 21, 2017

@bfgoodrich,
yeah, I totally agree with your point of view. It would be great if Beats could have more queue types, like rsyslog does:

http://www.rsyslog.com/doc/v8-stable/concepts/queues.html

urso commented Dec 6, 2019

I'm closing this issue in favor of the ongoing meta issue. All Beats have support for a configurable queue; for example, see the Filebeat docs: https://www.elastic.co/guide/en/beats/filebeat/current/configuring-internal-queue.html

Spooling to disk meta issue: #6859
