Introduce spooling to disk #6581
@@ -148,6 +148,66 @@ auditbeat.modules:
    # if the number of events stored in the queue is < min_flush_events.
    #flush.timeout: 1s

  # The spool queue will store events in a local spool file, before
  # forwarding the events to the outputs.
  #
  # Beta: spooling to disk is currently a beta feature. Use with care.
  #
  # The spool file is a circular buffer, which blocks once the file/buffer is full.
  # Events are put into a write buffer and flushed once the write buffer
  # is full or the flush_timeout is triggered.
  # Once ACKed by the output, events are removed immediately from the queue,
  # making space for new events to be persisted.
  #spool:
Review comment: Why not call this …?

Review comment: We should probably align with the naming/convention if/when possible with the Logstash persistent queue. In that case we are using …

Review comment: Hmmm... introducing …? What's wrong with …? Right now this one implements a FIFO. For metricbeat use cases we might actually introduce another persistent LIFO queue type. This one should have another type. How would you name any of these now?

Review comment, quoting the existing memory queue config for comparison:

    #queue:
      # Queue type by name (default 'mem')
      # The memory queue will present all available events (up to the outputs
      # bulk_max_size) to the output, the moment the output is ready to serve
      # another batch of events.
      #mem:
        # Max number of events the queue can buffer.

You are right, actually I think our config makes it cleaner. @ruflin @kvch I would vote to keep @urso's suggestion.

Review comment: For the LIFO/FIFO, it's more like a …

Review comment: I don't think we should introduce a type, because of the reasons mentioned by @urso. There can only be one queue at a time. For me, spooling can be to disk or memory. Historically we had a spooler in filebeat which kept the data in memory. Other options would be to call it … +1 on the proposal from @ph about the priority. FIFO or LIFO is a config option of the queue. It can mean a completely different implementation in the background, but the user should not have to worry about that.

Review comment: Regarding priority, I was initially thinking the same. But if we have a priority setting, the user must be allowed to change the setting between restarts. The async ACK behaviour of the queue makes the on-disk structure a little more complicated. When introducing stack-like functionality, we will end up with many holes; that is, freeing space will be somewhat more complicated in the LIFO case. I'd like to solve the LIFO case separately, potentially merging both cases into a common file format later in time. A priority-based queue, using heaps, might become even more complicated.
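To make the thread's outcome concrete, here is a small sketch (illustrative values, not taken from the diff) of how a user would select a queue type under this scheme, where the type is given by the key under the queue namespace rather than by a separate type setting:

    # Select the memory queue (the default type):
    queue.mem:
      events: 4096

    # Or, alternatively, select the disk spool; only one queue type
    # may be configured at a time:
    #queue.spool:
    #  file:
    #    path: "${path.data}/spool.dat"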
    # The file namespace configures the file path and the file creation settings.
    # Once the file exists, the `size`, `page_size` and `prealloc` settings
    # will have no effect.
    #file:
      # Location of spool file. The default value is ${path.data}/spool.dat.
      #path: "${path.data}/spool.dat"

      # Configure file permissions if file is created. The default value is 0600.
      #permissions: 0600

      # File size hint. The spool blocks once this limit is reached. The default value is 100 MiB.
      #size: 100MiB

      # The file's page size. A file is split into multiple pages of the same size. The default value is 4KiB.
      #page_size: 4KiB

      # If prealloc is set, the required space for the file is reserved using
      # truncate. The default value is true.
      #prealloc: true

    # Spool writer settings.
    # Events are serialized into a write buffer. The write buffer is flushed if:
    #  - The buffer limit has been reached.
    #  - The configured limit of buffered events is reached.
    #  - The flush timeout is triggered.
    #write:
      # Sets the write buffer size.
      #buffer_size: 1MiB

      # Maximum duration after which events are flushed if the write buffer
      # is not full yet. The default value is 1s.
      #flush.timeout: 1s

      # Maximum number of buffered events. The write buffer is flushed once this
      # limit is reached.
      #flush.events: 16384

      # Configure the on-disk event encoding. The encoding can be changed
      # between restarts.
      # Valid encodings are: json, ubjson, and cbor.
      #codec: cbor

    #read:
      # Reader flush timeout, waiting for more events to become available, so
      # as to fill a complete batch, as required by the outputs.
      # If flush_timeout is 0, all available events are forwarded to the
      # outputs immediately.
      # The default value is 0s.
      #flush.timeout: 0s

# Sets the maximum number of CPUs that can be executing simultaneously. The
# default is the number of logical CPUs available in the system.
#max_procs:
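For reference, a minimal sketch of the spool queue with the settings above uncommented. This assumes the spool block nests under the top-level queue namespace, as in the memory queue snippet quoted earlier; the values are simply the documented defaults, and since the feature is beta the exact keys may change:

    queue.spool:
      file:
        path: "${path.data}/spool.dat"
        permissions: 0600
        size: 100MiB
        page_size: 4KiB
        prealloc: true
      write:
        buffer_size: 1MiB
        flush.timeout: 1s
        flush.events: 16384
        codec: cbor
      read:
        flush.timeout: 0s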
@@ -43,6 +43,66 @@
(The same spool queue settings block shown above is added to this configuration file as well.)
Review comment: We should mark this beta at first, in the code as a log message, here in the config, and in the docs.

Reply: Done.
Review comment: I assume only one queue can be used at the same time. Should we mention that in a comment on line 130?

Reply: Do we need to? If you configure two queue types, Beats will report an error that only one is allowed. It's using the dictionary-style plugin config we have in quite a few places in Beats.

Reply: I mainly brought it up because people keep trying to use two outputs in Beats, and they will try the same for queues. Having it documented makes it easy to point people to it, so they know it from the docs before running into it.
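To illustrate that constraint, a sketch of an invalid config (illustrative values): declaring both queue types at once, as below, would be rejected at startup because the queue namespace accepts exactly one type.

    queue.mem:
      events: 4096
    # Invalid: a second queue type alongside queue.mem.
    queue.spool:
      file:
        size: 100MiB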