Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file spool to queue docs #6902

Merged
merged 4 commits into from
May 31, 2018
Merged

Add file spool to queue docs #6902

merged 4 commits into from
May 31, 2018

Conversation

urso
Copy link

@urso urso commented Apr 19, 2018

No description provided.

@urso urso added the docs label Apr 19, 2018
@urso urso mentioned this pull request Apr 19, 2018
39 tasks
Copy link
Contributor

@dedemorton dedemorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some feedback plus a few edits. I'd like to do a more thorough edit of this content later (rather than cluttering up the review with a bunch of small edits).

[[configuration-internal-queue-spool]]
=== Configure the file spool queue

The file spooling queue stores all events in an on disk ring buffer. The spool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use file spooling queue or file spool queue consistently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a ring buffer something that end users are likely to know about? If not, it's worth adding a sentence to explain what this is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say yes, as ring buffers are quite common.

@@ -76,3 +77,147 @@ will be immediately available for consumption.

The default values is 0s.

[float]
[[configuration-internal-queue-spool]]
=== Configure the file spool queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have a couple of queue types, we should explain why users might choose to use a specific type of queue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The interaction of a configured queue and the actual beat differs per beat. This would call for some beat specific documentation. As this will require some more work, I'd prefer to postpone the per-beats documentation for now.

after a signal from the output has been received.

On disk, the spool operates in pages. The `file.page_size` setting configures
the files page size on creation time. The optimal page size depends on the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"files page size" isn't grammatically correct. Do you mean:

file's page size?
file page size?
page size for files?

search for other instances of "files page size" and replace them with whichever is correct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file's page size. A file is divided into pages of page_size.

Note: as followup we want to add a FAQ entry on determining the optimal page size. These can differ by OS and filesystem. Tools and nomenclature is different on different OSes as well.

------------------------------------------------------------------------------
queue.spool: ~
------------------------------------------------------------------------------

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth pointing the user to the Configuration options section for the defaults.

Copy link
Author

@urso urso Apr 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a link to text before sample


This sample configuration creates a spool of 512MiB, with 16KiB pages. The
write buffer is flushed if 10MiB of contents, or 1024 events have been
written. If oldest available event is already waiting for 5s in the write
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to say, "If the oldest available event has been waiting for 5s..."


===== `file.page_size`

The files page size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above


The default value is 4096 (4KiB).

Note: This setting should match the file systems minimum block size. If the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above (NOTE instead of Note). I won't flag all of these.

Also it should be "file system's" (possessive) not "file systems".

The default value is 4096 (4KiB).

Note: This setting should match the file systems minimum block size. If the
`page_size` is not a multiple of the file systems block size, the file system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: "file system's" (possessive) not "file systems"

`page_size` is not a multiple of the file systems block size, the file system
might create additional read operations on writes.

Note: The page size is only set on file creation time. It can not be changed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"only set at file creation time" sounds better to me. (comment applies to other instances)

===== `read.flush.timeout`

If configured, the `read.flush.timeout` ensures the spool will wait for more
events to be flushed to the spool file, If the number of available events is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of typos. Should be:

...if the number of available events is less than the output's bulk_max_size...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a hard time writing docs on this setting. Did rewrite complete section. Hope it's somewhat better now.

@ruflin ruflin added the needs_backport PR is waiting to be backported to other branches. label Apr 20, 2018
@ruflin
Copy link
Member

ruflin commented Apr 20, 2018

I added the needs_backport label.

@urso
Copy link
Author

urso commented Apr 25, 2018

@dedemorton I updated the docs. Thanks for the input.

I think we might need some more follow up docs. Maybe per beat, as the impact of the queue type kind of depends on the actual beat and the data source.

As FAQ entry I can think of determining the optimal 'page size' using OS tools.

Copy link
Contributor

@dedemorton dedemorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso Just one small comment. Otherwise the changes LGTM. Sorry this one fell off my radar. I've created an issue for adding more detail about when/why to configure queue options: #7217

@@ -11,7 +11,7 @@ options in the `queue` section of the +{beatname_lc}.yml+ config file. Only one
queue type can be configured.


Example configuration:
This sample configurations configures the memory queue, buffering up to 4096 events:
Copy link
Contributor

@dedemorton dedemorton May 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "configurations configure" sounds a bit odd (and should be singular not plural). How about:

This sample configuration sets the memory queue to buffer up to 4096 events.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


The default value is "${path.data}/spool.dat".

===== `file.permissions`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso The content looks good, but just noticed that some of the sections are missing [float] tags starting with file.permissions.

Copy link
Contributor

@dedemorton dedemorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dedemorton dedemorton merged commit b424793 into elastic:master May 31, 2018
urso pushed a commit to urso/beats that referenced this pull request Jun 5, 2018
* Add spool docs

* review

* review (phrasing)

* Add missing [float]

(cherry picked from commit b424793)
@urso urso added v6.3.0 and removed needs_backport PR is waiting to be backported to other branches. labels Jun 5, 2018
ph pushed a commit that referenced this pull request Jun 5, 2018
* Add spool docs

* Add missing [float]

(cherry picked from commit b424793)
@urso urso deleted the doc-spool branch February 19, 2019 18:56
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
* Add spool docs

* Add missing [float]

(cherry picked from commit 96d170a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants