Splitters
With input types such as Redis or UDP, the length of a message is already provided by the underlying protocol.
However, streams of text-based records require a way to extract individual messages.
Some common ways to do so are:
- One record per line. That is, messages are delimited by line feeds (\n or \r\n). This is the traditional way to delimit records in log files.
- Using a NUL character (\0) instead of a line feed. This allows messages to contain line feeds themselves. For example, a JSON-encoded message can take advantage of this convention to seamlessly include human-readable stack traces.
- Length-prefixed messages. RFC 5425 documents this for syslog-over-TLS messages, and syslog servers such as rsyslog or syslog-ng can have this feature enabled (in order to allow \n within messages) or disabled. Unfortunately, this way of splitting messages is not robust to data corruption: a single corrupted byte can immediately cause subsequent log entries to be unparsable or incorrectly parsed.
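To make the robustness argument concrete, here is a minimal sketch comparing line framing and length-prefixed framing on the same kind of damage. The helpers split_lines and split_length_prefixed are hypothetical names, not Flowgger's splitters. The length-prefixed parser assumes frames of the form LEN SP MSG placed back to back, and it simply stops at the first frame it cannot parse, since nothing in the stream lets it resynchronize.

```rust
/// Hypothetical helper: split on b'\n'. Each record is parsed
/// independently, so damage inside one record stays inside it.
fn split_lines(data: &[u8]) -> Vec<String> {
    data.split(|&b| b == b'\n')
        .filter(|rec| !rec.is_empty()) // drop the empty tail after a final '\n'
        .map(|rec| String::from_utf8_lossy(rec).into_owned())
        .collect()
}

/// Hypothetical helper: parse back-to-back "LEN SP MSG" frames.
/// Every frame boundary depends on the previous length field being correct.
fn split_length_prefixed(mut data: &[u8]) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(sp) = data.iter().position(|&b| b == b' ') {
        // The bytes before the space must be a decimal length.
        let len = match std::str::from_utf8(&data[..sp])
            .ok()
            .and_then(|s| s.parse::<usize>().ok())
        {
            Some(len) => len,
            None => break, // not a number: cannot find the next frame
        };
        let body = &data[sp + 1..];
        if body.len() < len {
            break; // truncated frame
        }
        out.push(String::from_utf8_lossy(&body[..len]).into_owned());
        data = &body[len..]; // the next frame starts right after this one
    }
    out
}
```

With the intact stream "13 first_message14 second_message13 third_message", split_length_prefixed returns the three messages. Changing the single digit 14 to 15 makes it return "first_message", "second_message1" and "thi", and the rest is lost. A corrupted byte in line-delimited input, such as "second_mes#age", damages only that one record.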
Flowgger input types that read a stream of messages, and therefore need to know how messages are delimited, accept a framing property, which can be set in the [input] section.
framing = "line"
This is the traditional one-record-per-line convention, simple and human-readable:
first_message
second_message
third_message
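A minimal sketch of line framing, assuming the input is any std::io::BufRead source and that records are valid UTF-8. read_line_records is a hypothetical helper, not Flowgger's API. It relies on the standard library's BufRead::lines, which strips a trailing \n or \r\n, so both line-ending conventions are accepted.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not Flowgger's API): collect one record per line.
/// `BufRead::lines` removes the trailing "\n" or "\r\n" from each line
/// and yields an error if a line is not valid UTF-8.
fn read_line_records<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

Collecting into a Vec keeps the example short; a long-running input would instead handle each line as soon as it arrives.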
framing = "nul"
With the nul convention, a NUL byte (\0) is used as a delimiter. If you are using JSON-based formats, this is a great choice. In particular, Graylog's GELF format requires it when used over TCP or TLS:
first_message\0second_message\0third_message\0
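A minimal sketch of NUL framing, again assuming a std::io::BufRead source. read_nul_records is a hypothetical helper, not Flowgger's API. It uses BufRead::split, which yields each chunk without its delimiter, so line feeds inside a message survive. Records are kept as raw bytes, leaving UTF-8 and JSON decoding to a later stage.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not Flowgger's API): collect NUL-delimited records.
/// `BufRead::split` strips the b'\0' delimiter from each chunk; line feeds
/// inside a message (e.g. a multi-line stack trace) are left untouched.
fn read_nul_records<R: BufRead>(reader: R) -> io::Result<Vec<Vec<u8>>> {
    reader.split(b'\0').collect()
}
```

For the input first_message\0trace line 1\ntrace line 2\0third_message\0, this yields three records, the second one still containing its embedded line feed.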
framing = "syslen"
Finally, syslen is required to parse syslog messages prefixed by their length. With this convention, every message starts with its length in bytes, followed by a space:
13 first_message
14 second_message
13 third_message
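A minimal sketch of length-prefixed parsing over a std::io::BufRead source. It assumes the RFC 5425 frame layout: a decimal length, a single space, then exactly that many bytes of message, with frames following each other directly and no separator between them. read_syslen_frame and read_syslen_records are hypothetical names, not Flowgger's implementation.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical helper (not Flowgger's API): read one "LEN SP MSG" frame.
/// Returns Ok(None) when the stream ends cleanly between frames.
fn read_syslen_frame<R: BufRead>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    // Read the decimal length field, including the space that ends it.
    let mut len_field = Vec::new();
    if reader.read_until(b' ', &mut len_field)? == 0 {
        return Ok(None); // no more data
    }
    // read_until only stops without the delimiter at end of stream.
    if len_field.pop() != Some(b' ') {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "stream ended inside a length field"));
    }
    let len = std::str::from_utf8(&len_field)
        .ok()
        .and_then(|s| s.parse::<usize>().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "invalid length prefix"))?;
    // Read exactly `len` bytes; fails with UnexpectedEof on a truncated frame.
    let mut msg = vec![0u8; len];
    reader.read_exact(&mut msg)?;
    Ok(Some(msg))
}

/// Hypothetical helper: read frames until the end of the stream.
fn read_syslen_records<R: BufRead>(mut reader: R) -> io::Result<Vec<Vec<u8>>> {
    let mut records = Vec::new();
    while let Some(msg) = read_syslen_frame(&mut reader)? {
        records.push(msg);
    }
    Ok(records)
}
```

Fed the stream 13 first_message14 second_message13 third_message, this returns the three messages. A truncated frame or a non-numeric length produces an error instead. A production reader would also cap len before allocating the buffer, so that a corrupted or hostile length field cannot trigger a huge allocation.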