Frank Denis edited this page Sep 23, 2015 · 3 revisions

With input types such as Redis or UDP, the length of a message is already provided by the underlying protocol.

However, streams of text-based records require a way to extract individual messages.

Some common ways to do so are:

  • One record per line. That is, messages are delimited by line feeds (\n or \r\n). This is the traditional way to delimit records in log files.
  • Use of a NUL character (\0) instead of a line feed. This allows messages to contain line feeds themselves. For example, a JSON-encoded message can take advantage of this convention to seamlessly include human-readable stack traces.
  • Length-prefixed messages. RFC 5425 specifies this framing for syslog-over-TLS messages, and syslog servers such as rsyslog or syslog-ng can have this feature enabled (in order to allow \n within messages) or disabled. Unfortunately, this way of splitting messages is not robust to data corruption. Because each length determines where the next message begins, a single corrupted byte can immediately lead to subsequent log entries being unparsable or incorrectly parsed.
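
To make the fragility of length-prefixed framing concrete, here is a minimal Python sketch. The decode_length_prefixed function and the sample streams are hypothetical and only illustrate the idea; this is not Flowgger's implementation.

```python
def decode_length_prefixed(stream: bytes) -> list[bytes]:
    """Decode back-to-back frames of the form b'<length> <payload>'."""
    frames = []
    pos = 0
    while pos < len(stream):
        space = stream.index(b" ", pos)  # ValueError if no separator is left
        length = int(stream[pos:space])  # ValueError if the prefix is not a number
        start = space + 1
        frames.append(stream[start:start + length])
        pos = start + length
    return frames


good = b"13 first_message14 second_message13 third_message"
# Same stream with a single corrupted byte: the first length reads 18, not 13.
bad = b"18 first_message14 second_message13 third_message"

frames = decode_length_prefixed(good)

try:
    decode_length_prefixed(bad)
    error = None
except ValueError as exc:
    # The first frame swallowed part of the second one, so the decoder
    # resumes in the middle of a payload and finds no valid length prefix.
    error = exc
```

With a delimiter-based convention, the same corruption would typically damage only the record it occurs in, since the next delimiter resynchronizes the reader.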

Flowgger input types that read a stream of messages need to know how those messages are delimited. They accept a framing property, which can be configured in the [input] section.

Line splitter

framing = "line"

This is the traditional one-record-per-line convention, simple and human-readable:

first_message
second_message
third_message
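
As an illustration of what a line splitter does, here is a minimal Python sketch. The split_lines name is hypothetical, and skipping empty lines is an assumption of this example rather than documented Flowgger behavior.

```python
def split_lines(stream: bytes) -> list[bytes]:
    """Split a byte stream into records delimited by \\n or \\r\\n."""
    records = []
    for raw in stream.split(b"\n"):
        # Drop the carriage return left over from a \r\n delimiter.
        record = raw.removesuffix(b"\r")
        # Assumption: empty lines carry no record and are skipped.
        if record:
            records.append(record)
    return records


records = split_lines(b"first_message\nsecond_message\r\nthird_message\n")
```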

NUL splitter

framing = "nul"

With the nul convention, a NUL byte (\0) is used as a delimiter. If you are using JSON-based formats, this is a great choice. In particular, Graylog's GELF format requires it when used over TCP or TLS:

first_message\0second_message\0third_message\0
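
The following Python sketch shows how NUL framing lets a record span several lines. The split_nul helper and the sample event are hypothetical; the field names are only illustrative.

```python
import json


def split_nul(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split a buffer on NUL bytes.

    Returns the complete records and the trailing bytes that are not yet
    terminated by a NUL (empty if the buffer ends on a delimiter).
    """
    *records, rest = buffer.split(b"\0")
    return records, rest


# A pretty-printed JSON document contains line feeds, yet stays one record.
event = {"short_message": "crash", "level": 3}
complete = json.dumps(event, indent=2).encode("utf-8") + b"\0"
partial = b'{"short_mess'  # next record, not fully received yet

records, rest = split_nul(complete + partial)
decoded = json.loads(records[0])
```

Returning the unterminated remainder lets a reader keep it in a buffer and prepend it to the next chunk read from the socket.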

Syslen splitter

framing = "syslen"

Finally, syslen is required to parse syslog messages prefixed by their length. When this convention is used, every message starts with its length as a decimal number, followed by a space, so every line typically starts with a number:

13 first_message
14 second_message
13 third_message
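
To show where these numbers come from, here is a minimal Python sketch of how a sender might produce such frames. The frame_syslen name is hypothetical, and the sketch assumes the length counts bytes rather than characters, as RFC 5425 specifies for its MSG-LEN field.

```python
def frame_syslen(message: str) -> bytes:
    """Prefix a message with its length in bytes and a space."""
    payload = message.encode("utf-8")
    # Assumption: the length is the number of bytes in the encoded payload.
    return str(len(payload)).encode("ascii") + b" " + payload


frames = [frame_syslen(m) for m in ("first_message", "second_message", "third_message")]
```

Note that a non-ASCII character counts for more than one byte under this assumption, so the prefix can be larger than the number of visible characters.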