This has taken some time to investigate, and it seems the problem is due to a confluence of issues across separate Seq/Serilog libraries.
The problem description:
We are using Serilog to log to Seq via the DurableSeqSink with a file size limit set. For a particular use case we have implemented a web endpoint that exposes the same API as Seq's raw events endpoint.
What we were finding is that the data being posted often had an extra "}," (close brace and comma), like this:
{
  "Events": [{
    "Timestamp": "2015-05-09T22:09:08.12345+10:00",
    "Level": "Warning",
    "MessageTemplate": "Disk space is low on {Drive}",
    "Properties": {
      "Drive": "C:",
      "MachineName": "nblumhardt-rmbp"
    }
  },
  }
  ]
}
which meant the payload was not valid JSON, so it would fail to deserialize and return an error.
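To make the failure concrete, here is a minimal Python sketch (the endpoint itself is not Python, and the payload strings are trimmed-down stand-ins for the real event bodies) showing that a strict JSON parser rejects a payload with the stray "},":

```python
import json

# A well-formed raw-events payload (trimmed down for illustration)
good = '{"Events": [{"Level": "Warning", "MessageTemplate": "Disk space is low on {Drive}"}]}'
# The same payload with the stray "}," that appears in the shipped batches
bad = '{"Events": [{"Level": "Warning", "MessageTemplate": "Disk space is low on {Drive}"},}]}'

json.loads(good)  # parses fine
try:
    json.loads(bad)
    print("parsed")
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

Any endpoint that deserializes the batch strictly will hit the same rejection, which is why the error surfaces server-side rather than in the sink.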
Eventually I determined that the buffer file for the Serilog Seq logger contained additional bytes (invisible in a normal text editor). These were the bytes "ef bb bf", which happen to be the UTF-8 BOM.
Again, while not ideal, this would not have caused the problem but for the fact that the HttpLogShipper does its own byte counting instead of relying on StreamReader (which seems to handle an extra BOM part way through a file). It assumes that a BOM can only exist at the start of the file, so it ends up incorrectly positioning its cursor 3 bytes before the end of the line instead of at the beginning of the next line, hence the extra '}'.
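The off-by-three is easy to see with plain byte arithmetic. A sketch in Python (the line contents here are hypothetical; the real buffer file holds compact JSON events):

```python
import codecs

# Two newline-terminated events with a second BOM injected between them,
# as happens when the file is re-opened for append
line1 = b'{"Timestamp": "...", "Level": "Warning"}\r\n'
line2 = b'{"Timestamp": "...", "Level": "Error"}\r\n'
buffer_file = codecs.BOM_UTF8 + line1 + codecs.BOM_UTF8 + line2

# A shipper that assumes the BOM only ever appears at offset 0 computes
# the position of the next line as: leading BOM + bytes shipped so far
naive_next = len(codecs.BOM_UTF8) + len(line1)
actual_next = buffer_file.index(line2)

print(actual_next - naive_next)  # 3: the cursor lands 3 bytes early
```

With the cursor 3 bytes short, the last 3 bytes of the previous line are re-read as the start of the "next" line, duplicating the tail of the event.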
Now, as to why the extra bytes get into the file in the first place: it turns out there is a quirk in the StreamWriter implementation (https://referencesource.microsoft.com/#mscorlib/system/io/streamwriter.cs,208) whereby it only skips writing the encoding preamble when appending to a stream if the stream is seekable. So when you set a fileSizeLimitBytes on the FileSink, the preamble gets written again.
When using the DurableSeqSink, it uses one file per day by default, so we get the error as many times as the process starts on a given day.
I have created some unit tests to demonstrate the problem (pull request coming) and I can think of several ways to work around the problem, but ultimately I think that coordinated changes need to be made to both the Serilog.Sinks.File.FileSink and Serilog.Sinks.Seq.HttpLogShipper.
The fix closest to the source of the problem is to make the WriteCountingStream seekable, or to use an encoding that does not include a preamble (i.e. don't use the Encoding.UTF8 static member). The second option is not viable here because of the way the HttpLogShipper works: it assumes the file is written as UTF-8 with a preamble and will fail in a different way if it is not. I think that having the HttpLogShipper continue to do its own stream position management instead of using a StreamReader is a time bomb. I don't actually understand why it is tracking its own stream position in the first place.
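The first option amounts to letting the counting wrapper forward seekability, so the text layer can tell it is appending. A hypothetical Python sketch of the idea (the real WriteCountingStream is C#, and all names here are made up):

```python
import io

class CountingStream(io.RawIOBase):
    """Counts bytes written while forwarding seekability to the inner
    stream, so an encoding layer can tell it is appending mid-file."""

    def __init__(self, inner):
        self._inner = inner
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        n = self._inner.write(data)
        self.bytes_written += n
        return n

    # Forwarding these is the crux of the fix: a non-seekable wrapper is
    # what tricks the writer into re-emitting the preamble on append.
    def seekable(self):
        return self._inner.seekable()

    def seek(self, pos, whence=io.SEEK_SET):
        return self._inner.seek(pos, whence)

    def tell(self):
        return self._inner.tell()

stream = CountingStream(io.BytesIO())
stream.write(b'{"Events": []}\n')
print(stream.seekable(), stream.bytes_written)  # True 15
```

Since seek and tell simply delegate, the wrapper keeps its byte count without hiding the underlying stream's capabilities.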
Hi Eamon, thanks for the thorough investigation and report. Wow! I'm really surprised by the StreamWriter behaviour here. Nice find :-)
I agree that making the wrapper appear to be seekable is the way to go - the log shipper needs to muck around with positioning because it may read from the file while only partial data for a line is present. I'll take a look back at that code and see if there's a more robust approach possible.
I'll run through the workaround PR now, much appreciated.