
Extra BOM bytes being inserted in Seq buffer file #23

Closed
EamonHetherton opened this issue Nov 11, 2016 · 1 comment


@EamonHetherton (Contributor)

This took some time to investigate; the problem appears to stem from a confluence of issues across separate Seq/Serilog libraries.

The problem description:
We are using Serilog to log to Seq via the DurableSeqSink with a file size limit set. For a particular use case we have implemented a web endpoint that exposes the same API as Seq's raw events endpoint.

What we were finding is that the data being posted often had an extra "}," (close brace and comma) like this:
{
"Events": [{
"Timestamp": "2015-05-09T22:09:08.12345+10:00",
"Level": "Warning",
"MessageTemplate": "Disk space is low on {Drive}",
"Properties": {
"Drive": "C:",
"MachineName": "nblumhardt-rmbp"
}
},
}
]
}
which meant the payload was not valid JSON, so deserialization would fail and the endpoint would return an error.
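To make the failure concrete, here is a small sketch (in Python rather than the .NET stack used above) showing that a payload with the stray "}," is rejected by any strict JSON parser:

```python
import json

# The payload as received, with the stray "}," after the event object
# (reconstructed from the report above; whitespace is illustrative).
bad_payload = """
{
  "Events": [{
    "Timestamp": "2015-05-09T22:09:08.12345+10:00",
    "Level": "Warning",
    "MessageTemplate": "Disk space is low on {Drive}",
    "Properties": {
      "Drive": "C:",
      "MachineName": "nblumhardt-rmbp"
    }
  },
}
]
}
"""

try:
    json.loads(bad_payload)
    parsed = True
except json.JSONDecodeError:
    parsed = False

print(parsed)  # False: the extra "}," makes the payload invalid JSON
```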

Eventually I determined that the buffer file for the Serilog Seq logger contained additional bytes, invisible in a normal text editor: "ef bb bf", which happens to be the UTF-8 BOM.
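Those three bytes are exactly the UTF-8 encoding of the byte order mark, U+FEFF. A quick Python check (standing in for the .NET code) shows why the file looks clean in most editors while byte-level consumers see three extra bytes:

```python
import codecs

# The three bytes found in the buffer file are exactly the UTF-8 BOM:
# the code point U+FEFF encoded as UTF-8.
print(codecs.BOM_UTF8)            # b'\xef\xbb\xbf'
print("\ufeff".encode("utf-8"))   # b'\xef\xbb\xbf'

# A BOM-aware decoder (like .NET's StreamReader, or Python's utf-8-sig)
# strips a leading BOM silently, which is why editors show nothing wrong.
line = codecs.BOM_UTF8 + b'{"Events": []}\n'
print(line.decode("utf-8-sig"))   # BOM stripped from the decoded text
```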

Again, while not ideal, this would not have caused the problem on its own. But the HttpLogShipper does its own byte counting instead of relying on StreamReader (which seems to tolerate an extra BOM part-way through a file), and it assumes a BOM can only exist at the start of the file. As a result it positions its cursor 3 bytes before the end of the line instead of at the beginning of the next line, hence the extra '}'.

Now, as to why the extra bytes get into the file in the first place: due to a quirk in the StreamWriter implementation (https://referencesource.microsoft.com/#mscorlib/system/io/streamwriter.cs,208), the writer only skips writing the encoding preamble when appending to a stream if that stream is seekable. When you set a fileSizeLimitBytes on the FileSink, the stream is not seekable, so the preamble gets written again.
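The accumulation can be sketched as follows. This is a hypothetical Python model of the behavior described above, not the actual .NET code: each "process start" appends through a writer that cannot seek, so it cannot tell the file already has content and emits the preamble every time:

```python
import codecs
import io

# Hypothetical sketch: appends through a non-seekable wrapper mean the
# writer emits the preamble (BOM) on every open, mimicking the described
# .NET StreamWriter quirk.
buffer_file = io.BytesIO()

def append_event(stream, json_line):
    # The writer cannot seek to check for existing content, so it
    # unconditionally writes the preamble before the payload.
    stream.write(codecs.BOM_UTF8)
    stream.write(json_line.encode("utf-8"))

append_event(buffer_file, '{"Level":"Warning"}\n')  # first start of the day
append_event(buffer_file, '{"Level":"Error"}\n')    # process restarted

data = buffer_file.getvalue()
print(data.count(codecs.BOM_UTF8))  # 2 -- a BOM now sits mid-file
```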

When using the DurableSeqSink, which uses one file per day by default, we get the error as many times as the process restarts on a given day.

I have created some unit tests to demonstrate the problem (pull request coming), and I can think of several ways to work around it, but ultimately I think coordinated changes need to be made to both Serilog.Sinks.File.FileSink and Serilog.Sinks.Seq.HttpLogShipper.

The fix closest to the source of the problem is to make the WriteCountingStream seekable, or to use an encoding that does not include a preamble (i.e. don't use the Encoding.UTF8 static member). The second option is not viable here because of the way the HttpLogShipper works: it assumes the file is written as UTF-8 with a preamble, and it will fail in a different way if it is not. I think that having the HttpLogShipper continue to do its own stream-position management instead of using a StreamReader is a timebomb; I don't actually understand why it tracks its own stream position in the first place.
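For the preamble-free option, the contrast is between an encoding whose preamble is the BOM and one whose preamble is empty (in .NET terms, Encoding.UTF8 versus an encoding constructed without a BOM). A Python stand-in shows the same distinction:

```python
# Sketch of the encoding contrast, in Python terms: "utf-8-sig" prepends
# the BOM (like the preamble-emitting encoding), while plain "utf-8" does
# not. This illustrates the idea only; it is not the .NET API.
with_bom = "{}".encode("utf-8-sig")   # preamble (BOM) included
without_bom = "{}".encode("utf-8")    # no preamble

print(with_bom)     # b'\xef\xbb\xbf{}'
print(without_bom)  # b'{}'
```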

@nblumhardt (Member)

Hi Eamon, thanks for the thorough investigation and report. Wow! I'm really surprised by the StreamWriter behaviour here. Nice find :-)

I agree that making the wrapper appear to be seekable is the way to go - the log shipper needs to muck around with positioning because it may read from the file while a partial line is present. I'll take a look back at that code and see if there's a more robust approach possible.
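The constraint being described - only complete, newline-terminated lines are safe to ship, and the saved position must stop after the last complete one - can be sketched like this (a Python illustration, not the actual C# shipper):

```python
import io

# Sketch: a write may be in flight, so the shipper must only consume lines
# terminated by '\n', and remember a resume position just past the last
# complete line. The trailing partial line is left for the next pass.
buf = io.BytesIO(b'{"a":1}\n{"b":2}\n{"c":3')  # final line still being written

shipped, position = [], 0
while True:
    line = buf.readline()
    if not line.endswith(b"\n"):
        break                  # partial (or empty) line: retry on next pass
    shipped.append(line)
    position = buf.tell()      # resume point: start of the next line

print(shipped)   # the two complete lines
print(position)  # byte offset just past the second '\n'
```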

I'll run through the workaround PR now, much appreciated.
