This has taken some time to investigate, and it seems the problem is due to a confluence of issues across separate Seq/Serilog libraries.
The problem description:
We are using Serilog to log to Seq via the DurableSeqSink with a file size limit set. For a particular use case we have implemented a web endpoint that exposes the same API as Seq's raw events endpoint.
What we were finding is that the data being posted often had an extra "}," (close brace and comma), like this:
{
  "Events": [{
    "Timestamp": "2015-05-09T22:09:08.12345+10:00",
    "Level": "Warning",
    "MessageTemplate": "Disk space is low on {Drive}",
    "Properties": {
      "Drive": "C:",
      "MachineName": "nblumhardt-rmbp"
    }
  },
  }
  ]
}
which meant the payload was not valid JSON, so it would fail to deserialize and return an error.
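To make the failure concrete, here is a minimal Python sketch (the endpoint itself is not Python, and the payload strings are trimmed-down stand-ins for the real event bodies) showing that a strict JSON parser rejects a payload with the stray "},":

```python
import json

# A well-formed raw-events payload (trimmed down for illustration)
good = '{"Events": [{"Level": "Warning", "MessageTemplate": "Disk space is low on {Drive}"}]}'
# The same payload with the stray "}," that appears in the shipped batches
bad = '{"Events": [{"Level": "Warning", "MessageTemplate": "Disk space is low on {Drive}"},}]}'

json.loads(good)  # parses fine
try:
    json.loads(bad)
    print("parsed")
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

Any endpoint that deserializes the batch strictly will hit the same rejection, which is why the error surfaces server-side rather than in the sink.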
Eventually I determined that the buffer file for the Serilog Seq logger contained additional bytes (invisible in a normal text editor). These were the bytes "ef bb bf", which happen to be the UTF-8 BOM.
Again, while not ideal, this would not have caused the problem but for the fact that the HttpLogShipper does its own byte counting instead of relying on StreamReader (which seems to handle an extra BOM part way through a file). It assumes that a BOM can only exist at the start of the file, so it ends up incorrectly positioning its cursor 3 bytes before the end of the line instead of at the beginning of the next line, hence the extra '}'.
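The off-by-three is easy to see with plain byte arithmetic. A sketch in Python (the line contents here are hypothetical; the real buffer file holds compact JSON events):

```python
import codecs

# Two newline-terminated events with a second BOM injected between them,
# as happens when the file is re-opened for append
line1 = b'{"Timestamp": "...", "Level": "Warning"}\r\n'
line2 = b'{"Timestamp": "...", "Level": "Error"}\r\n'
buffer_file = codecs.BOM_UTF8 + line1 + codecs.BOM_UTF8 + line2

# A shipper that assumes the BOM only ever appears at offset 0 computes
# the position of the next line as: leading BOM + bytes shipped so far
naive_next = len(codecs.BOM_UTF8) + len(line1)
actual_next = buffer_file.index(line2)

print(actual_next - naive_next)  # 3: the cursor lands 3 bytes early
```

With the cursor 3 bytes short, the last 3 bytes of the previous line are re-read as the start of the "next" line, duplicating the tail of the event.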
Now, as to why the extra bytes get into the file in the first place: it turns out there is a quirk in the StreamWriter implementation (https://referencesource.microsoft.com/#mscorlib/system/io/streamwriter.cs,208) whereby it only skips writing the encoding preamble when appending to a stream if the stream is seekable. So when you set a fileSizeLimitBytes on the FileSink, the preamble gets written again.
When using the DurableSeqSink, it uses one file per day by default, so we get the error as many times as the process starts on a given day.
I have created some unit tests to demonstrate the problem (pull request coming) and I can think of several ways to work around the problem, but ultimately I think that coordinated changes need to be made to both the Serilog.Sinks.File.FileSink and Serilog.Sinks.Seq.HttpLogShipper.
The fix closest to the source of the problem is to make the WriteCountingStream seekable, or to use an encoding that does not include a preamble (i.e. don't use the Encoding.UTF8 static member). The second option is not viable here because of the way the HttpLogShipper works: it assumes the file is written as UTF-8 with a preamble and will fail in a different way if it is not. I think that having the HttpLogShipper continue to do its own stream position management instead of using a StreamReader is a time bomb. I don't actually understand why it is tracking its own stream position in the first place.
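The first option amounts to letting the counting wrapper forward seekability, so the text layer can tell it is appending. A hypothetical Python sketch of the idea (the real WriteCountingStream is C#, and all names here are made up):

```python
import io

class CountingStream(io.RawIOBase):
    """Counts bytes written while forwarding seekability to the inner
    stream, so an encoding layer can tell it is appending mid-file."""

    def __init__(self, inner):
        self._inner = inner
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        n = self._inner.write(data)
        self.bytes_written += n
        return n

    # Forwarding these is the crux of the fix: a non-seekable wrapper is
    # what tricks the writer into re-emitting the preamble on append.
    def seekable(self):
        return self._inner.seekable()

    def seek(self, pos, whence=io.SEEK_SET):
        return self._inner.seek(pos, whence)

    def tell(self):
        return self._inner.tell()

stream = CountingStream(io.BytesIO())
stream.write(b'{"Events": []}\n')
print(stream.seekable(), stream.bytes_written)  # True 15
```

Since seek and tell simply delegate, the wrapper keeps its byte count without hiding the underlying stream's capabilities.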
Hi Eamon, thanks for the thorough investigation and report. Wow! I'm really surprised by the StreamWriter behaviour here. Nice find :-)
I agree that making the wrapper appear to be seekable is the way to go - the log shipper needs to muck around with positioning because it may read from the file while only partial data for a line is present. I'll take a look back at that code and see if there's a more robust approach possible.
I'll run through the workaround PR now, much appreciated.