CsvLayout - Performance&docs improvements, added Quoting also on Column. #2934
Codecov Report
@@           Coverage Diff           @@
##              dev   #2934    +/-   ##
========================================
- Coverage      80%     80%    -<1%
========================================
  Files          331     330      -1
  Lines        25610   25596     -14
  Branches      3311    3305      -6
========================================
- Hits         20528   20510     -18
+ Misses        4153    4151      -2
- Partials       929     935      +6
I think we should merge after confirmation from #2933?
Not sure if you have read my comments. There is no fix; this is just a small band-aid.
I saw the comment. I know it isn't a fix, but I was wondering (hence the question ;)) whether this change is significant in terms of memory usage.
Well, the Layout logic caches its async output on the LogEventInfo. The LogEventInfo objects are put into the AsyncWrapper queue and stay there until they are written. If the logger produces events fast enough, it fills the queue and consumes all memory. Not sure how to fix that.
I made some performance tests, so I'm not just optimizing blindly:
One can see that the PR increases CPU time, because the quote check now scans a StringBuilder instead of the allocated string (IndexOf on a StringBuilder is expensive). But because allocations are reduced, it almost evens out again. There is also a small optimization when using Quoting=All: the quote scan is faster in this PR:
That's difficult to choose: memory vs. CPU. Is 23.980,0 MB ~23 GB? Nice machine if you don't have memory swap issues. Another option would be to check the "symbols" while writing (in the worst case you need a more expensive insert).
Not difficult for me. You only see the CPU-time issue when logging large messages of 153600 bytes (or larger), and in that case you want the reduced allocation. Try looking at the numbers for Size = 150 bytes (same performance and less allocation).
The garbage collector is just doing its job and cleaning up. The numbers show total allocation, not total memory usage.
Checking the "symbols" while writing would require injection into all LayoutRenderers. Not sure you want to go that way.
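The distinction between total allocation and memory usage can be seen in a small standalone sketch. This is not NLog code; the class name is hypothetical, and it assumes a modern .NET runtime (.NET Core or .NET 5+) where `GC.GetAllocatedBytesForCurrentThread` is available. The loop allocates many large, short-lived strings. The allocated-bytes counter grows with every iteration, while the heap size after a forced full collection stays small because the garbage collector reclaims the strings.

```csharp
using System;

public static class AllocationVsMemory
{
    // Allocates `count` short-lived strings of `chars` characters each and returns
    // (bytes allocated by this thread during the loop, heap size after a full GC).
    public static (long Allocated, long Retained) Measure(int count, int chars)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
        {
            // Each string becomes garbage as soon as it is discarded.
            _ = new string('x', chars);
        }
        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;

        // Forces a full collection, then reports the memory still in use.
        long retained = GC.GetTotalMemory(forceFullCollection: true);
        return (allocated, retained);
    }
}
```

An "Allocated" column in benchmark results measures the first quantity, so a large value there does not by itself mean high memory usage.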
What am I missing? Why can't we use StringBuilder.Insert? We are rendering each cell fully, aren't we? We have now:
if (useQuoting)
{
    sb.Append(columnValue.Replace(QuoteChar, _doubleQuoteChar));
}
else
{
    sb.Append(columnValue);
}
And would it become something like this?
if (useQuoting)
{
    var startIndex = sb.Length;
    var startQuoteWritten = false;
    for (var i = 0; i < columnValue.Length; i++)
    {
        var c = columnValue[i];
        if (!startQuoteWritten && isQuoteChar(c))
        {
            // insert the opening quote in front of the part already written
            sb.Insert(startIndex, QuoteChar);
            startQuoteWritten = true;
        }
        sb.Append(c);
        if (c == QuoteChar[0])
        {
            sb.Append(c); // double embedded quote characters
        }
    }
    if (startQuoteWritten)
    {
        sb.Append(QuoteChar);
    }
}
else
{
    sb.Append(columnValue);
}
PS: quickly written; needs a double check on index/length issues (+1 or -1 too much).
A StringBuilder can only be reused if it is modified using Append or truncated by assigning Length. All other operations cause it to discard its internal buffers and reallocate. See also how I did it in #2907.
What about this then? OK, the worst case is worse.
if (useQuoting)
{
    var startIndex = sb.Length;
    for (var i = 0; i < columnValue.Length; i++)
    {
        var c = columnValue[i];
        if (charNeedsEscape(c))
        {
            // discard what was written so far; we can't use Insert
            sb.Length = startIndex;
            // render again, quoted, with embedded quotes doubled
            sb.Append(QuoteChar);
            sb.Append(columnValue.Replace(QuoteChar, _doubleQuoteChar));
            sb.Append(QuoteChar);
            break; // the whole value has been written; stop scanning
        }
        sb.Append(c);
    }
}
else
{
    sb.Append(columnValue);
}
Also remember that we don't have a columnValue. You have to render it first, and you render into a StringBuilder. The old code then performs a StringBuilder.ToString(), but the new code skips that allocation (and performs the expensive IndexOf directly on the StringBuilder). I guess one could have special logic for the message layout renderer, so it returns LogEventInfo.FormattedMessage (or Message if Raw). But then you would have to recognize layout wrappers like toupper/truncate/etc. (ugly code for handling a weird special case). Btw, the updated code you suggested is exactly how I have implemented the logic for quote handling (just without the columnValue allocation unless a quote replacement is needed).
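The approach described here renders straight into the StringBuilder, scans only the appended range, and copies the value out only when it actually has to be quoted. A minimal sketch of it follows. This is not NLog's CsvLayout code: `AppendCell` and the `render` delegate are hypothetical, and the set of characters that trigger quoting (quote, delimiter, CR, LF) is an assumption. It rolls back with `sb.Length` instead of `Insert`, so the builder stays reusable.

```csharp
using System;
using System.Text;

public static class CsvCellWriter
{
    // Appends one CSV cell. `render` is a hypothetical stand-in for rendering
    // the column's layout directly into the StringBuilder.
    public static void AppendCell(StringBuilder sb, Action<StringBuilder> render,
                                  char quoteChar, char delimiter)
    {
        int start = sb.Length;
        render(sb); // value is appended in place; no column string is allocated

        // Scan only the range that was just appended (assumed trigger set).
        bool needsQuoting = false;
        for (int i = start; i < sb.Length && !needsQuoting; i++)
        {
            char c = sb[i];
            needsQuoting = c == quoteChar || c == delimiter || c == '\r' || c == '\n';
        }
        if (!needsQuoting)
            return; // common case: nothing copied, nothing rewritten

        // Rare case: copy the value out, roll back with Length (not Insert),
        // then write it again quoted, doubling embedded quote characters.
        string value = sb.ToString(start, sb.Length - start);
        sb.Length = start;
        sb.Append(quoteChar);
        foreach (char c in value)
        {
            if (c == quoteChar)
                sb.Append(quoteChar);
            sb.Append(c);
        }
        sb.Append(quoteChar);
    }
}
```

In the common case, nothing is allocated beyond what the renderer appends. The character-by-character scan of the StringBuilder is the extra CPU cost that buys this.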
I guess if NLog didn't use StringBuilder directly but a TextWriter instead, it would be easier to recognize needed transformations on the fly, instead of scanning the StringBuilder. But that would require a heavy rewrite.
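The TextWriter idea could look roughly like the following hypothetical sketch. The writer appends to a StringBuilder and records, as characters pass through, whether any of them would force quoting, so no second scan is needed. `QuoteDetectingWriter` and its properties are illustrative names, not NLog types, and the trigger characters are an assumption.

```csharp
#nullable enable
using System;
using System.IO;
using System.Text;

// Hypothetical: forwards characters to a StringBuilder while tracking
// whether any of them would require CSV quoting.
public sealed class QuoteDetectingWriter : TextWriter
{
    private readonly StringBuilder _sb;
    private readonly char _quoteChar;
    private readonly char _delimiter;

    public QuoteDetectingWriter(StringBuilder sb, char quoteChar, char delimiter)
    {
        _sb = sb;
        _quoteChar = quoteChar;
        _delimiter = delimiter;
    }

    public bool NeedsQuoting { get; private set; }
    public bool HasQuoteChar { get; private set; }

    public override Encoding Encoding => Encoding.Unicode;

    // Every character is inspected once, at the moment it is written.
    public override void Write(char value)
    {
        if (value == _quoteChar)
        {
            HasQuoteChar = true;
            NeedsQuoting = true;
        }
        else if (value == _delimiter || value == '\r' || value == '\n')
        {
            NeedsQuoting = true;
        }
        _sb.Append(value);
    }

    // Avoids the base class copying the string into a char[] first.
    public override void Write(string? value)
    {
        if (value == null) return;
        foreach (char c in value) Write(c);
    }
}
```

The catch is the one mentioned above: every layout renderer would have to write through such a writer, which is the heavy rewrite.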
Huh? What's this then? Is it too late in the day for me? ;) https://github.com/NLog/NLog/pull/2934/files#diff-422e60153862f43b4bc5eb4941308f02R208 Edit: probably only in the worst case in this PR.
My code:
Is pretty much identical to your suggested code:
Not sure I understand what you are saying. But yes, I have a
One could consider adding a Quoting option on the individual column. Then one can improve performance without forcing quotes for all columns, and tweak the columns where the quote check is not necessary (Nothing and All are much faster than Auto). Example:
<layout xsi:type="CsvLayout">
  <withHeader>true</withHeader>
  <column name="Date" layout="${date:universalTime=True:format=o}" quoting="Nothing" />
  <column name="Server" layout="${machinename}" />
  <column name="Logger" layout="${logger}" />
  <column name="Source" layout="${callsite}" />
  <column name="Level" layout="${level}" quoting="Nothing" />
  <column name="Message" layout="${message}" quoting="All" />
  <column name="Exception" layout="${exception:format=ToString}" />
</layout>
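The following hypothetical model shows how a per-column Quoting setting could combine with the layout-wide default. The column value wins when it is set, and only Auto needs to inspect the rendered text, which is why Nothing and All are cheaper. These types are illustrative; they are not NLog's CsvColumn or its quoting enum, and the trigger characters for Auto are an assumption.

```csharp
using System;

// Mirrors the three quoting values used in the config above (hypothetical type).
public enum QuotingMode { All, Nothing, Auto }

// Hypothetical column model: Quoting is optional and overrides the layout default.
public sealed class Column
{
    public Column(string name, QuotingMode? quoting = null)
    {
        Name = name;
        Quoting = quoting;
    }

    public string Name { get; }
    public QuotingMode? Quoting { get; }
}

public static class QuotingRules
{
    // The column setting wins; otherwise fall back to the layout-wide mode.
    public static QuotingMode Effective(Column column, QuotingMode layoutDefault)
        => column.Quoting ?? layoutDefault;

    // All and Nothing decide up front; only Auto has to scan the value.
    public static bool ShouldQuote(QuotingMode mode, string value, char quoteChar, char delimiter)
    {
        switch (mode)
        {
            case QuotingMode.All:
                return true;
            case QuotingMode.Nothing:
                return false;
            default:
                return value.IndexOfAny(new[] { quoteChar, delimiter, '\r', '\n' }) >= 0;
        }
    }
}
```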
Branch updated from 2f2259c to 9deaca6.
Commit: … skip checking quotes for LogEvent-Level)
Branch updated from 9deaca6 to 900b5cc.
Very nice!
Thanks, merged!
Updated docs: https://github.com/NLog/NLog/wiki/CsvLayout
Thanks!
Add support for ThreadSafe-optimization by including CsvHeaderLayout. Resolve issue reported in #2933