Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CsvLayout - Performance&docs improvements, added Quoting also on Column. #2934

Merged
merged 4 commits into from
Oct 6, 2018

Conversation

snakefoot
Copy link
Contributor

@snakefoot snakefoot commented Sep 24, 2018

Add support for ThreadSafe-optimization

by including CsvHeaderLayout. Resolve issue reported in #2933

@codecov
Copy link

codecov bot commented Sep 24, 2018

Codecov Report

Merging #2934 into dev will decrease coverage by <1%.
The diff coverage is 91%.

@@          Coverage Diff           @@
##             dev   #2934    +/-   ##
======================================
- Coverage     80%     80%   -<1%     
======================================
  Files        331     330     -1     
  Lines      25610   25596    -14     
  Branches    3311    3305     -6     
======================================
- Hits       20528   20510    -18     
+ Misses      4153    4151     -2     
- Partials     929     935     +6

@304NotModified
Copy link
Member

I think merge after confirmation from #2933?

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 25, 2018 via email

@304NotModified
Copy link
Member

304NotModified commented Sep 25, 2018

I saw the comment. I know it isn't a fix, but maybe we need - dunno therefore the question;) - if this change is significant (in terms of memory usage)

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 25, 2018

I know it isn't a fix, but maybe we need

Well the Layout-logic caches their async-output on the LogEventInfo. The LogEventInfo are stuffed into the AsyncWrapper-queue, and stays there until written.

If the logger is fast enough, then it fill the queue and consume all memory. Not sure how to fix that.

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 26, 2018

Made some performance tests, so not just optimizing in the blind:

Logger Name Messages Size Args Threads Loggers
Csv Async 1.000 4194304 0 1 1
Test Name Time (ms) Msgs/sec GC2 GC1 GC0 CPU (ms) Alloc (MB)
Master 29.286 34 257 257 259 51.500 23.980,0
This PR 39.096 25 124 124 124 58.578 7.993,7
Logger Name Messages Size Args Threads Loggers
Csv Sync 1.000 4194304 0 1 1
Test Name Time (ms) Msgs/sec GC2 GC1 GC0 CPU (ms) Alloc (MB)
Master 37.642 26 225 225 227 41.531 15.986,7
This PR 43.351 23 0 0 0 43.281 0,2
Logger Name Messages Size Args Threads Loggers
Csv Async 10.000 153600 0 1 1
Test Name Time (ms) Msgs/sec GC2 GC1 GC0 CPU (ms) Alloc (MB)
Master 12.010 832 1316 1316 1341 18.609 8.816,2
This PR 14.337 697 539 539 539 16.687 2.939,9
Logger Name Messages Size Args Threads Loggers
Csv Sync 10.000 153600 0 1 1
Test Name Time (ms) Msgs/sec GC2 GC1 GC0 CPU (ms) Alloc (MB)
Master 13.931 717 1493 1493 1493 17.968 5.875,2
This PR 15.674 637 0 0 0 15.640 1,8
Logger Name Messages Size Args Threads Loggers
Csv Sync 500.000 150 0 1 1
Test Name Time (ms) Msgs/sec GC2 GC1 GC0 CPU (ms) Alloc (MB)
Master 2.746 182.021 0 0 65 2.734 724,9
This PR 2.752 181.674 0 0 2 2.765 85,9

One can see that the PR introduces an increase in CPU-Time because the checking of quotes now scans a StringBuilder instead of the allocated string (StringBuilder IndexOf is expensive). But because allocations are reduced, then it almost become even again. There is a small optimization if using Quoting=All, then the quote-scan is faster in this PR:

Logger Name Messages Size Args Threads Loggers
Csv Sync - Quoting=All 10.000 153600 0 1 1
Test Name Time (ms) Msgs/sec GC2 GC1 GC0 CPU (ms) Alloc (MB)
Master 11.740 851 1387 1387 1387 15.968 5.873,9
This PR 8.924 1.120 0 0 0 8.906 1,7

@304NotModified
Copy link
Member

that's difficult too choose - memory vs cpu. is 23.980,0 MB = ~23 GB? Nice machine if you don't have memory swap issues.

Another option would be to check the "symbols" while writing (in worst case you need a more expensive insert).

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 26, 2018

that's difficult too choose - memory vs cpu.

Not difficult for me. It is only when logging large message of 153600 bytes (or larger) that you see the CPU-Time issue. And in that case you want the reduced allocation. Try looking at the numbers for Size = 150 bytes (same performance and less allocation).

is 23.980,0 MB = ~23 GB? Nice machine if you don't have memory swap issues.

The garbage collector is just doing its job, and cleaning up. It is total allocation not total memory usage.

Another option would be to check the "symbols" while writing (in worst case you need a more expensive insert).

That would require injection into all LayoutRenderers. Not sure you want to go that way.

@304NotModified
Copy link
Member

304NotModified commented Sep 26, 2018

That would require injection into all LayoutRenderers. Not sure you want to go that way.

What am I missing that we can't use stringbuilder.insert? We are rendering each cell fully isn't?

We have now:

            if (useQuoting)
            {
                sb.Append(columnValue.Replace(QuoteChar, _doubleQuoteChar));
            }
            else
            {
                sb.Append(columnValue);
            }

and that will be something like this?

 if (useQuoting)
 {
     var startIndex = sb.Length;
     var startQuoteWritten = false;
     for (var i = 0; i < columnValue.Length; i++)
     {
         var c = columnValue[i];

         if (!startQuoteWritten && isQuoteChar(c))
         {
             sb.Insert(startIndex, QuoteChar);
             startQuoteWritten = true;
         }

         sb.Append(c);
     }

     if (startQuoteWritten)
     {
         sb.Append(QuoteChar);
     }
 }
 else
 {
     sb.Append(columnValue);
 }

PS: quickly written, needs a double check on index/length issues. (+1 or -1 too much)

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 26, 2018

What am I missing that we can't use stringbuilder.insert? We are rendering each cell fully isn't?

StringBuilder can only be reused if modifying it using Append or truncate by assigning Length. All other operations causes it to discard all internal buffers and reallocate. See also how I have done in #2907

@304NotModified
Copy link
Member

304NotModified commented Sep 26, 2018

What about this then? OK, worst case is worse.

if (useQuoting)
 {
     var startIndex = sb.Length;
     for (var i = 0; i < columnValue.Length; i++)
     {
         var c = columnValue[i];

         if (charNeedsEscape(c))
         {
             //discard, we can't use insert
             sb.Length = startIndex;
             //render agian
             sb.Append(QuoteChar);
             sb.Append(columnValue);
             sb.Append(QuoteChar);
         }
         else
         {
             sb.Append(c);
         }
     }
 }
 else
 {
     sb.Append(columnValue);
 }

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 26, 2018

Also remember that we don't have a columnValue. You have to render it first. And you render into a StringBuilder. The old code then performs a StringBuilder.ToString(), but the new code skips that allocation (And performs the expensive IndexOf directly on StringBuilder).

I guess one could have special logic for the message-layoutrenderer so it returns the LogEventInfo.FormatttedMessage / Message (If Raw). But then you would have to recognize layout-wrappers like toupper/truncate/etc. (Ugly code for handling a weird special case).

Btw. the updated code you have suggested is exactly how I have implemented to logic for quote-handling (Just without the columnValue-allocation unless quote-replace is needed)

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 26, 2018

I guess if NLog didn't use StringBuilder directly but instead TextWriter, then it would be easier to recognize needed transformations on the fly, instead of scanning the StringBuilder. But that would require a heavy rewrite.

@304NotModified
Copy link
Member

304NotModified commented Sep 26, 2018

Also remember that we don't have a columnValue

huh? What's this then? Is it too late today? ;) https://github.com/NLog/NLog/pull/2934/files#diff-422e60153862f43b4bc5eb4941308f02R208

edit: probably only in worst case in this PR

@snakefoot
Copy link
Contributor Author

snakefoot commented Sep 26, 2018

huh? What's this then? Is it too late today? ;)

My code:

         if (ColumnValueRequiresQuotes(target, orgLength))
         {
                string columnValue = target.ToString(orgLength, target.Length - orgLength);
                target.Length = orgLength;   // Unwind / Truncate
                target.Append(QuoteChar);
                target.Append(columnValue.Replace(QuoteChar, _doubleQuoteChar));
                target.Append(QuoteChar);
         }

Is pretty much identical to your suggested code:

         if (charNeedsEscape(c))
         {
             //discard, we can't use insert
             sb.Length = startIndex;
             //render agian
             sb.Append(QuoteChar);
             sb.Append(columnValue);
             sb.Append(QuoteChar);
         }

Not sure I understand what you are saying. But yes I have a columnValue, but it is only available when string-allocation is needed because of required quoting.

@snakefoot
Copy link
Contributor Author

snakefoot commented Oct 6, 2018

One could consider adding a Quoting-property on the CSV-Column. Then one could specify Quoting=All for the specific column with the ${message} (While the CsvLayout default is Quoting=Auto).

Then one can improve performance without forcing quotes for all columns. One can then tweak columns where quote-checking is not necessary (Nothing + All is much faster than Auto). Ex:

      <layout xsi:type="CsvLayout">
        <withHeader>true</withHeader>
        <column name="Date" layout="${date:universalTime=True:format=o}" quoting="Nothing" />
        <column name="Server" layout="${machinename}" />
        <column name="Logger" layout="${logger}" />
        <column name="Source" layout="${callsite}" />
        <column name="Level" layout="${level}" quoting="Nothing" />
        <column name="Message" layout="${message}" quoting="All" />
        <column name="Exception" layout="${exception:format=ToString}" />
      </layout>

@snakefoot snakefoot force-pushed the CsvLayoutThreadSafe branch from 2f2259c to 9deaca6 Compare October 6, 2018 19:15
@snakefoot snakefoot force-pushed the CsvLayoutThreadSafe branch from 9deaca6 to 900b5cc Compare October 6, 2018 19:18
@304NotModified
Copy link
Member

Very nice!

@304NotModified 304NotModified changed the title CsvLayout - Add support for ThreadSafe-optimization CsvLayout - Performance&docs improvements, added Quoting also on Column. Oct 6, 2018
@304NotModified 304NotModified merged commit 7cf6729 into NLog:dev Oct 6, 2018
@304NotModified
Copy link
Member

Thanks, merged!

@snakefoot
Copy link
Contributor Author

@304NotModified
Copy link
Member

thanks!

@304NotModified 304NotModified added the documentation done all docs done (wiki, api docs, lists on nlog-project.org, xmldocs) label Feb 22, 2019
@snakefoot snakefoot deleted the CsvLayoutThreadSafe branch April 4, 2020 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
csv-layout documentation done all docs done (wiki, api docs, lists on nlog-project.org, xmldocs) feature needs documentation on wiki performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants