Skip to content

Conversation

@ywkaras
Copy link
Contributor

@ywkaras ywkaras commented Jan 27, 2021

Also adds OneWriterMultiReader.h, higher-performance alternative to std::shared_mutex (also with logic to prevent writer starvation).

Setting proxy.config.diags.debug.enabled to 1 or 3 enables output for Debug macros with DbgCtl as first parameter or TSFDbg if the tag for the DbgClt/TSFDbCtl matches debug.tags regular expression. (A value of 1 still only enables Debug/TSDebug macros with a C-string tag as the first parameter.)

@ywkaras ywkaras self-assigned this Jan 27, 2021
@ywkaras ywkaras force-pushed the FasterDiags2 branch 5 times, most recently from 1713dd1 to 3e152ea Compare January 27, 2021 16:53
@ywkaras ywkaras added this to the 10.0.0 milestone Jan 27, 2021
@ywkaras ywkaras added Core Debug Support for system debugging Performance labels Jan 27, 2021
@ywkaras ywkaras force-pushed the FasterDiags2 branch 3 times, most recently from df26b18 to c752e09 Compare January 30, 2021 01:14
@ywkaras ywkaras changed the title Add FDbg/TSFDbg macros for debug output, faster checking to see if enabled. Change Debug, add TSFDbg macros for debug output, faster checking to see if enabled. Jan 30, 2021
@bneradt
Copy link
Contributor

bneradt commented Jan 30, 2021

[approve ci autest]

1 similar comment
@bneradt
Copy link
Contributor

bneradt commented Jan 30, 2021

[approve ci autest]

@ywkaras ywkaras force-pushed the FasterDiags2 branch 4 times, most recently from e4a243f to a10c743 Compare February 5, 2021 23:58
@ywkaras
Copy link
Contributor Author

ywkaras commented Feb 8, 2021

I did some performance testing, here are the details: https://gist.github.com/ywkaras/4b0386341a4a6ce0afacaf46dcce79d6 .

In my test scenario, enabling debug output (with a regex that matches no tag) reduces requests per second by about 10% with this change, versus 95% without this change.

@ywkaras
Copy link
Contributor Author

ywkaras commented Feb 8, 2021

The Clang-Agonizer error is in DNS.cc, which this PR does not change.

@ywkaras
Copy link
Contributor Author

ywkaras commented Feb 9, 2021

Merging this with #7279 will be a bit nasty.

@@ -0,0 +1,225 @@
/** @file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this PR rolls its own reader/writer lock rather than using one of the standard implementations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't carefully read your initial comment. What is the performance hit of using the std implementation of a reader_writer lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scenario I tested, I saw more than double the requests per second using my shared mutex implementation.

But, in truth, if we go with the second option (where each Debug() call has an instance of DbgCtl, even if it has to create it itself), it will be moot, the lock will almost never need to be locked.

Aside from better performance, my shared mutex has logic to prevent writer starvation. But it's not recursive.

It was actually a mistake that I implemented my own shared mutex, I didn't realize pthreads (and C++17) had one. But looking like a lucky accident.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I couldn't understand in my performance testing is, no matter how I tweaked the parameters of h2load, I couldn't get under 65% idle on atslab00. All the requests were the same, the (cleartext) GET of a single object in cache, so maybe ATS was disk I/O bound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important point to keep in mind is that, in my testing, I used a tag regex that didn't match any tag. So the 10% drop in throughput did not include the overhead of any actual output. Running in production with a regex like "http|dns" would result in a much bigger drop in throughput than 10%. Debugging could only be turned on in production with a regex that enabled a small amount of debug output.

@shinrich
Copy link
Member

Looks like a good approach. The performance numbers are encouraging.

@ywkaras ywkaras added the TS API label Feb 11, 2021
@ywkaras ywkaras force-pushed the FasterDiags2 branch 2 times, most recently from 02487f8 to ed1394b Compare March 16, 2021 16:41
@ywkaras ywkaras closed this Mar 16, 2021
@ywkaras ywkaras reopened this Mar 16, 2021
@ywkaras ywkaras force-pushed the FasterDiags2 branch 2 times, most recently from c382f27 to 367ba22 Compare March 16, 2021 19:09
@ywkaras ywkaras changed the title Change Debug, add TSFDbg macros for debug output, faster checking to see if enabled. Reduce overhead incurred by enabling debug output. Apr 1, 2021
@ywkaras ywkaras force-pushed the FasterDiags2 branch 3 times, most recently from 8947086 to 521f98d Compare April 1, 2021 19:40
Setting proxy.config.diags.debug.enabled to 3 enables most core debug output (whose tag matches the debug
regex), and plugin debug output from the new API macro TSDbg().  Setting it to 1 enables all (matching) debug
output, but with the previous high overhead.

Note that enabling a large qualtity of debug output will still have a big impact on performance.
@ywkaras ywkaras closed this Apr 3, 2021
@ywkaras ywkaras removed this from the 10.0.0 milestone Apr 3, 2021
@ywkaras ywkaras removed Core Debug Support for system debugging Performance TS API labels Apr 3, 2021
@ywkaras ywkaras removed their assignment Apr 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants