Skip to content

Conversation

@ywkaras
Copy link
Contributor

@ywkaras ywkaras commented Jan 5, 2022

Setting proxy.config.diags.debug.enabled to 3 enables most core debug output (whose tag matches the debug
regex), and plugin debug output from the new API macro TSDbg(). Setting it to 1 enables all (matching) debug
output, but with the previous high overhead.

Note that enabling a large qualtity of debug output will still have a big impact on performance.

This commit could be merge into proprietary branches of trafficserver, without causing routine
merge conflicts when aligning with the open source branch.

@ywkaras ywkaras self-assigned this Jan 5, 2022
@ywkaras
Copy link
Contributor Author

ywkaras commented Jan 5, 2022

This replaces: #7674

@ywkaras
Copy link
Contributor Author

ywkaras commented Jan 13, 2022

Here is the result of performance testing in Yahoo prod of this PR: https://gist.github.com/ywkaras/ed6ebc39518d88f9b0d3ee007d0573de

It shows that, with this change, debug output can be enabled globally with no loss of performance.

@ywkaras
Copy link
Contributor Author

ywkaras commented Jan 13, 2022

@SolidWallOfCode I'll give you the Grigor pony if Linkedin will merge this into their internal ATS. On a darker note, I could also burn Box Alan if you don't keep me happy.

Setting proxy.config.diags.debug.enabled to 3 enables most core debug output (whose tag matches the debug
regex), and plugin debug output from the new API macro TSDbg().  Setting it to 1 enables all (matching) debug
output, but with the previous high overhead.

Note that enabling a large qualtity of debug output will still have a big impact on performance.

This commit could be merge into proprietary branches of trafficserver, without causing routine
merge conflicts when aligning with the open source branch.
@apache apache deleted a comment from ezelkow1 Mar 23, 2022
@ywkaras
Copy link
Contributor Author

ywkaras commented Apr 11, 2022

[approve ci autest]

@ywkaras ywkaras marked this pull request as ready for review April 11, 2022 22:15
@ywkaras
Copy link
Contributor Author

ywkaras commented Apr 11, 2022

This change is deployed in Yahoo Prod, no problems seen, has been found to be very helpful.

@bryancall
Copy link
Contributor

@cmcfarlen is going to take a look at this.

#include <tscore/Diags.h>

// The resistry of fast debug controllers has a ugly implementation to handle the whole-program initialization
// order problem with C++.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how this handles the whole-program init order problem? Also, a general summary of the approach here how the debug tags are assembled into this registry and efficiently checked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The registry data is "naturally" just a static member variable. But the registry may be accessed before initialization for the .cc file defining the registry data. So the data is instead defined as a static local in a static function that returns a referent as to the data. This is an ugly but standard solution to the order of initialization problem.

Under the old design, each "Debug()" call stores the debug tag. When debug was globally enabled, the tag was tested for a match with the configured PCRE for enabled tags. If it matched, the output was generated. But, the test for a match required significant computation, causing a big degradation in ATS performance.

Under the new design, each "Debug()" call stores a pointer into entry in the registry set for the tag. A set entry contains the tag string and an on/off flag for the tag. So each Debug call only has to dereference the pointer to the set entry, and then check the on/off flag. When the PCRE for enabled tags is changed, the registry set is iterated over, and only then is each tag checked for a match to the PCRE. The on/off flag for the tag is then set based on the check result.

#define Debug(tag__, ...) \
do { \
if (unlikely(diags()->on())) { \
static DbgCtl ctl__(tag__); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

identifiers with double underscores are reserved for the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK gone.

// Tag is a debug tag. Debug output associated with this control will be output when debug output
// is enabled globally, and the tag matches the configured debug tag regular expression.
//
DbgCtl(char const *tag) : _ptr(_get_ptr(tag)) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could call TSDbgCtlCreate instead of _get_ptr directly to get the asserts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean checking that tag is not null? I don't think we normally call TS API functions from core TS.

ink_assert(sz > 0);

{
char *t = new char[sz + 1]; // Deleted implicitly by program termination.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔Not sure a memcheck tool would agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it's likely to be flagged.

Copy link
Contributor

@cmcfarlen cmcfarlen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does dramatically increase performance of debug tracing. The way the tags are memoized leaves a lot of static variables all over the code (one static pointer per debug statement, one entry per tag in a global map). This is a time vs space trade-off, but doesn't add additional locking after the first time the statement is executed. I tested this without the volatile declaration on the tag struct and it still works fine so I recommend dropping that.

} TSHttpConnectOptions;

typedef struct TSDbgCtl_s {
char volatile on; // Flag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this volatile declaration is used is undefined behavior. See usage for volatile after C++11. particularly:

This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution, see std::memory_order). Any attempt to refer to a volatile object through a glvalue of non-volatile type (e.g. through a reference or pointer to non-volatile type) results in undefined behavior.

If you want to guarantee that changes are seen immediately in another thread, you will need to use some memory ordering mechanism. Otherwise, other threads will see changes the next time that memory is read.

I tested this without volatile and it still adjusts the debug that are printed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reality is that ATS often relies on de facto portability, rather than portability guaranteed by the Standard.

I guess memory ordering has been added to straight C, but I don't know the specifics, I'll have to research this. The point of the volatile is to emphasize that that the on flag should not be register cached. But it seems implausible the compiler could do that indefinitely anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct that (when using Debug() instead of Dbg() ) there's a significant increase in loaded data segment size, But, it's a delta on top of the high resource use of trace output even with the old design. Which seems worthwhile, to make tracing usable in production.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a worthwhile trade off for performant debugging! It's good stuff.

I also think that in this case, any changes to on values will be seen "soon enough" by other threads and that volatile just isn't necessary here(especially since its accessed through a non-volatile pointer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's unlikely, this code illustrates the sort of optimization that could happen to a Debug statement in a loop, if we don't do something to prevent the compiler from register-caching the 'on' flag: https://godbolt.org/z/r4K9TP1qW

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternate to simply relying on volatile would be to use atomic_load_explicit() with relaxed memory order. But that would mean we'd have to require plugins to be build with C11 support, which I don't think we've done up till now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the Standard says that the behavior of reinterpret_cast is undefined, but ATS uses it over 1,400 times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is closer to (but still not quite) the situation we have here. We are accessing a volatile through a (non-volatile) static pointer. The compiler must read from memory at least one time when the containing function is called. For the non-volatile case, the compiler can then assume the value won't change until another function is called or the original function is called again. This is what I meant by it will be re-read "soon enough".

All that said, if you are happier with the volatile qualifier and no-one else objects, I'm not overly concerned about it.

@ywkaras ywkaras merged commit 1c038e5 into apache:master Apr 25, 2022
@zwoop zwoop added this to the 10.0.0 milestone Jan 17, 2023
@traeak traeak mentioned this pull request May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Debug Support for system debugging Performance TS API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants