-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: add symbol-table contention logging and admin endpoint #8035
Comments
I think this should block #4980 . |
I had a variation on my this idea above which is to check our contention atomic (common/common/mutex_tracer_impl.h) and log a warning if there is actual contention on a particular symbol. This has the advantage of being relatively simple to implement and not at all noisy. The disadvantage is that it will not flag usage of the global lock at all; only if it actually contends in practice, so it will not flag potential problems in advance that may arise as scale (# threads, traffic) increases. The other options for logging may be moderately complex to implement (a small simplified lru-cache of looked-up names) or somewhat noisy (trace-log all lookups). The simplest option might be just a deque containing the most recent (say) 10 lookups, which may be all of the same name. For the simple deque or the lru-cache, the operational implication is that we'd have to check the logs or the admin endpoint on running systems and note the names that were missed. |
As an aside, I was looking at the |
You mean via the "opaque name for contented mutex" here? Have you figured out how to map that back to the FWIW I found the contention metric useful in the process of symbolizing the stats access for http codes, because I could run a load-test with siege, with and without a proposed PR, to determine how that PR affected contention. In any case I'm leaning away from relying on dynamic contention for this issue, and toward just keeping bounded queues of dynamically accessed StatNames and Symbols, allowing repeats and dropping old ones, for simplicity of implementation and speed. We can then collate duplicates in the admin handler and in log statements. I expect this queue to be rapidly turn over during startup, but should be quiescent during operation (except perhaps in an xDS update). So in operation if we just check the queue it should ideally be filled with old requests from startup. |
I didn't look into the mutex tracer code at all, but yeah, was hoping that we could print out a per-mutex map, at least for certain named mutexes, but I'm not sure what is possible. |
It's possible but involves keeping a bunch of maps up-to-date including when mutexes are destructed, I'm leaning against taking this on for two reasons:
So if it seems reasonable to you I will pursue the deque approach. |
Sure sounds good (I still think it would be useful to think about how to make the contention output more useful, but that's a different issue I agree). |
Per discussion in #7842 it would be useful to be able to see symbol-table contention by name in logs and an admin endpoint.
The text was updated successfully, but these errors were encountered: