Proxy: memory bloat caused by fmt.Sprintf #6198
@fpetkovski was the Querier running at debug log level? The memory profile in the first picture shows me that most of the memory is used by Line 276 in 6e28411.
Yeah, good finding! We should definitely ONLY append store debug messages when debug level is enabled. I wish the logger gave us this info 🙃 However, I don't expect it to affect bigger queries; it's only for NOP queries that it's significant CPU overhead, which is kind of fine since it's not the hot path then. Still low-hanging fruit. 👍🏽
Yeah, I think this is because we generate debug messages even without debug mode. We use alerting scoped per tenant, so we have lots of excluded stores for alerts.
An option might also be to change
Is printing the labelsets useful for debugging purposes, or is it sufficient to know the store addresses? If it's enough to know the store addresses, perhaps the easiest fix is to remove the printing of the labelsets, as @douglascamata suggests above. It is probably quite a bit of work to conditionally generate the debug messages, given that we can't easily get the current log level from go-kit/log.
I think labels can be useful for knowing why a store has been filtered out. On the other hand, for the debug message about the start of the fan-out, I think the address might suffice? In that case, we could move gathering the stores info after
@matej-g but realistically speaking, did you ever find yourself using that information? I'm just trying to get us to think about whether we believe it's useful or we know it's useful. Even looking at the size of the debug log lines this will generate, it's a lot of work just to read through it...
If you want my personal empirical experience, it was useful for me 😄, there were a couple of times where I could not understand why exactly a store was filtered out and found the reason only after looking at the labels. That being said, this might not be universal for all users. It's not a big deal to drop it, only include the address, and use that to look up the labels. I was just thinking whether we could retain that info and sidestep this performance issue at the same time, but either solution is fine with me 🙂.
@matej-g I see. As mentioned by others, if we could check the log level, there could be a "base log" at info level without the labels, and at debug level it could include the labels.
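A tiny sketch of that two-tier idea, assuming the log level check is already available as a boolean; the function and key names below are illustrative only, not actual Thanos code:

```go
package proxy

import (
	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

// logStoreQueried emits a lean "base" line at info level and includes the
// labelsets only when debug logging is enabled. (Illustrative sketch.)
func logStoreQueried(logger log.Logger, debugLogging bool, addr, labelSets string) {
	if debugLogging {
		level.Debug(logger).Log("msg", "querying store", "addr", addr, "labelsets", labelSets)
		return
	}
	level.Info(logger).Log("msg", "querying store", "addr", addr)
}
```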
I think we should not generate debug messages at all if debug log level is not enabled. We can propagate this as a separate flag that is as |
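A minimal sketch of what propagating the setting as a separate flag could look like: derive a boolean once at startup from the configured log level string (e.g. the value of the `--log.level` CLI option) and hand it down to the proxy. The helper name and wiring below are assumptions for illustration, not the actual Thanos code:

```go
package main

import "strings"

// debugLoggingEnabled derives a plain boolean from the configured log level
// string. This sidesteps the fact that a go-kit log.Logger does not expose
// its filter level at runtime. (Hypothetical helper.)
func debugLoggingEnabled(logLevel string) bool {
	return strings.EqualFold(logLevel, "debug")
}
```

The boolean can then be passed to the proxy store's constructor alongside the logger, so components never need to ask the logger for its level.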
In debug mode we log information about which endpoints we skip and query, including the labels of each endpoint. These debug messages can use significant memory, especially in larger setups with a larger number of labelsets. Previously we would generate the debug messages even when not in debug mode. With this commit, we only generate the debug messages when debug logging is enabled. This fixes: thanos-io#6198 Signed-off-by: Jacob Baungard Hansen <jachanse@redhat.com>
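As a rough illustration of the gating the commit message describes, the sketch below guards the expensive labelset stringification behind the boolean wired up above; the type, field, and method names are hypothetical and heavily simplified, not the actual Thanos proxy code:

```go
package proxy

import (
	"fmt"
	"strings"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

// Client is a minimal stand-in for a store endpoint (the real Thanos
// interface has more methods).
type Client interface {
	Addr() string
	LabelSetsString() string
}

// ProxyStore is a simplified sketch of the proxy with the extra flag.
type ProxyStore struct {
	logger       log.Logger
	debugLogging bool // set once at startup from the configured log level
}

// logFanoutDebug builds the potentially large per-store strings only when
// debug logging is enabled, avoiding the fmt.Sprintf allocations that showed
// up in the memory profile.
func (s *ProxyStore) logFanoutDebug(stores []Client) {
	if !s.debugLogging {
		return
	}
	parts := make([]string, 0, len(stores))
	for _, st := range stores {
		parts = append(parts, fmt.Sprintf("store %s with label sets %s", st.Addr(), st.LabelSetsString()))
	}
	level.Debug(s.logger).Log("msg", "starting fanout", "stores", strings.Join(parts, ";"))
}
```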
I attempted to create a fix for this. I didn't manage to generate a profile that looks similar to yours, though, @fpetkovski. If you have an environment where it is easy to do so, perhaps it would be worth sanity checking that the PR actually solves the memory bloat issue reported here.
Hi Jacob, thanks for the PR 👍 I will deploy this today or tomorrow and take another profile.
I was looking at a querier profile and noticed that fmt.Sprintf inside the proxy uses 15% of total querier memory. This querier instance is handling alerting rulers and has a high RPS. It's probably worth replacing dynamic messages with constant errors.
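To make the suggestion concrete, here is a small sketch of the difference between a dynamically formatted message and a constant (sentinel) error; the names are illustrative only, not the actual Thanos code:

```go
package proxy

import (
	"errors"
	"fmt"
)

// Sentinel error: allocated once at package init, reused on every call, and
// matchable by callers with errors.Is.
var errStoreNotMatched = errors.New("store does not match the query")

// dynamicErr builds a fresh string on every call; this per-call formatting is
// what shows up as fmt.Sprintf allocations in the profile.
func dynamicErr(addr string) error {
	return fmt.Errorf("store %s does not match the query", addr)
}

// constantErr reuses the sentinel; the store address can still be logged
// separately, and only when debug logging is enabled.
func constantErr() error {
	return errStoreNotMatched
}
```

Callers can still branch on the condition with errors.Is(err, errStoreNotMatched) without paying for string formatting on every request.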