Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Composite chart display name is wrong when we have multiple chart instances per node #426

Closed
cakrit opened this issue May 16, 2022 · 16 comments

Comments

@cakrit
Copy link
Contributor

cakrit commented May 16, 2022

Bug description

When we have multiple chart id/names for the same context in a single node, we can't use any of the display names for any of the charts, as they contain information identifying the instance. We should find a way to show a display name at the context level. @ilyam8 should be able to help.

Expected behavior

Don't show instance information in the composite chart's display name, when it includes multiple instances. Better to not show a display name at all, until we can show a proper one.

Steps to reproduce

  1. Scroll down to any chart with multiple instances per node (usually disks, mountpoints, network interfaces).
  2. Note the display name to the left of the context and units.

Screenshots

image

Error Logs

No response

Desktop

OS: [e.g. iOS]
Browser [e.g. chrome, safari]
Browser Version [e.g. 22]

Additional context

No response

@cakrit cakrit added bug Something isn't working needs triage labels May 16, 2022
@ilyam8
Copy link
Member

ilyam8 commented May 16, 2022

@cakrit I think we just shouldn't show a chart id (net.vpn0 on the screenshot) on composite charts. Why do we need it?

@cakrit
Copy link
Contributor Author

cakrit commented May 17, 2022

Of course we shouldn't show chart id. What I don't know is if there's a simple rule (e.g. remove everything within paretheses) that will fix the display name. I think I have seen display names that also contain an instance identifier, but I may be wrong. Definitely the quickest win is to just strip out " (.*)"

@ilyam8
Copy link
Member

ilyam8 commented May 17, 2022

Probably I am missing something:

  • A chart id must be globally unique (within a Netdata instance).
  • It means that usually, ids have some instance-specific data (disk, mount point, network interface).
  • The value in () on the overview page is, I guess, the first chart id from the aggregated group.
  • It never makes sense to show () (using the same pattern, the value is a chart id) on the overview page.

Definitely the quickest win is to just strip out " (.*)"

Yes.


Probably I am missing something

Unless we want to put in () some other data - removing (*) is an obvious thing to do to my understanding.

@hugovalente-pm
Copy link
Contributor

this is linked to #253 and https://github.com/netdata/product/issues/2052

@ilyam8 's suggestion works for most of the cases, but we have the isuse of fping that wouldn't be sorted by that.
@papazach these chart titles are provided by Cloud BE to FE, right?

@ilyam8
Copy link
Member

ilyam8 commented May 17, 2022

but we have the isuse of fping that wouldn't be sorted by that

I read the linked issue but didn't understand what exactly you mean @hugovalente-pm. Can you explain?

The fping chart titles (e.g. "Fping Latency for X" => "Fping Latency") issue fix is schweikert/fping#253

@hugovalente-pm
Copy link
Contributor

ok, didn't know the hostname is to be removed from fping charts titles. that for sure solves the issue on the title of these

there is a further issue with fping then on section names, because sections are also showing a hostname and on Cloud Overview page this is "generic", the hostname specific charts can be seen if you change "Group by" to instance
image

@hugovalente-pm
Copy link
Contributor

@TonyPath @juacker @papazach this needs to be tackled from BE, because it is BE that provides these to FE
can we try to do something like @ilyam8 was suggesting here?

Another option could probably be doing something similar to the mutations we do for disks, network interfaces, mount points families, but this would be a step back on what we want to achieve - Expose more details on Disks, Mount Points, Network Interfaces and other sections where we group under all #Expose more details on Disks, Mount Points, Network Interfaces and other sections where we group under all #275

@cakrit
Copy link
Contributor Author

cakrit commented Jun 1, 2022

but we have the isuse of fping that wouldn't be sorted by that

I read the linked issue but didn't understand what exactly you mean @hugovalente-pm. Can you explain?

The fping chart titles (e.g. "Fping Latency for X" => "Fping Latency") issue fix is schweikert/fping#253

No, we can't wait for the source code to be fixed and the new version of fping to be deployed to the entire world for the title to be fixed. Please strip it with a regular expression or something, we need to problem corrected ASAP.

@ilyam8
Copy link
Member

ilyam8 commented Jun 1, 2022

I don't know how to do it, we just exec fping. @netdata/agent-sre can you help us? Asking you because you are much more experienced with bash.

@cakrit
Copy link
Contributor Author

cakrit commented Jun 1, 2022

I don't know how to do it, we just exec fping. @netdata/agent-sre can you help us? Asking you because you are much more experienced with bash.

Wow, I hadn't realized that fping works as a Netdata plugin itself. I thought we were just processing its results! If the maintainer isn't responding, we should fork and source it.

@ilyam8
Copy link
Member

ilyam8 commented Jun 1, 2022

Yep, fping has -N flag 😄

−N, −−netdata
Format output for netdata (−l −Q are required). See: http://my−netdata.io/


btw we need to change the link to https://www.netdata.cloud/

@cakrit
Copy link
Contributor Author

cakrit commented Jun 1, 2022

To get back to how to handle this issue though, the point is to not go through the rabbit hole of the issues linked to #253
I'm not talking about making the Overview more useful/useable, but ONLY about removing the completely wrong names linked to each context, by taking out what's in the parentheses and then also going through the contexts where multiple instances appear as different sections (e.g. httpcheck, fping, apache, bind etc.) to ensure that nothing else contains weird identifiers in the title.

If it's a matter of a couple of weeks to completely resolve the issue and have a brand new version where all the cases are handled well and the filtering is available, I suppose we can wait. If not, we should provide a patch ASAP.

@juacker
Copy link

juacker commented Jun 1, 2022

@cakrit I'm trying to reproduce it but can't find a spaceroom where this happens, could you send me a link to a spaceroom to investigate it?

image

@hugovalente-pm
Copy link
Contributor

@juacker was trying to find this also on our PROD space but seems it was fixed already

@cakrit
Copy link
Contributor Author

cakrit commented Jun 3, 2022

Here you go:

image

@cakrit
Copy link
Contributor Author

cakrit commented Jun 3, 2022

Checking the others, it looks like httpcheck does NOT have this issue, as the chart titles don't have the name
So it's really just fping that we have an issue with. Closing this one, @ilyam8 let's continue in netdata/netdata#13068

@cakrit cakrit closed this as completed Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants