Monitoring tab in Kibana only showing master nodes #44666

Closed
coudenysj opened this issue Sep 3, 2019 · 12 comments

@coudenysj

Kibana version: 7.3.1

Elasticsearch version: 7.3.1

Server OS version: Debian 10

Original install method (e.g. download page, yum, from source, etc.): deb

Describe the bug:

The monitoring tab is only showing the master node details (but indicates there are more nodes in the cluster).

Steps to reproduce:

  1. Open https://master:5601/app/monitoring#/elasticsearch/nodes

Expected behavior:

All nodes in the cluster should be listed. (We have another cluster running 6.5.2 which shows all of its nodes.)

Screenshots (if relevant):

[two screenshots attached]

Errors in browser console (if relevant):

Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval' 'nonce-xxx/v'". Either the 'unsafe-inline' keyword, a hash ('sha256-xxx'), or a nonce ('nonce-...') is required to enable inline execution.

Provide logs and/or server output (if relevant):

Any additional context:

It looks exactly like https://discuss.elastic.co/t/monitoring-elasticsearch-nodes-no-data-nodes/167321, but all the queries requested there are showing all nodes.

@cjcenizal added the Team:Monitoring (Stack Monitoring team) and triage_needed labels on Sep 3, 2019
@elasticmachine
Contributor

Pinging @elastic/stack-monitoring

@chrisronline
Contributor

Hey @coudenysj,

I think the first step is determining whether the data is there or not. It's possible that the non-master nodes aren't reporting for some reason, or that something is happening on the Kibana side.

As in the discuss post, let's run this query as a starting point:

POST .monitoring-es-*/_search
{
    "size": 1,
    "query": {
        "term": {
            "type": "node_stats"
        }
    },
    "sort": {
        "timestamp": {
            "order": "desc"
        }
    },
    "collapse": {
        "field": "source_node.uuid"
    },
    "aggs": {
    	"nodes": {
    		"terms": {
    			"field": "source_node.uuid"
    		}
    	}
    }
}

@coudenysj
Author

Thanks @chrisronline for your reply. I changed the query a bit because we have 15 nodes:

POST .monitoring-es-*/_search
{
    "size": 0,
    "query": {
        "term": {
            "type": "node_stats"
        }
    },
    "sort": {
        "timestamp": {
            "order": "desc"
        }
    },
    "collapse": {
        "field": "source_node.uuid"
    },
    "aggs": {
    	"nodes": {
    		"terms": {
    			"field": "source_node.uuid",
    			"size": 100
    		}
    	}
    }
}

which results in 15 buckets:

{
  "took" : 690,
  "timed_out" : false,
  "_shards" : {
    "total" : 16,
    "successful" : 16,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nodes" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "leNGcTahQDCQyebBqhTlKg",
          "doc_count" : 991460
        },
        {
          "key" : "g9UxGx0nQuWH67uxDH0cgg",
          "doc_count" : 991441
        },
        {
          "key" : "05RymCqYR4SFoOC_gbJKGw",
          "doc_count" : 991401
        },
        {
          "key" : "4hcycW5IRSGKcnoWjjl3Cw",
          "doc_count" : 62075
        },
        {
          "key" : "06Hd4yeeTyaJNAOvMdXm5g",
          "doc_count" : 62074
        },
        {
          "key" : "RbDw6byrSqSovtD5znYR-g",
          "doc_count" : 62073
        },
        {
          "key" : "yRzHzVH5QMqVppmnvJudwA",
          "doc_count" : 62073
        },
        {
          "key" : "Y3ngJrxZQpyI682_RJv84g",
          "doc_count" : 62072
        },
        {
          "key" : "ZvTXAlcmT9qF3187ecR1cQ",
          "doc_count" : 62071
        },
        {
          "key" : "dtsJ-Hu3SQCzqC45Q9zGYQ",
          "doc_count" : 62071
        },
        {
          "key" : "ibJ_ghEpTr-PmT-cgzJEmg",
          "doc_count" : 62071
        },
        {
          "key" : "up4KRBnrRsmKt4FSB-9UzA",
          "doc_count" : 62071
        },
        {
          "key" : "yvvV2d_eSzaxcerpqGeEkw",
          "doc_count" : 62071
        },
        {
          "key" : "-nx3Cg2TTMyVlJG2KRRIGQ",
          "doc_count" : 62070
        },
        {
          "key" : "jIppmXl-RDe5T8NF24arYQ",
          "doc_count" : 62070
        }
      ]
    }
  }
}

@chrisronline
Contributor

@coudenysj

Thanks for that!

That seems fine, so let's try something else.

Can you try adjusting the date picker to various larger and smaller time periods? See if there is any period where you do actually see all the nodes listed.

Thanks

@coudenysj
Author

Now this is strange. When I check for the last 4 hours, I only get the masters. When I check the last 5 hours, I get all my nodes...

@chrisronline
Contributor

Try running this query and adjusting the time periods as you did in the UI. I'm wondering if you're hitting the search.max_buckets limit. Let me know if you see any errors or warnings from the following query across various time ranges:

POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [{
        "term": {
          "type": "node_stats"
        }
      },{
        "range": {
          "timestamp": {
            "gte":"now-1m"
          }
        }
      }]
    }
  },
  "collapse": {
    "field": "source_node.uuid"
  },
  "aggs": {
    "nodes": {
      "terms": {
        "field": "source_node.uuid",
        "size": 10000
      },
      "aggs": {
        "node_cgroup_quota": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "usage": {
              "max": {
                "field": "node_stats.os.cgroup.cpuacct.usage_nanos"
              }
            },
            "periods": {
              "max": {
                "field": "node_stats.os.cgroup.cpu.stat.number_of_elapsed_periods"
              }
            },
            "quota": {
              "min": {
                "field": "node_stats.os.cgroup.cpu.cfs_quota_micros"
              }
            },
            "usage_deriv": {
              "derivative": {
                "buckets_path": "usage",
                "gap_policy": "skip",
                "unit": "1s"
              }
            },
            "periods_deriv": {
              "derivative": {
                "buckets_path": "periods",
                "gap_policy": "skip",
                "unit": "1s"
              }
            }
          }
        },
        "node_cgroup_throttled": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.os.cgroup.cpu.stat.time_throttled_nanos"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_cpu_utilization": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.process.cpu.percent"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_load_average": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.os.cpu.load_average.1m"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_jvm_mem_percent": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.jvm.mem.heap_used_percent"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_free_space": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.fs.total.available_in_bytes"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        }
      }
    }
  },
  "sort": [{
    "timestamp": {
      "order": "desc"
    }
  }]
}

@chrisronline
Contributor

Also,

Now this is strange. When I check for the last 4 hours, I only get the masters. When I check the last 5 hours, I get all my nodes...

Do you ever see all nodes when doing a time period less than 4 hours?

@coudenysj
Author

Do you ever see all nodes when doing a time period less than 4 hours?

I don't think so.

@coudenysj
Author

Try running this query and adjusting the time periods as you did in the UI. I'm wondering if you're hitting a limit in regards to search.max_buckets. Let me know if you see any errors/warning with the following query in various time ranges:

When I change the range to:

"timestamp": {
    "gte":"now-1h"
}

I get errors:

"failures" : [
      {
        "shard" : 0,
        "index" : ".monitoring-es-7-2019.09.05",
        "node" : "RbDw6byrSqSovtD5znYR-g",
        "reason" : {
          "type" : "too_many_buckets_exception",
          "reason" : "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10005]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
          "max_buckets" : 10000
        }
      }
    ]

@chrisronline
Contributor

Try raising the search.max_buckets cluster setting, then try the UI again.
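For example, something like the following should raise the limit dynamically (the 20000 here is only an illustrative value; size it to your node count and the time ranges you query):

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}

You can confirm the active value afterwards with GET _cluster/settings?include_defaults=true and look for search.max_buckets under persistent (or defaults, if it has never been set).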

This issue is related, and a fix will be available soon.

@chrisronline
Contributor

I'm going to close this out, assuming things are better now. Feel free to reopen if that's not the case.

@coudenysj
Author

We've updated to 7.6 and it looks fine now.
