Monitoring tab in Kibana only showing master nodes #44666

Closed
coudenysj opened this issue Sep 3, 2019 · 12 comments

@coudenysj

Kibana version: 7.3.1

Elasticsearch version: 7.3.1

Server OS version: Debian 10

Original install method (e.g. download page, yum, from source, etc.): deb

Describe the bug:

The monitoring tab is only showing the master node details (but indicates there are more nodes in the cluster).

Steps to reproduce:

  1. Open https://master:5601/app/monitoring#/elasticsearch/nodes

Expected behavior:

All nodes in the cluster should be listed. (We have another cluster running 6.5.2 which shows all of its nodes.)

Screenshots (if relevant):

[two screenshots attached]

Errors in browser console (if relevant):

Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'unsafe-eval' 'nonce-xxx/v'". Either the 'unsafe-inline' keyword, a hash ('sha256-xxx'), or a nonce ('nonce-...') is required to enable inline execution.

Provide logs and/or server output (if relevant):

Any additional context:

It looks exactly like https://discuss.elastic.co/t/monitoring-elasticsearch-nodes-no-data-nodes/167321, but all the queries requested there are showing all nodes.

@cjcenizal added the Team:Monitoring (Stack Monitoring team) and triage_needed labels on Sep 3, 2019
@elasticmachine
Contributor

Pinging @elastic/stack-monitoring

@chrisronline
Contributor

Hey @coudenysj,

I think the first step is determining whether the data is there or not. It's possible that the non-master nodes aren't reporting for some reason, or that something is happening on the Kibana side.

As in the discuss post, let's run this query as a starting point:

POST .monitoring-es-*/_search
{
    "size": 1,
    "query": {
        "term": {
            "type": "node_stats"
        }
    },
    "sort": {
        "timestamp": {
            "order": "desc"
        }
    },
    "collapse": {
        "field": "source_node.uuid"
    },
    "aggs": {
    	"nodes": {
    		"terms": {
    			"field": "source_node.uuid"
    		}
    	}
    }
}

@coudenysj
Author

Thanks @chrisronline for your reply. I changed the query a bit because we have 15 nodes:

POST .monitoring-es-*/_search
{
    "size": 0,
    "query": {
        "term": {
            "type": "node_stats"
        }
    },
    "sort": {
        "timestamp": {
            "order": "desc"
        }
    },
    "collapse": {
        "field": "source_node.uuid"
    },
    "aggs": {
    	"nodes": {
    		"terms": {
    			"field": "source_node.uuid",
    			"size": 100
    		}
    	}
    }
}

which results in 15 buckets:

{
  "took" : 690,
  "timed_out" : false,
  "_shards" : {
    "total" : 16,
    "successful" : 16,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nodes" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "leNGcTahQDCQyebBqhTlKg",
          "doc_count" : 991460
        },
        {
          "key" : "g9UxGx0nQuWH67uxDH0cgg",
          "doc_count" : 991441
        },
        {
          "key" : "05RymCqYR4SFoOC_gbJKGw",
          "doc_count" : 991401
        },
        {
          "key" : "4hcycW5IRSGKcnoWjjl3Cw",
          "doc_count" : 62075
        },
        {
          "key" : "06Hd4yeeTyaJNAOvMdXm5g",
          "doc_count" : 62074
        },
        {
          "key" : "RbDw6byrSqSovtD5znYR-g",
          "doc_count" : 62073
        },
        {
          "key" : "yRzHzVH5QMqVppmnvJudwA",
          "doc_count" : 62073
        },
        {
          "key" : "Y3ngJrxZQpyI682_RJv84g",
          "doc_count" : 62072
        },
        {
          "key" : "ZvTXAlcmT9qF3187ecR1cQ",
          "doc_count" : 62071
        },
        {
          "key" : "dtsJ-Hu3SQCzqC45Q9zGYQ",
          "doc_count" : 62071
        },
        {
          "key" : "ibJ_ghEpTr-PmT-cgzJEmg",
          "doc_count" : 62071
        },
        {
          "key" : "up4KRBnrRsmKt4FSB-9UzA",
          "doc_count" : 62071
        },
        {
          "key" : "yvvV2d_eSzaxcerpqGeEkw",
          "doc_count" : 62071
        },
        {
          "key" : "-nx3Cg2TTMyVlJG2KRRIGQ",
          "doc_count" : 62070
        },
        {
          "key" : "jIppmXl-RDe5T8NF24arYQ",
          "doc_count" : 62070
        }
      ]
    }
  }
}

@chrisronline
Contributor

@coudenysj

Thanks for that!

That seems fine, so let's try something else.

Can you try adjusting the date picker to various larger and smaller time periods? See if there is any period where you do actually see all the nodes listed.

Thanks

@coudenysj
Author

Now this is strange. When I check for the last 4 hours, I only get the masters. When I check the last 5 hours, I get all my nodes...

@chrisronline
Contributor

Try running this query and adjusting the time periods as you did in the UI. I'm wondering if you're hitting the search.max_buckets limit. Let me know if you see any errors or warnings from the following query across various time ranges:

POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [{
        "term": {
          "type": "node_stats"
        }
      },{
        "range": {
          "timestamp": {
            "gte":"now-1m"
          }
        }
      }]
    }
  },
  "collapse": {
    "field": "source_node.uuid"
  },
  "aggs": {
    "nodes": {
      "terms": {
        "field": "source_node.uuid",
        "size": 10000
      },
      "aggs": {
        "node_cgroup_quota": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "usage": {
              "max": {
                "field": "node_stats.os.cgroup.cpuacct.usage_nanos"
              }
            },
            "periods": {
              "max": {
                "field": "node_stats.os.cgroup.cpu.stat.number_of_elapsed_periods"
              }
            },
            "quota": {
              "min": {
                "field": "node_stats.os.cgroup.cpu.cfs_quota_micros"
              }
            },
            "usage_deriv": {
              "derivative": {
                "buckets_path": "usage",
                "gap_policy": "skip",
                "unit": "1s"
              }
            },
            "periods_deriv": {
              "derivative": {
                "buckets_path": "periods",
                "gap_policy": "skip",
                "unit": "1s"
              }
            }
          }
        },
        "node_cgroup_throttled": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.os.cgroup.cpu.stat.time_throttled_nanos"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_cpu_utilization": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.process.cpu.percent"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_load_average": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.os.cpu.load_average.1m"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_jvm_mem_percent": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.jvm.mem.heap_used_percent"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        },
        "node_free_space": {
          "date_histogram": {
            "field": "timestamp",
            "min_doc_count": 1,
            "fixed_interval": "30s"
          },
          "aggs": {
            "metric": {
              "max": {
                "field": "node_stats.fs.total.available_in_bytes"
              }
            },
            "metric_deriv": {
              "derivative": {
                "buckets_path": "metric",
                "unit": "1s"
              }
            }
          }
        }
      }
    }
  },
  "sort": [{
    "timestamp": {
      "order": "desc"
    }
  }]
}

@chrisronline
Contributor

Also,

Now this is strange. When I check for the last 4 hours, I only get the masters. When I check the last 5 hours, I get all my nodes...

Do you ever see all nodes when doing a time period less than 4 hours?

@coudenysj
Author

Do you ever see all nodes when doing a time period less than 4 hours?

I don't think so.

@coudenysj
Author

Try running this query and adjusting the time periods as you did in the UI. I'm wondering if you're hitting a limit in regards to search.max_buckets. Let me know if you see any errors/warning with the following query in various time ranges:

When I change the range to:

"timestamp": {
    "gte":"now-1h"
}

I get errors:

"failures" : [
      {
        "shard" : 0,
        "index" : ".monitoring-es-7-2019.09.05",
        "node" : "RbDw6byrSqSovtD5znYR-g",
        "reason" : {
          "type" : "too_many_buckets_exception",
          "reason" : "Trying to create too many buckets. Must be less than or equal to: [10000] but was [10005]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
          "max_buckets" : 10000
        }
      }
    ]

@chrisronline
Contributor

Try raising the search.max_buckets cluster setting, then try the UI again.
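For example, something like the following should raise the limit dynamically (the 20000 here is only an illustrative value; size it to your node count and the time ranges you query):

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}

You can confirm the active value afterwards with GET _cluster/settings?include_defaults=true and look for search.max_buckets under persistent (or defaults, if it has never been set).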

This issue is related, and a fix will be available soon.

@chrisronline
Contributor

I'm going to close this out, assuming things are better now. Feel free to reopen if that's not the case.

@coudenysj
Author

We've updated to 7.6 and it looks fine now.
