Monitoring data is not shown for the last 15 mins in the cloud cluster. #123880

rashmivkulkarni · 2022-01-26T22:15:25Z

Kibana version: 7.17.0 (latest), deployed on cloud prod

Describe the bug: Deployed a 7.17.0 cluster on cloud prod. Stack Monitoring shows no data for the last 15 minutes.

Steps to reproduce:

1. Once deployed, navigate to Stack Monitoring; you will see the following screen. Then navigate to Logs and Metrics by clicking the link.

2. Enable "Ship to a deployment" in the UI: find your deployment, select it (self-monitoring), and save. Then click the Metrics link.

For the last 15 minutes time range, no data appears to have been collected; Cloud seems to have stopped ingesting the data. The error message says "Monitoring request failed", as seen in the screenshot below. For the 24-hour time range, data shows up. The cluster has been up for a while.

The Network tab looked OK.

cc @jasonrhodes

elasticmachine · 2022-01-26T22:16:09Z

Pinging @elastic/infra-monitoring-ui (Team:Monitoring)

jasonrhodes · 2022-01-26T22:51:25Z

I was able to reproduce this in cloud on 7.17.0.

It appears to me that monitoring data stops being collected fairly soon after monitoring is enabled, so you have to widen the time range to cover the moment you enabled monitoring before the UI finds any data. I don't know how to inspect Stack Monitoring collection logs in Cloud, so I am not sure what the next step is here.

I think the toast appears when you click the "Metrics" link from the cloud console, and it happens because the cluster_uuid is included in the request. When you refresh, the UI makes a request without that cluster_uuid, which asks for "all current clusters". That request doesn't return a 404, just a 200 with an empty [] (no clusters found).
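To make the two request shapes concrete, here is a toy model of the behavior described above. It is not Kibana's handler code; the function name `clusters_response` and the in-memory `known` mapping are made up for illustration, and the status codes simply follow the description in this comment.

```python
from typing import Dict, List, Optional, Tuple


def clusters_response(
    known: Dict[str, dict], cluster_uuid: Optional[str] = None
) -> Tuple[int, List[dict]]:
    """Toy model: return (status, clusters) for a monitoring clusters request.

    `known` maps cluster_uuid -> cluster summary for clusters that have
    monitoring data in the selected time range (hypothetical structure).
    """
    if cluster_uuid is not None:
        # Request scoped to one cluster: a missing cluster is an error (404),
        # which is what would surface as the "Monitoring request failed" toast.
        cluster = known.get(cluster_uuid)
        if cluster is None:
            return 404, []
        return 200, [cluster]
    # Unscoped request ("all current clusters"): no data is not an error,
    # it is just an empty list.
    return 200, list(known.values())


if __name__ == "__main__":
    no_recent_data: Dict[str, dict] = {}
    print(clusters_response(no_recent_data, "some-cluster-uuid"))
    print(clusters_response(no_recent_data))
```

Under this model, the same empty time range produces a failure toast on the first click and a silent empty page after a refresh, which matches what was observed.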

I created a .monitoring* index pattern and used that in Discover to confirm that there were, in fact, no documents indexed in the time period, which is why I think ingest is failing rather than this being a query/UI issue.
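The same check can be done without Discover by counting documents directly. The sketch below only builds the request body for Elasticsearch's `_count` API; the helper name is hypothetical, and the time field `timestamp` is an assumption based on the field visible in the monitoring event quoted later in this thread.

```python
import json


def recent_docs_count_query(minutes: int = 15, time_field: str = "timestamp") -> dict:
    """Build a _count request body matching documents from the last `minutes` minutes.

    `time_field` is an assumption; adjust it if the monitoring documents
    in your cluster use a different time field.
    """
    return {
        "query": {
            "range": {
                time_field: {"gte": f"now-{minutes}m", "lte": "now"}
            }
        }
    }


if __name__ == "__main__":
    # Intended to be sent as the body of: GET .monitoring-*/_count
    print(json.dumps(recent_docs_count_query(15), indent=2))
```

A count of 0 for the last 15 minutes alongside a non-zero count for a wider window would point at ingest rather than at the query or the UI.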

rashmivkulkarni · 2022-01-26T23:48:31Z

@jasonrhodes can you please post the HAR file and your cluster ID for further debugging?

matschaffer · 2022-01-26T23:51:02Z

I started https://admin.found.no/deployments/567b27705936478680f3495a3d5037e5 as a test. Will see if that one stops collecting monitoring.

matschaffer · 2022-01-27T00:04:27Z

So far so good. Will keep an eye on it to see if it falls over.

matschaffer · 2022-01-27T01:13:50Z

So I took a look at @jasonrhodes's cluster (https://admin.found.no/deployments/1df5d5b91b5a49fd9f9fa6fc2e93b51a).

It looks like metricbeat is throwing this error:

Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.January, 27, 0, 22, 0, 50458769, time.Local), Meta:{"index":".monitoring-es-7-mb"}, Fields:{"agent":{"ephemeral_id":"9f635268-bace-41b6-a2bd-6f3eab0f19ab","hostname":"5d886d0ad862","id":"649e5cfd-66dc-4ab0-88da-12a66a65f81e","name":"5d886d0ad862","type":"metricbeat","version":"7.17.0"},"cluster_uuid":"1G-2qN3BS6-UcYX8I9DHpw","ecs":{"version":"1.12.0"},"enrich_coordinator_stats":{"executed_searches_total":0,"node_id":"MIcZUsjeSqC90MU5trc2wQ","queue_size":0,"remote_requests_current":0,"remote_requests_total":0},"event":{"dataset":"elasticsearch.enrich","duration":6114284,"module":"elasticsearch"},"host":{"name":"5d886d0ad862"},"interval_ms":10000,"metricset":{"name":"enrich","period":10000},"service":{"address":"https://5d886d0ad862:18669","type":"elasticsearch"},"timestamp":"2022-01-27T00:22:00.056Z","type":"enrich_coordinator_stats"}, Private:interface {}(nil), TimeSeries:false}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=404): {"type":"index_not_found_exception","reason":"no such index [.monitoring-es-7-mb-2022.01.27] and [require_alias] request flag is [true] and [.monitoring-es-7-mb-2022.01.27] is not an alias","index_uuid":"_na_","index":".monitoring-es-7-mb-2022.01.27"}, dropping event!

This is very similar to https://github.com/elastic/support-known-issues/issues/1030 but with the gotcha that .monitoring-es-7-mb-2022.01.27 is not expected to be an alias.
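For anyone scanning metricbeat logs for this failure, a small parser can pull the status, index name, and reason out of the "dropping event!" line. This is a sketch that assumes the message shape seen in the error above; the regex and the helper name `parse_dropped_event` are illustrative and not part of Beats.

```python
import json
import re
from typing import Optional

# Matches the trailing "(status=NNN): {json}, dropping event!" part of the line.
_DROP_RE = re.compile(r"\(status=(\d+)\): (\{.*\}), dropping event!")

# Abbreviated copy of the log line from this thread (event payload shortened).
SAMPLE = (
    'Cannot index event publisher.Event{...} (status=404): '
    '{"type":"index_not_found_exception","reason":"no such index '
    '[.monitoring-es-7-mb-2022.01.27] and [require_alias] request flag is [true] '
    'and [.monitoring-es-7-mb-2022.01.27] is not an alias",'
    '"index_uuid":"_na_","index":".monitoring-es-7-mb-2022.01.27"}, dropping event!'
)


def parse_dropped_event(line: str) -> Optional[dict]:
    """Extract status and Elasticsearch error details from a dropped-event log line."""
    match = _DROP_RE.search(line)
    if match is None:
        return None
    error = json.loads(match.group(2))
    return {
        "status": int(match.group(1)),
        "type": error.get("type"),
        "index": error.get("index"),
        "reason": error.get("reason"),
    }


if __name__ == "__main__":
    print(parse_dropped_event(SAMPLE))
```

Grouping parsed results by `index` and `type` makes it easy to see whether every dropped event fails for the same `require_alias` reason.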

I found elastic/beats#29879 which was backported to 7.17. I've requested the metricbeat config to help with investigation.

matschaffer · 2022-01-27T01:27:04Z

Here are the configs for @jasonrhodes's deployment's metricbeat (courtesy of @lucasmoore):

metricbeat.yml

output.elasticsearch:
  hosts: ["http://containerhost:9244"]
  username: "elastic-observability-agent"
  password: "redacted"
  headers:
    X-Found-Cluster: d733939974c6401d95c05b2696664ac1
    X-Elastic-App-Auth: redacted
  ssl.verification_mode: full

setup.template.overwrite: false

metricbeat.config.modules:
  enabled: true
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 30s

logging.level: info
logging.to_files: true
logging.json: true
logging.files:
  path: /app/logs/beats
  name: metricbeat.log
  keepfiles: 3

queue.disk:
  max_size: "1GB"
  read_ahead: 512
  write_ahead: 2048

modules.d/elasticsearch.yml

- module: elasticsearch
  metricsets:
    - ccr
    - enrich
    - cluster_stats
    - index
    - index_recovery
    - index_summary
    - ml_job
    - node_stats
    - shard
  period: 10s
  hosts: ["https://${HOSTNAME:10.43.255.38}:18669"]
  username: "ec-local-beats-monitor"
  password: "redacted"
  headers:
  ssl.verification_mode: none
  xpack.enabled: true

matschaffer · 2022-01-27T01:28:17Z

require_alias doesn't appear anywhere in the config, so it seems 7.17 is probably defaulting to require_alias: true, even though the .monitoring-7-* indices are not supposed to be aliases.
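To spell out why that default breaks ingest, here is a toy model of the `require_alias` check. It is a simplification and not Elasticsearch code: `ToyCluster` and its fields are hypothetical, and it assumes automatic index creation is allowed when `require_alias` is not set.

```python
from typing import Dict, Set, Tuple


class ToyCluster:
    """Minimal in-memory stand-in for index/alias resolution on write."""

    def __init__(self) -> None:
        self.indices: Set[str] = set()
        self.aliases: Dict[str, str] = {}  # alias name -> backing index

    def index_doc(self, target: str, require_alias: bool = False) -> Tuple[int, str]:
        """Return (status, resolved index or error type) for a write to `target`."""
        if target in self.aliases:
            # Writes to an alias always resolve to its backing index.
            return 201, self.aliases[target]
        if require_alias:
            # Target is a plain (or missing) index name: rejected, never auto-created.
            return 404, "index_not_found_exception"
        # Without the flag, a missing index is created on first write (assumption).
        self.indices.add(target)
        return 201, target


if __name__ == "__main__":
    cluster = ToyCluster()
    daily_index = ".monitoring-es-7-mb-2022.01.27"
    print(cluster.index_doc(daily_index, require_alias=True))
    print(cluster.index_doc(daily_index))
```

In this model a dated index name can never satisfy `require_alias`, so every event written that way is rejected with a 404, which is consistent with the dropped events in the metricbeat log.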

I'll get a beats bug open.

matschaffer · 2022-01-27T01:35:14Z

Closing in favor of elastic/beats#30044

rashmivkulkarni added the bug and Feature:Stack Monitoring labels Jan 26, 2022

botelastic bot added the needs-team label Jan 26, 2022

rashmivkulkarni added the Team:Monitoring label Jan 26, 2022

botelastic bot removed the needs-team label Jan 26, 2022

rashmivkulkarni removed the Team:Monitoring label Jan 26, 2022

botelastic bot added the needs-team label Jan 26, 2022

rashmivkulkarni added the Team:Infra Monitoring UI - DEPRECATED label Jan 26, 2022

botelastic bot removed the needs-team label Jan 26, 2022

matschaffer closed this as completed Jan 27, 2022
