Monitoring data is not shown for the last 15 mins in the cloud cluster. #123880

Closed
rashmivkulkarni opened this issue Jan 26, 2022 · 9 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Stack Monitoring Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services

Comments

@rashmivkulkarni
Contributor

Kibana version: 7.17.0 (latest), deployed on Cloud (prod)

Describe the bug: On a 7.17.0 cluster deployed on Cloud (prod), Stack Monitoring shows no data for the last 15 minutes.

Steps to reproduce:

  1. Once deployed, navigate to Stack Monitoring; you will see the following screen. Then go to Logs and Metrics by clicking the link.

Screen Shot 2022-01-26 at 2 01 51 PM

  2. Enable shipping to a deployment in the UI: find your deployment, select it (self-monitoring), and save. Then click the Metrics link.

For the last-15-minutes time range, no data appears to have been collected; Cloud seems to have stopped ingesting it. The error message says "Monitoring request failed", as shown in the screenshot below. For the 24-hour time range, data shows up. The cluster has been up for a while.

Screen Shot 2022-01-26 at 11 46 37 AM

The network tab looked OK.
network_tab

cc @jasonrhodes

@rashmivkulkarni rashmivkulkarni added bug Fixes for quality problems that affect the customer experience Feature:Stack Monitoring labels Jan 26, 2022
@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 26, 2022
@rashmivkulkarni rashmivkulkarni added the Team:Monitoring Stack Monitoring team label Jan 26, 2022
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Monitoring)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 26, 2022
@rashmivkulkarni rashmivkulkarni removed the Team:Monitoring Stack Monitoring team label Jan 26, 2022
@botelastic botelastic bot added the needs-team Issues missing a team label label Jan 26, 2022
@rashmivkulkarni rashmivkulkarni added the Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services label Jan 26, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 26, 2022
@jasonrhodes
Member

I was able to reproduce this in cloud on 7.17.0.

It appears to me that monitoring data stops being collected fairly soon after it's been enabled, so you have to widen your search to cover the time you enabled monitoring before it finds any data. I don't know how to inspect Stack Monitoring collection logs in Cloud, so I'm not sure what the next step is here.

I think the toast appears when you click the "Metrics" link from the cloud console, and it happens because the cluster_uuid is included in the request. When you refresh, the page makes a request without that cluster_uuid, which asks for "all current clusters". That request won't return a 404, just a 200 with an empty [] (no clusters found).

I created a .monitoring* index pattern and used that in Discover to confirm that there were, in fact, no documents indexed in the time period, which is why I think ingest is failing rather than this being a query/UI issue.
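
The Discover check above can also be expressed as a count query. The following is a minimal sketch, not what was actually run in this thread: it only builds the path and body for an Elasticsearch `_count` request. The function name is hypothetical, and the use of the `timestamp` field as the time field is an assumption based on the monitoring documents quoted later in this issue.

```python
import json


def build_recent_count_request(minutes=15, index_pattern=".monitoring-*"):
    """Return (path, body) for a _count request over recently indexed docs.

    Assumption: monitoring documents carry a `timestamp` date field.
    """
    path = f"/{index_pattern}/_count"
    body = {
        "query": {
            "range": {
                # Date math: everything from `minutes` ago until now.
                "timestamp": {"gte": f"now-{minutes}m", "lte": "now"}
            }
        }
    }
    return path, body


if __name__ == "__main__":
    path, body = build_recent_count_request()
    print("GET", path)
    print(json.dumps(body, indent=2))
```

A count of 0 for the last 15 minutes, combined with a non-zero count for a wider range, would point at ingest rather than at the UI query.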

@rashmivkulkarni
Contributor Author

@jasonrhodes can you please post the HAR file and your cluster ID for further debugging?

@matschaffer
Contributor

I started https://admin.found.no/deployments/567b27705936478680f3495a3d5037e5 as a test. Will see if that one stops collecting monitoring.

@matschaffer
Contributor

Screen Shot 2022-01-27 at 9 04 12

So far so good. I'll keep an eye on it to see if it falls over.

@matschaffer
Contributor

So I took a look at @jasonrhodes's cluster (https://admin.found.no/deployments/1df5d5b91b5a49fd9f9fa6fc2e93b51a).

It looks like metricbeat is throwing this error:

Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.January, 27, 0, 22, 0, 50458769, time.Local), Meta:{"index":".monitoring-es-7-mb"}, Fields:{"agent":{"ephemeral_id":"9f635268-bace-41b6-a2bd-6f3eab0f19ab","hostname":"5d886d0ad862","id":"649e5cfd-66dc-4ab0-88da-12a66a65f81e","name":"5d886d0ad862","type":"metricbeat","version":"7.17.0"},"cluster_uuid":"1G-2qN3BS6-UcYX8I9DHpw","ecs":{"version":"1.12.0"},"enrich_coordinator_stats":{"executed_searches_total":0,"node_id":"MIcZUsjeSqC90MU5trc2wQ","queue_size":0,"remote_requests_current":0,"remote_requests_total":0},"event":{"dataset":"elasticsearch.enrich","duration":6114284,"module":"elasticsearch"},"host":{"name":"5d886d0ad862"},"interval_ms":10000,"metricset":{"name":"enrich","period":10000},"service":{"address":"https://5d886d0ad862:18669","type":"elasticsearch"},"timestamp":"2022-01-27T00:22:00.056Z","type":"enrich_coordinator_stats"}, Private:interface {}(nil), TimeSeries:false}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=404): {"type":"index_not_found_exception","reason":"no such index [.monitoring-es-7-mb-2022.01.27] and [require_alias] request flag is [true] and [.monitoring-es-7-mb-2022.01.27] is not an alias","index_uuid":"_na_","index":".monitoring-es-7-mb-2022.01.27"}, dropping event!

This is very similar to https://github.com/elastic/support-known-issues/issues/1030 but with the gotcha that .monitoring-es-7-mb-2022.01.27 is not expected to be an alias.

I found elastic/beats#29879 which was backported to 7.17. I've requested the metricbeat config to help with investigation.
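
When scanning many such log lines, it can help to pull out just the status, error type, and target index. Below is a rough sketch of a parser for the tail of the "Cannot index event" message quoted above. The function name and regular expression are illustrative assumptions, not part of metricbeat, and the sample line abbreviates the event body as `{...}`.

```python
import json
import re

# Matches the trailing "(status=NNN): {json}, dropping event!" part of the line.
_ERROR_TAIL = re.compile(
    r"\(status=(\d+)\): (\{.*\}), dropping event!\s*$", re.DOTALL
)


def parse_dropped_event(line):
    """Extract status, error type, index and reason, or None if no match."""
    match = _ERROR_TAIL.search(line)
    if match is None:
        return None
    error = json.loads(match.group(2))
    return {
        "status": int(match.group(1)),
        "type": error.get("type"),
        "index": error.get("index"),
        "reason": error.get("reason"),
    }


# Sample line; the publisher.Event body is abbreviated here as {...}.
SAMPLE = (
    "Cannot index event publisher.Event{...} (status=404): "
    '{"type":"index_not_found_exception",'
    '"reason":"no such index [.monitoring-es-7-mb-2022.01.27] and '
    "[require_alias] request flag is [true] and "
    '[.monitoring-es-7-mb-2022.01.27] is not an alias",'
    '"index_uuid":"_na_","index":".monitoring-es-7-mb-2022.01.27"}, '
    "dropping event!"
)

if __name__ == "__main__":
    print(parse_dropped_event(SAMPLE))
```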

@matschaffer
Contributor

Here are the metricbeat configs for @jasonrhodes's deployment (courtesy of @lucasmoore):

metricbeat.yml
output.elasticsearch:
  hosts: ["http://containerhost:9244"]
  username: "elastic-observability-agent"
  password: "redacted"
  headers:
    X-Found-Cluster: d733939974c6401d95c05b2696664ac1
    X-Elastic-App-Auth: redacted
  ssl.verification_mode: full

setup.template.overwrite: false

metricbeat.config.modules:
  enabled: true
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 30s

logging.level: info
logging.to_files: true
logging.json: true
logging.files:
  path: /app/logs/beats
  name: metricbeat.log
  keepfiles: 3

queue.disk:
  max_size: "1GB"
  read_ahead: 512
  write_ahead: 2048

modules.d/elasticsearch.yml
- module: elasticsearch
  metricsets:
    - ccr
    - enrich
    - cluster_stats
    - index
    - index_recovery
    - index_summary
    - ml_job
    - node_stats
    - shard
  period: 10s
  hosts: ["https://${HOSTNAME:10.43.255.38}:18669"]
  username: "ec-local-beats-monitor"
  password: "redacted"
  headers:
  ssl.verification_mode: none
  xpack.enabled: true

@matschaffer
Contributor

require_alias doesn't appear anywhere in the config, so it seems that 7.17 is probably defaulting to require_alias even though the .monitoring-7-* indices are not supposed to be aliases.

I'll get a beats bug open.
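
To make the suspected failure mode concrete, here is a toy model of how a write resolves when the require_alias flag is set. It is a sketch of the semantics described in the error message, not Elasticsearch or Beats code. The function name and return strings are hypothetical, and it assumes automatic index creation is enabled when the flag is off.

```python
def simulate_write(target, aliases, indices, require_alias):
    """Toy model of write resolution for `target`.

    aliases: dict mapping alias name -> backing index name
    indices: set of existing concrete index names
    """
    if target in aliases:
        return f"written via alias to {aliases[target]}"
    if require_alias:
        # Mirrors the 404 in the log: the target is not an alias, so the
        # write is rejected rather than creating or using a concrete index.
        return "rejected: index_not_found_exception"
    if target in indices:
        return f"written to index {target}"
    # Assumption: automatic index creation is enabled.
    return f"auto-created index {target} and written"


if __name__ == "__main__":
    name = ".monitoring-es-7-mb-2022.01.27"
    print(simulate_write(name, {}, set(), require_alias=True))
    print(simulate_write(name, {}, set(), require_alias=False))
```

Under this model, a daily index name such as .monitoring-es-7-mb-2022.01.27 can never be written with the flag on unless an alias of that exact name exists, which matches the dropped events seen above.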

@matschaffer
Contributor

Closing in favor of elastic/beats#30044
