
Logstash pipelines with Elasticsearch output can vanish from Monitoring app #52245

Closed · jarpy opened this issue Dec 5, 2019 · 24 comments
Labels: bug, Feature:Stack Monitoring, Team:Monitoring

@jarpy (Contributor) commented Dec 5, 2019

Kibana version:
7.5.0

Elasticsearch version:
7.5.0

Server OS version:
Official Docker images

Browser version:
Chromium Version 78.0.3904.108 (Official Build) Arch Linux (64-bit)

Browser OS version:
Arch Linux

Original install method (e.g. download page, yum, from source, etc.):
Official Docker images

Describe the bug:
When using external monitoring (via Metricbeat) to monitor Logstash pipelines, a pipeline may not appear in the Monitoring application when that pipeline has an Elasticsearch output declared.

Steps to reproduce:
A full, automated reproduction is available as a Docker Compose stack here:
https://github.com/elastic/logstash-external-monitoring-repro

  1. Clone the repository
  2. docker-compose up
  3. Browse http://localhost:5601 to access Kibana
  4. Enter username elastic, password nomnom

Expected behaviour:
Two pipelines should be shown (and they are, if you comment out the Elasticsearch output in the Logstash pipeline configuration).

Screenshots:

With ES output:
[screenshot]

Without ES output:
[screenshot]

@elasticmachine (Contributor) commented:

Pinging @elastic/stack-monitoring (Team:Monitoring)

@jarpy (Author) commented Dec 5, 2019

Cc @ycombinator

@cachedout (Contributor) commented Dec 9, 2019

Here is a logstash_stats.pipelines data structure that I collected using this test setup:

Document containing the "bad" pipeline:

{
  "events": {
    "duration_in_millis": 15055,
    "filtered": 1681,
    "in": 1681,
    "out": 1681,
    "queue_push_duration_in_millis": 0
  },
  "reloads": {
    "successes": 0,
    "failures": 0
  },
  "queue": {
    "events_count": 0,
    "max_queue_size_in_bytes": 0,
    "queue_size_in_bytes": 0,
    "type": "memory"
  },
  "vertices": [
    {
      "pipeline_ephemeral_id": "b88f338d-5d4f-49e8-9669-0eeef6ac0582",
      "queue_push_duration_in_millis": 0,
      "events_out": 1681,
      "id": "97e1da0ed7dd9fa6cc42cb7233a37d52e8ef06f902386d0afcbbba82da424865"
    },
    {
      "duration_in_millis": 14290,
      "events_in": 1681,
      "events_out": 1681,
      "id": "24fde1c88ad65852eb9f02a5ec446963f5cd154bc8d0e3351e508f0824aaa963",
      "long_counters": [
        {
          "name": "documents.successes",
          "value": 1681
        },
        {
          "name": "bulk_requests.responses.200",
          "value": 1681
        },
        {
          "name": "bulk_requests.successes",
          "value": 1681
        }
      ],
      "pipeline_ephemeral_id": "b88f338d-5d4f-49e8-9669-0eeef6ac0582",
      "cluster_uuid": "YDgadTXCROyHgG5AjN53kw"
    },
    {
      "pipeline_ephemeral_id": "b88f338d-5d4f-49e8-9669-0eeef6ac0582",
      "duration_in_millis": 599,
      "events_in": 1681,
      "events_out": 1681,
      "id": "744b1645b125ace90abf25d61f3311d39feef1681934563435d9d11daad3317e"
    }
  ],
  "id": "bad",
  "hash": "5e1bb463f367596346c7693b8904c46291ca5f8ac91c4753287af3e3a0b2eea9",
  "ephemeral_id": "b88f338d-5d4f-49e8-9669-0eeef6ac0582"
}

Document containing the "good" pipeline:

{
  "reloads": {
    "successes": 0,
    "failures": 0
  },
  "queue": {
    "max_queue_size_in_bytes": 0,
    "queue_size_in_bytes": 0,
    "type": "memory",
    "events_count": 0
  },
  "vertices": [
    {
      "events_out": 1682,
      "id": "f8fb5773899170f71c8a072a29852aa27497546d8215c39eefe58a07e295488b",
      "pipeline_ephemeral_id": "1cca8861-0803-429d-aff0-fdc823fec51f",
      "queue_push_duration_in_millis": 0
    },
    {
      "events_in": 1682,
      "events_out": 1682,
      "id": "60eb1c3c276b412949aef3d652283e29b54eaa5472535e3812ec417d4a0dc3bd",
      "pipeline_ephemeral_id": "1cca8861-0803-429d-aff0-fdc823fec51f",
      "duration_in_millis": 1112
    }
  ],
  "id": "good",
  "hash": "4a5f7c06dfcbfb97e98a408161fac68e9749b19e72e51b74d4954e2656ecfd59",
  "ephemeral_id": "1cca8861-0803-429d-aff0-fdc823fec51f",
  "events": {
    "queue_push_duration_in_millis": 0,
    "duration_in_millis": 188,
    "filtered": 1682,
    "in": 1682,
    "out": 1682
  }
}

@cachedout (Contributor) commented:

The problem may be that when a pipeline has an Elasticsearch output plugin, we introduce a cluster_uuid field at the top level of the document. Here are the two documents, one with that field and one without:

Good pipeline document

{
  "_index": ".monitoring-logstash-7-mb-2019.12.09",
  "_type": "_doc",
  "_id": "fNCu6m4BM67Aqg9wiqf6",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-12-09T12:42:16.712Z",
    "agent": {
      "ephemeral_id": "0ae43262-2973-497e-8f0e-a2651b0183a2",
      "hostname": "54f00d762755",
      "id": "6aecd4bd-5a76-4dea-b426-9153d8fba97a",
      "version": "7.5.0",
      "type": "metricbeat"
    },
    "ecs": {
      "version": "1.1.0"
    },
    "timestamp": "2019-12-09T12:42:16.720Z",
    "interval_ms": 10000,
    "type": "logstash_stats",
    "event": {
      "dataset": "logstash.node.stats",
      "module": "logstash",
      "duration": 7875379
    },
    "metricset": {
      "name": "node_stats",
      "period": 10000
    },
    "host": {
      "name": "54f00d762755"
    },
    "logstash_stats": {
      "queue": {
        "events_count": 0
      },
      "process": {
        "open_file_descriptors": 92,
        "max_file_descriptors": 1048576,
        "cpu": {}
      },
      "os": {
        "cpu": {
          "load_average": {
            "15m": 0.36,
            "1m": 0.35,
            "5m": 0.3
          }
        },
        "cgroup": {
          "cpuacct": {
            "control_group": "/",
            "usage_nanos": 150018217068
          },
          "cpu": {
            "stat": {
              "number_of_elapsed_periods": 0,
              "number_of_times_throttled": 0,
              "time_throttled_nanos": 0
            },
            "control_group": "/"
          }
        }
      },
      "timestamp": "2019-12-09T12:42:16.720Z",
      "events": {
        "out": 2303,
        "duration_in_millis": 10908,
        "in": 2305,
        "filtered": 2303
      },
      "jvm": {
        "gc": {
          "collectors": {
            "young": {
              "collection_count": 23,
              "collection_time_in_millis": 1068
            },
            "old": {
              "collection_count": 2,
              "collection_time_in_millis": 286
            }
          }
        },
        "mem": {
          "heap_max_in_bytes": 5253365758,
          "heap_used_in_bytes": 149783952,
          "heap_used_percent": 2
        },
        "uptime_in_millis": 1178611
      },
      "reloads": {
        "failures": 0,
        "successes": 0
      },
      "pipelines": [
        {
          "hash": "4a5f7c06dfcbfb97e98a408161fac68e9749b19e72e51b74d4954e2656ecfd59",
          "ephemeral_id": "01ba784e-f4c5-40ec-a602-e3962a8c529e",
          "events": {
            "duration_in_millis": 210,
            "filtered": 1152,
            "in": 1153,
            "out": 1152,
            "queue_push_duration_in_millis": 0
          },
          "reloads": {
            "successes": 0,
            "failures": 0
          },
          "queue": {
            "max_queue_size_in_bytes": 0,
            "queue_size_in_bytes": 0,
            "type": "memory",
            "events_count": 0
          },
          "vertices": [
            {
              "events_out": 1153,
              "id": "f8fb5773899170f71c8a072a29852aa27497546d8215c39eefe58a07e295488b",
              "pipeline_ephemeral_id": "01ba784e-f4c5-40ec-a602-e3962a8c529e",
              "queue_push_duration_in_millis": 0
            },
            {
              "pipeline_ephemeral_id": "01ba784e-f4c5-40ec-a602-e3962a8c529e",
              "duration_in_millis": 705,
              "events_in": 1152,
              "events_out": 1152,
              "id": "60eb1c3c276b412949aef3d652283e29b54eaa5472535e3812ec417d4a0dc3bd"
            }
          ],
          "id": "good"
        }
      ],
      "logstash": {
        "status": "green",
        "pipeline": {
          "workers": 6,
          "batch_size": 125
        },
        "uuid": "105536f5-ec81-4862-9e16-ae7f7f1be973",
        "ephemeral_id": "4b3907f8-8595-466f-b0de-961550a1392a",
        "host": "bec471797c44",
        "http_address": "0.0.0.0:9600",
        "name": "bec471797c44",
        "version": "7.5.0",
        "snapshot": false
      }
    },
    "service": {
      "address": "logstash:9600",
      "type": "logstash"
    }
  },
  "fields": {
    "logstash_stats.timestamp": [
      "2019-12-09T12:42:16.720Z"
    ],
    "timestamp": [
      "2019-12-09T12:42:16.720Z"
    ]
  },
  "sort": [
    1575895336720
  ]
}

"Bad" pipeline document

{
  "_index": ".monitoring-logstash-7-mb-2019.12.09",
  "_type": "_doc",
  "_id": "fdCu6m4BM67Aqg9wiqf6",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-12-09T12:42:16.712Z",
    "timestamp": "2019-12-09T12:42:16.720Z",
    "cluster_uuid": "YDgadTXCROyHgG5AjN53kw",
    "ecs": {
      "version": "1.1.0"
    },
    "agent": {
      "ephemeral_id": "0ae43262-2973-497e-8f0e-a2651b0183a2",
      "hostname": "54f00d762755",
      "id": "6aecd4bd-5a76-4dea-b426-9153d8fba97a",
      "version": "7.5.0",
      "type": "metricbeat"
    },
    "metricset": {
      "name": "node_stats",
      "period": 10000
    },
    "service": {
      "address": "logstash:9600",
      "type": "logstash"
    },
    "type": "logstash_stats",
    "logstash_stats": {
      "jvm": {
        "uptime_in_millis": 1178611,
        "gc": {
          "collectors": {
            "old": {
              "collection_count": 2,
              "collection_time_in_millis": 286
            },
            "young": {
              "collection_time_in_millis": 1068,
              "collection_count": 23
            }
          }
        },
        "mem": {
          "heap_max_in_bytes": 5253365758,
          "heap_used_in_bytes": 149783952,
          "heap_used_percent": 2
        }
      },
      "queue": {
        "events_count": 0
      },
      "logstash": {
        "uuid": "105536f5-ec81-4862-9e16-ae7f7f1be973",
        "ephemeral_id": "4b3907f8-8595-466f-b0de-961550a1392a",
        "host": "bec471797c44",
        "status": "green",
        "http_address": "0.0.0.0:9600",
        "name": "bec471797c44",
        "version": "7.5.0",
        "snapshot": false,
        "pipeline": {
          "batch_size": 125,
          "workers": 6
        }
      },
      "pipelines": [
        {
          "id": "bad",
          "hash": "5e1bb463f367596346c7693b8904c46291ca5f8ac91c4753287af3e3a0b2eea9",
          "ephemeral_id": "2d3923e7-2b44-43be-a313-883c5154346f",
          "events": {
            "duration_in_millis": 10698,
            "filtered": 1151,
            "in": 1152,
            "out": 1151,
            "queue_push_duration_in_millis": 0
          },
          "reloads": {
            "successes": 0,
            "failures": 0
          },
          "queue": {
            "queue_size_in_bytes": 0,
            "type": "memory",
            "events_count": 0,
            "max_queue_size_in_bytes": 0
          },
          "vertices": [
            {
              "id": "97e1da0ed7dd9fa6cc42cb7233a37d52e8ef06f902386d0afcbbba82da424865",
              "pipeline_ephemeral_id": "2d3923e7-2b44-43be-a313-883c5154346f",
              "queue_push_duration_in_millis": 0,
              "events_out": 1152
            },
            {
              "cluster_uuid": "YDgadTXCROyHgG5AjN53kw",
              "duration_in_millis": 10180,
              "events_in": 1151,
              "events_out": 1151,
              "id": "24fde1c88ad65852eb9f02a5ec446963f5cd154bc8d0e3351e508f0824aaa963",
              "long_counters": [
                {
                  "name": "bulk_requests.responses.200",
                  "value": 1150
                },
                {
                  "name": "bulk_requests.successes",
                  "value": 1150
                },
                {
                  "name": "documents.successes",
                  "value": 1151
                }
              ],
              "pipeline_ephemeral_id": "2d3923e7-2b44-43be-a313-883c5154346f"
            },
            {
              "duration_in_millis": 564,
              "events_in": 1151,
              "events_out": 1151,
              "id": "744b1645b125ace90abf25d61f3311d39feef1681934563435d9d11daad3317e",
              "pipeline_ephemeral_id": "2d3923e7-2b44-43be-a313-883c5154346f"
            }
          ]
        }
      ],
      "timestamp": "2019-12-09T12:42:16.720Z",
      "events": {
        "in": 2305,
        "filtered": 2303,
        "out": 2303,
        "duration_in_millis": 10908
      },
      "reloads": {
        "successes": 0,
        "failures": 0
      },
      "process": {
        "open_file_descriptors": 92,
        "max_file_descriptors": 1048576,
        "cpu": {}
      },
      "os": {
        "cpu": {
          "load_average": {
            "5m": 0.3,
            "15m": 0.36,
            "1m": 0.35
          }
        },
        "cgroup": {
          "cpuacct": {
            "control_group": "/",
            "usage_nanos": 150018217068
          },
          "cpu": {
            "stat": {
              "time_throttled_nanos": 0,
              "number_of_elapsed_periods": 0,
              "number_of_times_throttled": 0
            },
            "control_group": "/"
          }
        }
      }
    },
    "host": {
      "name": "54f00d762755"
    },
    "event": {
      "dataset": "logstash.node.stats",
      "module": "logstash",
      "duration": 7884075
    },
    "interval_ms": 10000
  },
  "fields": {
    "logstash_stats.timestamp": [
      "2019-12-09T12:42:16.720Z"
    ],
    "timestamp": [
      "2019-12-09T12:42:16.720Z"
    ]
  },
  "sort": [
    1575895336720
  ]
}

@cachedout (Contributor) commented:

I tried a few times to set cluster_uuid in the Metricbeat config as follows, but could not observe any change in behavior:

13:57 $ cat metricbeat/metricbeat.yml
#setup.template.settings:
#setup.kibana:

metricbeat.modules:
  - module: system
    enabled: false

  - module: logstash
    enabled: true
    metricsets:
      - node
      - node_stats
    period: 10s
    hosts: ["logstash:9600"]
    xpack.enabled: true
    cluster_uuid: "YDgadTXCROyHgG5AjN53kw"

output.elasticsearch:
  enabled: true
  hosts: ["elasticsearch-monitoring:9200"]
  username: elastic
  password: nomnom

After setting the cluster_uuid field, I issued a docker-compose restart, and it looks to me like the cluster restarted correctly. I also tried removing the monitoring indices, but didn't see any difference.

@cachedout (Contributor) commented:

@ycombinator I wonder if we should look at the behavior here. Should we prefer cluster_uuid, if set in the config, over what we find in the pipelines?

@ycombinator (Contributor) commented:

@cachedout I'm a bit confused. I don't recall us ever implementing a cluster_uuid setting for the Logstash Metricbeat module configuration. But maybe I'm forgetting something. Could you point me to something (doc, code, PR, ...) that suggests that the Logstash Metricbeat module allows a cluster_uuid setting like you tried to use here?

@cachedout (Contributor) commented:

@ycombinator Ah, you know what, this is actually my mistake. I had thought that this change extended to Metricbeat modules as well, and specifically to Logstash. (I could have sworn that Logstash was involved in that conversation but I can't find any evidence of that now.)

Anyhow, go ahead and disregard my comment.

@ycombinator self-assigned this Dec 12, 2019
@ycombinator (Contributor) commented:

@jarpy I might know what's going on here.

When you run the good and bad pipelines without the ES output in either of them, and use Metricbeat to monitor them, what cluster do those two pipelines show up under in the Stack Monitoring UI? You can see the cluster ID at the very top of the Logstash Pipeline Listing page as part of the breadcrumb navigation. It will look something like Clusters / CLUSTER_ID_HERE / Logstash.

@jarpy (Author) commented Dec 12, 2019

Cool, thanks for looking.

With the ES output disabled, both pipelines are listed under Standalone Cluster.

@jarpy (Author) commented Dec 12, 2019

With the output enabled, only one cluster (the Standalone Cluster, with one pipeline) is displayed.

@ycombinator (Contributor) commented:

Thanks. Now, when you enable the ES output in the bad pipeline, click Clusters in the breadcrumb navigation. That will take you to the Cluster Listing page. Do you see two clusters on that page? If so, click the cluster that's not Standalone Cluster. Do you then see a Logstash section on the next page? If so, click the Pipelines link in the Logstash section. Do you see your bad pipeline (and only your bad pipeline) there?

@jarpy (Author) commented Dec 12, 2019

Screenshot with output enabled:

EDIT: Replaced screenshot. Previous one was taken in Kibana "Setup Mode".

[screenshot]

@ycombinator (Contributor) commented:

Interesting. So if you click on Clusters at the top of that screenshot you're just brought back to the same page? You don't see the Cluster Listing page with a table showing 2 clusters in it?

@jarpy (Author) commented Dec 12, 2019

Correct. Clicking Clusters does not change the rendering in any way.

@jarpy (Author) commented Dec 12, 2019

Ah! The URI is:

http://localhost:5601/app/monitoring#/overview?_g=(cluster_uuid:__standalone_cluster__,inSetupMode:!f)

Does that imply a filter for the Standalone Cluster?

@jarpy (Author) commented Dec 12, 2019

I should add that in my actual use case, I ship data to multiple ES clusters and have multiple pipelines, though only one pipeline contains ES outputs. All monitoring happens on a dedicated monitoring cluster.

@ycombinator (Contributor) commented:

Alright, I think I know why we're not seeing the Cluster Listing page when you click Clusters at the top. I just noticed that in your Metricbeat configuration you are monitoring Logstash, but not Elasticsearch. So what's going on is this:

  1. When Metricbeat monitors the "good" pipeline (the one without the ES output), it ships its monitoring data to .monitoring-logstash-* indices. The documents for this data do not contain a cluster_uuid field. That field is meant to represent the ID of an Elasticsearch cluster, and since there isn't one in the picture (no ES output in the pipeline), the field is absent.

  2. On the other hand, when Metricbeat monitors the "bad" pipeline (the one with the ES output in it), it indexes its monitoring documents (with a cluster_uuid field in them) into .monitoring-logstash-* indices. It gets the value for this field from the cluster ID of the Elasticsearch cluster that the pipeline's Elasticsearch output is connecting to.

  3. Now any monitoring data that doesn't have a cluster_uuid field in it is shown under the Standalone Cluster in the UI. This is why, when you're monitoring pipelines without an Elasticsearch output, you're seeing both of them under this cluster in the UI.

  4. On the other hand, any monitoring data that does have a cluster_uuid field is shown under the cluster corresponding to that cluster_uuid, provided we also have monitoring data for that Elasticsearch cluster. In other words, Metricbeat needs to be monitoring the same Elasticsearch cluster as the one in your Logstash pipeline output; otherwise the monitoring data for that Logstash pipeline is "hidden" in the UI (or "disappears" from the UI).

Obviously, this setup requirement is... not obvious 😄. Before we get to talking about how to solve this issue, could you confirm my theory above? To do this, could you have Metricbeat monitor the same Elasticsearch cluster as the one in your "bad" Logstash pipeline output? Then let me know if you see two clusters (Standalone and another one) when you click on Clusters in the UI breadcrumb nav.
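For reference, a minimal sketch of the extra Metricbeat module block that would do this, added alongside the existing logstash module shown earlier, might look like the following. The host name and credentials below are placeholders (assumptions, not values taken from the repro stack), so adjust them to whatever the compose file actually uses.

  # Also collect monitoring data from the production cluster that the
  # "bad" pipeline's elasticsearch output writes to, so that documents
  # carrying its cluster_uuid have a matching monitored cluster in the UI.
  - module: elasticsearch
    enabled: true
    xpack.enabled: true
    period: 10s
    hosts: ["elasticsearch-production:9200"]  # placeholder service name
    username: elastic
    password: nomnom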

jarpy added a commit to elastic/logstash-external-monitoring-repro that referenced this issue Dec 13, 2019
@jarpy (Author) commented Dec 13, 2019

Very interesting! I added Elasticsearch monitoring (see above patch). The Monitoring interface now displays two clusters, both of which contain the Logstash instance (one-to-many representation of the Logstash instance):

[screenshot]

The Standalone Cluster contains the good pipeline (no ES output):

[screenshot]

The production-cluster contains the bad pipeline (ES output):

[screenshot]

This one-to-many appearance of the Logstash instance is really interesting. I can also imagine it quickly getting out of hand in my production application, which uses a single LS cluster to route to an ever-growing set of Elasticsearch clusters. My intent is to use Monitoring to monitor Logstash, and only Logstash.

@ycombinator (Contributor) commented:

This one-to-many appearance of the Logstash instance is really interesting. I can also imagine it getting quickly out of hand in my production application, which uses a single LS cluster to route to an ever-growing set of Elasticsearch clusters.

Right, so what you should see (with Elasticsearch monitoring also enabled for each of those clusters) is that the Logstash pipeline appears under the cluster to which it is sending data. This means that if a single Logstash pipeline sends data to multiple ES clusters, that pipeline will appear under each of those clusters in the Stack Monitoring UI.

My intent is to use Monitoring to monitor Logstash, and only Logstash.

Traditionally, i.e. with the "internal collection" monitoring approach, this has not been a supported use case. Because of the way the Stack Monitoring UI is organized (data is grouped by a Production ES cluster), it has also been a requirement that that Production ES cluster be monitored as well. With internal collection this hasn't been much of an issue, because the monitoring data is shipped from Logstash => Production ES cluster => Monitoring ES cluster. In such a setup, enabling monitoring collection on the Production ES cluster enables both the pass-through of the Logstash monitoring data and the collection of the Production ES cluster's own monitoring data.

In the new "metricbeat collection" approach, we introduced the concept of a "Standalone Cluster". The idea behind this was to group instances of Beats or Logstash that don't have an Elasticsearch cluster anywhere in the picture (e.g. a Beat that's not using output.elasticsearch, or a Logstash pipeline that's not using an elasticsearch output plugin). However, we didn't account for your use case: a Logstash pipeline that is using an elasticsearch output plugin, but where there is no desire to also monitor the corresponding Elasticsearch cluster.

Thanks for bringing up this use case. It's up to @elastic/stack-monitoring to decide how we want to handle it but here are my suggestions:

  1. For the short term, we implement elastic/logstash#11066 (Add ability to override cluster_uuid to be used in monitoring data). This will allow users to override the cluster ID at the Logstash node level, meaning that any pipelines run by that node, regardless of whether they use elasticsearch outputs or not, will always present the overridden cluster ID in their monitoring data. For your use case, you'd set the override cluster ID to "", which will have the effect of showing your Logstash pipelines under Standalone Cluster in the UI (see the sketch after this list).

  2. In the longer term, the Stack Monitoring UI might want to move away from grouping data around an ES cluster. There are good historical reasons for doing it this way, but with the introduction of Logstash and Beats monitoring (i.e. monitoring of products that may never be connected to an ES production cluster), the UI might need to become more flexible in how it organizes its monitoring data.
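To illustrate option 1: assuming the override from logstash#11066 lands as a node-level setting in logstash.yml (the setting name below is an assumption for illustration, not a confirmed name), usage for this use case might look like:

# logstash.yml (hypothetical usage of the proposed override)
# An empty value would group all of this node's pipelines under
# "Standalone Cluster" in the Stack Monitoring UI, regardless of any
# elasticsearch outputs declared in the pipelines.
monitoring.cluster_uuid: ""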

@chrisronline (Contributor) commented:

I think we should definitely go ahead and do 1 from above, as it will give a consistent experience between Logstash and Beats.

For 2, I agree it's a direction we want to take. I don't know if we have any tickets around this, but maybe @cachedout knows.

@jarpy (Author) commented Dec 14, 2019

Traditionally, i.e. with the "internal collection" monitoring approach, this has not been a supported use case.

I'm using internal monitoring in production right now, and I get an outcome that suits my use case. I figured I was only monitoring Logstash, but...

it also has been a requirement that that Production ES cluster be monitored as well

I realise now that I have been "gaming the system": the cluster that I think of as the monitoring cluster is self-monitoring, and is thus also the "production cluster" required to make the interface play along. I probably clicked the one-click setup at some point (which works amazingly well). It's just a matter of perspective. Some people have a production cluster that is also a [self-]monitoring cluster; I have a monitoring cluster that is doing enough to qualify as "production".

In practice, my goal is just to have a good monitoring view of my Logstash cluster, but I don't mind having the self-monitored Elasticsearch cluster show up, naturally. My production application is essentially "Logstash as a Service", and the downstream Elasticsearch clusters (and other components) are not my concern. Currently, the result is fine when using internal monitoring, but a bit strange when using Metricbeat.

@jarpy (Author) commented Dec 14, 2019

Oh, I should just mention as a curiosity:

With internal collection this hasn't been much of an issue because the monitoring data is shipped from Logstash => Production ES cluster => Monitoring ES cluster

I actually don't do this in my (working) production environment.

I do:

Logstash ==traffic=====> Elasticsearch[a]
         ==traffic=====> Elasticsearch[b]
         ==monitoring==> Elasticsearch[c] (which also ships its own
                                           monitoring data to itself)

@simianhacker added the bug label Jun 3, 2021
@simianhacker (Member) commented:

I'm going to close this issue, because it seems that elastic/logstash#11066 fixed it.
