Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make elasticsearch/node_stats metricset work for Stack Monitoring without xpack.enabled flag #19747

Merged
merged 32 commits into from
Aug 14, 2020

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Jul 8, 2020

What does this PR do

This PR prepares the elasticsearch/node_stats metricset (without setting xpack.enabled: true) to collect and index data for the Stack Monitoring UI. Concretely, it does the following:

  • introduces new fields, collecting data needed by the Stack Monitoring UI. These fields are named following Metricbeat/ECS naming conventions.
  • introduces field aliases from current fields known to the Stack Monitoring UI to the fields defined by the metricset. This will aid transitioning the Stack Monitoring UI in 7.x to be driven off the metricset with xpack.enabled omitted. In 8.0 the aliases will be removed and the Stack Monitoring UI code will have to be overhauled to reference the new fields directly.
  • removes the data_xpack.go file implementing the xpack.enabled: true code path for the node_stats metricset. Going forward, the node_stats metricset will simply ignore the value of the xpack.enabled setting.

Testing this PR

  1. Build Metricbeat with this PR checked out.

    cd $GOPATH/src/github.com/elastic/beats/metricbeat
    mage build
    
  2. Enable the elasticsearch-xpack module.

    metricbeat modules enable elasticsearch-xpack
    
  3. Optionally, disable the system module. This will reduce "noise" in metricbeat-* indices, should you needs to inspect them.

    metricbeat modules disable system
    
  4. Run Metricbeat (assumes output is being sent to Elasticsearch running at localhost:9200).

    metricbeat -e
    

Authors checklist

  • Define new fields in elasticsearch/node_stats/_meta/fields.yml
  • Enhance data.go to collect necessary additional metrics and index them into new fields.
  • Define field aliases from custom field names to Metricbeat/ECS-compliant field names in elasticsearch/_meta/fields.yml

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 8, 2020
@ycombinator ycombinator added the Team:Services (Deprecated) Label for the former Integrations-Services team label Jul 8, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jul 8, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jul 8, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #19747 event]

  • Start Time: 2020-08-14T02:34:12.799+0000

  • Duration: 77 min 39 sec

Test stats 🧪

Test Results
Failed 0
Passed 3695
Skipped 703
Total 4398

@ycombinator
Copy link
Contributor Author

Script used for automatically generating field aliases YAML, to be added to module/elasticsearch/_meta/fields.yml: https://gist.github.com/ycombinator/c48af6ea6f121851c30b21d3de98c6c0

@ycombinator ycombinator marked this pull request as ready for review July 16, 2020 23:32
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@ycombinator ycombinator requested a review from chrisronline July 16, 2020 23:32
@chrisronline
Copy link
Contributor

I tried to run this locally and I'm seeing this error in the metricbeat log:

2020-07-20T11:23:43.341-0400	INFO	module/wrapper.go:259	Error fetching data for metricset elasticsearch.node_stats: 1 error: failure to apply node schema: 3 errors: key `os.cgroup` not found; key `os.cpu.load_average.5m` not found; key `os.cpu.load_average.15m` not found
2020-07-20T11:23:53.343-0400	INFO	module/wrapper.go:259	Error fetching data for metricset elasticsearch.node_stats: 1 error: failure to apply node schema: 3 errors: key `os.cpu.load_average.5m` not found; key `os.cpu.load_average.15m` not found; key `os.cgroup` not found

@ycombinator
Copy link
Contributor Author

@chrisronline Thanks for reporting the errors. The latest two commits should take care of them.

@chrisronline
Copy link
Contributor

I'm running this successfully now (no errors in the metricbeat log) in a fresh environment and I'm not seeing node_stats documents.

POST .monitoring-es-*/_search
{
  "size": 0,
  "aggs": {
    "types": {
      "terms": {
        "field": "type",
        "size": 10
      }
    }
  }
}

->

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 376,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "types" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "index_stats",
          "doc_count" : 280
        },
        {
          "key" : "cluster_stats",
          "doc_count" : 20
        },
        {
          "key" : "enrich_coordinator_stats",
          "doc_count" : 20
        },
        {
          "key" : "index_recovery",
          "doc_count" : 20
        },
        {
          "key" : "indices_stats",
          "doc_count" : 20
        },
        {
          "key" : "shards",
          "doc_count" : 16
        }
      ]
    }
  }
}

elasticsearch-xpack module:

- module: elasticsearch
  metricsets:
    - ccr
    - cluster_stats
    - enrich
    - index
    - index_recovery
    - index_summary
    - ml_job
    - node_stats
    - shard

@ycombinator
Copy link
Contributor Author

@chrisronline Awesome! And, just to confirm, you're seeing the equivalent of node_stats documents in metricbeat-*, yes?

@chrisronline
Copy link
Contributor

Where did we land on the type field? I don't see an alias for it. Can we alias it to metricset.name?

@chrisronline
Copy link
Contributor

Also, for source_node, can we map the following two:

source_node.uuid -> elasticsearch.node.id
source_node.name -> elasticsearch.node.name

@ycombinator
Copy link
Contributor Author

Where did we land on the type field? I don't see an alias for it. Can we alias it to metricset.name?

I'm seeing a top-level type field already in the template generated by Metricbeat. So aliasing it to metricset.name might not be possible. I'm trying to figure out where this field is coming from.

@ycombinator
Copy link
Contributor Author

ycombinator commented Jul 21, 2020

Also, I just realized we're going to run into a blocker eventually. For this POC we can alias the top-level node_stats.* fields to various elasticsearch.node.stats.* fields. This will work for Elasticsearch monitoring data.

However, once we try to handle Logstash monitoring data of type: node_stats, we'll be stuck because we won't be able to also alias any Logstash node_stats.* fields with the same names as Elasticsearch node_stats.* fields to logstash.node.stats.* fields.

@ycombinator
Copy link
Contributor Author

@chrisronline I added the aliases you requested for type, source_node.uuid, and source_node.name. However, please also see my previous comment about a potential blocker: #19747 (comment).

@chrisronline
Copy link
Contributor

However, once we try to handle Logstash monitoring data of type: node_stats, we'll be stuck because we won't be able to also alias any Logstash node_stats.* fields with the same names as Elasticsearch node_stats.* fields to logstash.node.stats.* fields.

I'm confused. Logstash only has logstash_stats and logstash_state?

@ycombinator
Copy link
Contributor Author

I'm confused. Logstash only has logstash_stats and logstash_state?

Ah yes, I was mis-remembering 🤦. We should be good then. Sorry for causing confusion.

@chrisronline
Copy link
Contributor

What about timestamp? Can we alias:

timestamp -> @timestamp

@ycombinator
Copy link
Contributor Author

What about timestamp?

Yes, fixed now. Apparently I forgot to alias a bunch of top-level fields 🤦 🤦.

@chrisronline
Copy link
Contributor

I added the aliases you requested for type, source_node.uuid, and source_node.name. However, please also see my previous comment about a potential blocker: #19747 (comment).

I'm confused about the type field. I also see it in the mappings. Does changing this affect folks when they upgrade to a version of Metricbeat containing the alias? Wouldn't that error out saying the field exists?

@ycombinator
Copy link
Contributor Author

I'm confused about the type field. I also see it in the mappings. Does changing this affect folks when they upgrade to a version of Metricbeat containing the alias? Wouldn't that error out saying the field exists?

Yes, good point. I'm still trying to figure out where in the Beats codebase that field is coming from. It feels like a bug to me so perhaps we could have it removed, even in a minor. But let's assume we can't remove it. What would our alternatives be? How is the type field being used from Monitoring UI / Telemetry / Watcher queries?

@ycombinator
Copy link
Contributor Author

ycombinator commented Jul 22, 2020

I'm still trying to figure out where in the Beats codebase that field is coming from.

No action needed, just wanted to post an update about this. I've narrowed it down to Metricbeat:

$ ./filebeat export template | jq .mappings.properties.type
null
$ ./metricbeat export template | jq .mappings.properties.type
{
  "ignore_above": 1024,
  "type": "keyword"
}

Still figuring out exactly what in Metricbeat is injecting this top-level type key but at least we know it's something restricted to Metricbeat — I was worried it was being injected by the template management code in libbeat — if we decide it's a bug today and want to make a breaking change to remove it in a minor.

@ycombinator
Copy link
Contributor Author

ycombinator commented Jul 22, 2020

@chrisronline Thinking about our Zoom discussion a few minutes ago:

  • I'm thinking of converting this PR from a POC to The Real Thing (TM) but maybe targeting a feature branch instead of master so all the metricsets' changes can land at once when they are all ready. The benefit of working in a feature branch would be that we don't inadvertently roll out changes to some metricsets that we later need to roll back because of some issue we find later. It also reduces the pressure to try and get all the changes shipped in a single minor release. WDYT about that + doing the same thing with your PRs on the UI side?

  • In order to convert this PR to The Real Thing (TM) I'd need the list of fields actually being used by Stack Monitoring UI + Telemetry + Cluster Alerts. That way I can remove any fields and aliases we don't need to reduce the mapping size. Just post the list of fields here in this PR and I'll update it so you can re-test it to make sure things still work with your POC PR on the UI side.

@ycombinator
Copy link
Contributor Author

ycombinator commented Jul 22, 2020

I'd need the list of fields actually being used by Stack Monitoring UI + Telemetry + Cluster Alerts. That way I can remove any fields and aliases we don't need to reduce the mapping size.

BTW, just to stress the need to reduce the fields in the mapping to just the ones we actually need, here's a blocker we found on the Filebeat side with mappings being too large: #19965. So we'd want to try and avoid this with the Metricbeat mapping.

@chrisronline
Copy link
Contributor

@ycombinator It'd be nice to have type as an alias but we can work around it. We mainly use it in terms filter in queries but we can easily change those to use bool and should across either type or metricset.name

@ycombinator
Copy link
Contributor Author

It'd be nice to have type as an alias but we can work around it. We mainly use it in terms filter in queries but we can easily change those to use bool and should across either type or metricset.name

@chrisronline Then lets go ahead with changing the queries. Beats won't allow duplicate definitions of the same field in the mappings (makes sense!) and I don't feel comfortable redefining the existing type field to an alias as we don't know how it might be used so that could cause a breaking change.

@ycombinator ycombinator requested a review from sayden August 13, 2020 16:23
Copy link
Contributor

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think data.json and data-xpack.json must be updated or removed if we use the json files in the test folder.

Apart from that, I think I pretty much nailed the idea. Good work! 😃

EDIT: Aaaand, the feature branch is missing yet 😬

@ycombinator
Copy link
Contributor Author

ycombinator commented Aug 13, 2020

@sayden Good catches re: data-xpack.json and data.json files! I've removed data-xpack.json in 61b1c81 as it no longer makes sense for the node_stats metricset. And I've updated data.json in d7066eb now that the metricset is collecting a lot more metrics.

Re: feature branch, yes, I still need to create it and re-target this PR to it. But first I want to get CI green on this PR.

@ycombinator ycombinator changed the title POC: Prepare elasticsearch/node_stats metricset for Stack Monitoring Prepare elasticsearch/node_stats metricset for Stack Monitoring Aug 13, 2020
@ycombinator ycombinator changed the title Prepare elasticsearch/node_stats metricset for Stack Monitoring Make elasticsearch/node_stats metricset work for Stack Monitoring without xpack.enabled flag Aug 13, 2020
@ycombinator
Copy link
Contributor Author

Feature branch is created: https://github.com/elastic/beats/tree/feature-stack-monitoring-mb-ecs. Will re-target this PR to it as soon as CI goes green.

@ycombinator
Copy link
Contributor Author

Travis CI failures look unrelated (generator failures) and Jenkins CI is green on this PR. Re-targeting to feature branch...

@ycombinator ycombinator changed the base branch from master to feature-stack-monitoring-mb-ecs August 14, 2020 02:32
@ycombinator
Copy link
Contributor Author

Didn't realize Jenkins CI would run again when I re-targeted the PR to the feature branch. I will let it run and, when it passes, merge this PR.

@ycombinator ycombinator requested a review from sayden August 14, 2020 03:57
@ycombinator ycombinator merged commit 575c571 into elastic:feature-stack-monitoring-mb-ecs Aug 14, 2020
@ycombinator ycombinator deleted the poc-mb-es-ecs branch August 14, 2020 03:58
@zube zube bot removed the [zube]: Done label Nov 12, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…hout xpack.enabled flag (elastic#19747)

* Make node_stats ignore xpack.enabled: true for the POC

* Index more fields

* WIP: add fields to fields.yml

* Fleshing out alias fields

* Adding more fields to mapping

* Marking optional fields

* Fixing datatypes

* Fixing formatting

* Fixing alias field definitions

* More field fixes

* Making cgroups metrics collection optional

* Marking os.load_avg as optional for Windows

* Aliasing type => metricset.name

* Aliasing source_node fields

* Aliasing timestamp => @timestamp

* Removing field alias mappings for unused fields

* Removing unnecessary fields from mapping

* Updating generated files

* Reducing visibility of isMaster function

* Adding methods to metricset

* Refactoring for testability

* Fixing formatting

* Making indices.bulk and thread_pool.write fields optional for BC

* Deleting sample file from unsupported ES version

* Remove xpack.enabled code path!!!

* Updating xpack unit test

* Updating python system tests for xpack

* Remove top-level type field

* Removing data-xpack.json

* Updating data.json

* Not collecting unmapped fields

* Fixing formatting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Services (Deprecated) Label for the former Integrations-Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants