Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make elasticsearch/ccr metricset work for Stack Monitoring without xpack.enabled flag #21348

Conversation

sayden
Copy link
Contributor

@sayden sayden commented Sep 28, 2020

Ready to test in Kibana. data.json is included to help troubleshooting and it has been generating testing the Metricbeat binary directly with Elasticsearch

@sayden sayden added Metricbeat Metricbeat Feature:Stack Monitoring Team:Services (Deprecated) Label for the former Integrations-Services team labels Sep 28, 2020
@sayden sayden self-assigned this Sep 28, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/stack-monitoring (Stack monitoring)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Sep 28, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 28, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #21348 updated]

  • Start Time: 2020-11-20T09:06:18.318+0000

  • Duration: 59 min 36 sec

Test stats 🧪

Test Results
Failed 0
Passed 2256
Skipped 518
Total 2774

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 2256
Skipped 518
Total 2774

@chrisronline
Copy link
Contributor

I'm not seeing these stats come through.

I'm running this PR and this query only returns node_stats:

POST metricbeat-*/_search?filter_path=aggregations.types.buckets
{
  "size": 0,
  "aggs": {
    "types": {
      "terms": {
        "field": "metricset.name",
        "size": 10
      },
      "aggs": {
        "top": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "@timestamp": "desc"
              }
            ],
            "_source": "@timestamp"
          }
        }
      }
    }
  }
}

->

{
  "aggregations" : {
    "types" : {
      "buckets" : [
        {
          "key" : "node_stats",
          "doc_count" : 13,
          "top" : {
            "hits" : {
              "total" : {
                "value" : 13,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "metricbeat-8.0.0-2020.10.19-000001",
                  "_id" : "MZGvQXUBfqHoUydubh4l",
                  "_score" : null,
                  "_source" : {
                    "@timestamp" : "2020-10-19T16:26:56.399Z"
                  },
                  "sort" : [
                    1603124816399
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

@sayden
Copy link
Contributor Author

sayden commented Oct 28, 2020

Is it possible that this is because the Elasticsearch node is not setup with CCR? The response from Elasticsearch is almost empty if not:

{
  "auto_follow_stats": {
    "number_of_failed_follow_indices": 0,
    "number_of_failed_remote_cluster_state_requests": 0,
    "number_of_successful_follow_indices": 0,
    "recent_auto_follow_errors": [],
    "auto_followed_clusters": []
  },
  "follow_stats": {
    "indices": []
  }
}

And the output in the metricset will probably be empty too. Is it expected to return something else? By looking at the code, it doesn't seem so.

@chrisronline
Copy link
Contributor

Thanks @sayden! It must have been my mistake. I'm seeing it now!

Here is the query we run from Kibana:

{
  "size": 10000,
  "sort": [
    {
      "timestamp": {
        "order": "desc",
        "unmapped_type": "long"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type": {
              "value": "ccr_stats"
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "format": "epoch_millis",
              "gte": 1603973755830,
              "lte": 1603977355830
            }
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "ccr_stats.follower_index",
    "inner_hits": {
      "name": "by_shard",
      "sort": [
        {
          "timestamp": {
            "order": "desc",
            "unmapped_type": "long"
          }
        }
      ],
      "size": 10000,
      "collapse": {
        "field": "ccr_stats.shard_id"
      }
    }
  },
  "aggs": {
    "by_follower_index": {
      "terms": {
        "field": "ccr_stats.follower_index",
        "size": 10000
      },
      "aggs": {
        "leader_index": {
          "terms": {
            "field": "ccr_stats.leader_index",
            "size": 1
          },
          "aggs": {
            "remote_cluster": {
              "terms": {
                "field": "ccr_stats.remote_cluster",
                "size": 1
              }
            }
          }
        },
        "by_shard_id": {
          "terms": {
            "field": "ccr_stats.shard_id",
            "size": 10
          },
          "aggs": {
            "ops_synced_max": {
              "max": {
                "field": "ccr_stats.operations_written"
              }
            },
            "ops_synced_min": {
              "min": {
                "field": "ccr_stats.operations_written"
              }
            },
            "lag_ops_leader_max": {
              "max": {
                "field": "ccr_stats.leader_max_seq_no"
              }
            },
            "lag_ops_leader_min": {
              "min": {
                "field": "ccr_stats.leader_max_seq_no"
              }
            },
            "lag_ops_global_max": {
              "max": {
                "field": "ccr_stats.follower_global_checkpoint"
              }
            },
            "lag_ops_global_min": {
              "min": {
                "field": "ccr_stats.follower_global_checkpoint"
              }
            },
            "leader_lag_ops_checkpoint_max": {
              "max": {
                "field": "ccr_stats.leader_global_checkpoint"
              }
            },
            "leader_lag_ops_checkpoint_min": {
              "min": {
                "field": "ccr_stats.leader_global_checkpoint"
              }
            },
            "ops_synced": {
              "bucket_script": {
                "buckets_path": {
                  "max": "ops_synced_max",
                  "min": "ops_synced_min"
                },
                "script": "params.max - params.min"
              }
            },
            "lag_ops_leader": {
              "bucket_script": {
                "buckets_path": {
                  "max": "lag_ops_leader_max",
                  "min": "lag_ops_leader_min"
                },
                "script": "params.max - params.min"
              }
            },
            "lag_ops_global": {
              "bucket_script": {
                "buckets_path": {
                  "max": "lag_ops_global_max",
                  "min": "lag_ops_global_min"
                },
                "script": "params.max - params.min"
              }
            },
            "lag_ops": {
              "bucket_script": {
                "buckets_path": {
                  "max": "lag_ops_leader",
                  "min": "lag_ops_global"
                },
                "script": "params.max - params.min"
              }
            },
            "lag_ops_leader_checkpoint": {
              "bucket_script": {
                "buckets_path": {
                  "max": "leader_lag_ops_checkpoint_max",
                  "min": "leader_lag_ops_checkpoint_min"
                },
                "script": "params.max - params.min"
              }
            },
            "leader_lag_ops": {
              "bucket_script": {
                "buckets_path": {
                  "max": "lag_ops_leader",
                  "min": "lag_ops_leader_checkpoint"
                },
                "script": "params.max - params.min"
              }
            },
            "follower_lag_ops": {
              "bucket_script": {
                "buckets_path": {
                  "max": "lag_ops_leader_checkpoint",
                  "min": "lag_ops_global"
                },
                "script": "params.max - params.min"
              }
            }
          }
        }
      }
    }
  }
}

I'm not seeing any aliases currently setup so I'm getting this failure:

"no mapping found for ccr_stats.follower_index in order to collapse on"

@sayden
Copy link
Contributor Author

sayden commented Nov 3, 2020

Chris, I don't know of any ccr_stats mapping because it's not in the mapping reference you sent me. Do you have any reference to write all aliases?

@chrisronline
Copy link
Contributor

@sayden Yea I'm not sure why it's not showing up in the mapping file, so I apologize there.

Here is the full list of fields

Let's make sure all of those are in the document, but you need to map:

  • ccr_stats.follower_index
  • ccr_stats.shard_id
  • ccr_stats.leader_index
  • ccr_stats.remote_cluster
  • ccr_stats.shard_id'

@sayden sayden force-pushed the feature/mb/elasticsearch/ccr_xpack_flag branch from a6315b8 to 4da617e Compare November 6, 2020 16:14
@sayden
Copy link
Contributor Author

sayden commented Nov 6, 2020

Okay @chrisronline I have added all the fields in the list of your link but I still couldn't generate a data.json without setting up a CCR cluster so I'd expect some mapping errors like the ones you have seen in other Metricsets.

It was a bit tricky to add all those fields so just copy-paste here if you see any issue. Thanks!

@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 6, 2020

🐛 Flaky test report

❕ There are test failures but not known flaky tests.

Expand to view the summary

Test stats 🧪

Test Results
Failed 2
Passed 2248
Skipped 500
Total 2750

Genuine test errors 2

💔 There are test failures but not known flaky tests, most likely a genuine test failure.

  • Name: Build&Test / x-pack/metricbeat-build / test_migration – x-pack.metricbeat.tests.system.test_xpack_base.Test
  • Name: Build&Test / x-pack/metricbeat-build / test_template – x-pack.metricbeat.tests.system.test_xpack_base.Test

Copy link
Contributor

@chrisronline chrisronline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sayden sayden force-pushed the feature/mb/elasticsearch/ccr_xpack_flag branch from 9519820 to 1710b82 Compare November 10, 2020 13:03
@sayden
Copy link
Contributor Author

sayden commented Nov 11, 2020

@ycombinator I have added on the mapping the ton of fields that are required for CI to pass. I'm thinking to remove many fields from all metricsets once we have the feature branch ready to merge and the one of kibana so that I can also do some testing on my local. WDYT?

@ycombinator
Copy link
Contributor

I have added on the mapping the ton of fields that are required for CI to pass. I'm thinking to remove many fields from all metricsets once we have the feature branch ready to merge and the one of kibana so that I can also do some testing on my local.

I didn't quite understand this, sorry. Why are there some extra fields that are needed now for CI to pass but can be removed later? How will we know which fields these are exactly so we can remove them all later without missing any? What would happen if we removed them now itself - why would CI start failing?

IOW, I'm trying to understand why this PR can't contain exactly those fields that were already present before this PR (this way we don't introduce breaking changes) + newer fields needed by the Stack Monitoring UI. Why does CI fail with only these fields?

@sayden
Copy link
Contributor Author

sayden commented Nov 12, 2020

Filebeat and Metricbeat have some internal test called assert_fields_are_documented https://github.com/elastic/beats/blob/master/libbeat/tests/system/beat/beat.py#L696 which is executed on the module checking that all fields without exception present in an event are also present in the fields.yml, in that order. I mean if a field exists in the event, it must exists on the mapping too (it doesn't check if it's in the mapping so it must be in the event)

I think this test was being launched against the non x-pack flow only. Now that the module has all the fields from x-pack, the test is complaining and that's why I have to add them.

The reason to add them all now and remove the ones that are not necessary later is because it's easier to give everything to Chris now so that he can work on his side and once everything is done, and before merging into master, I can play on my own with @chrisronline Kibana's branch and Beats feature branch removing fields while testing on Kibana by myself. It's double the work for me but I think it's more reasonable than removing some fields, then bothering Chris to check, then removing more until something gets broken and then revert. That would be slower and I can remove while testing in a single PR later.

Also, most metricsets just have an schema to "apply" so it's easy to remove fields from there and from the fields.yml and then test with the finished Kibana branch to see if something gets broken.

@sayden sayden force-pushed the feature-stack-monitoring-mb-ecs branch from 58e2a7d to 9de5959 Compare November 12, 2020 11:44
# Conflicts:
#	metricbeat/docs/fields.asciidoc
#	metricbeat/module/elasticsearch/fields.go
#	metricbeat/module/elasticsearch/node_stats/_meta/fields.yml
@sayden sayden force-pushed the feature/mb/elasticsearch/ccr_xpack_flag branch from a22d3a9 to 27281b5 Compare November 12, 2020 11:46
@sayden sayden force-pushed the feature/mb/elasticsearch/ccr_xpack_flag branch from 27281b5 to bb431af Compare November 12, 2020 11:56
@ycombinator
Copy link
Contributor

The reason to add them all now and remove the ones that are not necessary later is because it's easier to give everything to Chris now so that he can work on his side and once everything is done, and before merging into master, I can play on my own with @chrisronline Kibana's branch and Beats feature branch removing fields while testing on Kibana by myself. It's double the work for me but I think it's more reasonable than removing some fields, then bothering Chris to check, then removing more until something gets broken and then revert. That would be slower and I can remove while testing in a single PR later.

Thanks for explaining. This approach makes sense to me.

My main concern is that we'll end up collecting/indexing too many fields, which brings a cost to it (e.g. we're starting to see some support issues with the logstash module consuming too much memory, and one mitigation is to ensure it only collects the fields we actually need in the UI/Telemetry: #22370).

So I'm good with deferring this "tuning" to the end — after all PRs are merged into the feature branch but before the feature branch is merged into master — for the reasons you mentioned. But at the same time, lets not forget to do this 🙂.

@@ -1,39 +0,0 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sayden Looks like this file got deleted. Any chance you could regenerate it? The other Stack Monitoring PRs have it so it would be good to have it in this PR too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I have added a TestData function to work with mock input data and now we have a data.json 🎉

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left one comment requesting to generate and add data.json to this PR.

@sayden
Copy link
Contributor Author

sayden commented Nov 18, 2020

jenkins test this

@sayden
Copy link
Contributor Author

sayden commented Nov 18, 2020

/test metricbeat

@sayden
Copy link
Contributor Author

sayden commented Nov 18, 2020

Okay! CI is green so I guess that it's ready or I'm close. @ycombinator can you take a look when you have some time, please? 🙂

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

…csearch/ccr_xpack_flag

# Conflicts:
#	metricbeat/module/elasticsearch/fields.go
@sayden
Copy link
Contributor Author

sayden commented Nov 19, 2020

/test metricbeat

@sayden
Copy link
Contributor Author

sayden commented Nov 20, 2020

Green CI again finally! Merging!

@sayden sayden merged commit d2d6856 into elastic:feature-stack-monitoring-mb-ecs Nov 20, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Stack Monitoring Metricbeat Metricbeat Team:Services (Deprecated) Label for the former Integrations-Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants