[Metricbeat] Grouping Windows Perfmon Metrics in Events #6584

Closed
andrewkroh opened this issue Mar 16, 2018 · 16 comments

@andrewkroh
Member

andrewkroh commented Mar 16, 2018

The perfmon metricset generates one event per counter instance. It would be nice to offer more flexibility in grouping related metrics into a single event.

Having a single event per metric was the simplest implementation that would allow similar metrics to be grouped and visualized (e.g. visualize disk write times for each disk instance on the same graph).

It would be nice to be able to group all metrics related to an instance of an object (e.g. all metrics for C:\ or all metrics for processor 0). Here are some examples that show my idea.

metricbeat.modules:
- module: windows
  metricsets: [perfmon]
  perfmon.queries:
  - object: '\Processor'
    namespace: processor
    instance: '*'
    counters:
    - name: '% User Time'
      label: time.user.pct
    - name: '% Processor Time'
      label: time.processor.pct
    - name: '% Interrupt Time'
      label: time.interrupt.pct

  - object: '\UDPv4'
    namespace: udpv4
    counters:
    - name: 'Datagrams Received/sec'
      label: packets.received.per_sec
    - name: 'Datagrams Received Errors'
      label: packets.received.errors
    - name: 'Datagrams No Port/sec'
      label: packets.received.no_port_per_sec
    - name: 'Datagrams Sent/sec'
      label: packets.sent.per_sec
    - name: 'Datagrams/sec'
      label: packets.per_sec

The first query uses a Performance Data Helper (PDH) path of \Processor(*)\<counter name> and would produce an event like

{
  "windows": {
    "perfmon": {
      "processor": {
        "instance": "_Total",
        "time": {
          "user": {
            "pct": 1.2
          },
          "processor": {
            "pct": 10.1
          },
          "interrupt": {
            "pct": 0.5
          }
        }
      }
    }
  }
}

The second query uses a PDH path of \UDPv4\<counter name> (it has no instance) and produces an event like

{
  "windows": {
    "perfmon": {
      "udpv4": {
        "packets": {
          "received": {
            "per_sec": 18,
            "errors": 1,
            "no_port_per_sec": 11
          },
          "sent": {
            "per_sec": 12
          },
          "per_sec": 32
        }
      }
    }
  }
}
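
As a rough illustration of the proposed grouping, here is a minimal Python sketch of how dotted labels such as `packets.received.per_sec` could be expanded into the nested event shown above. The helper names `set_dotted` and `build_event` are hypothetical and not part of Metricbeat; the sketch also assumes no label is both a leaf and a prefix of another label.

```python
from typing import Any, Dict, Optional


def set_dotted(target: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Store value under a dotted label such as 'time.user.pct' as nested dicts."""
    parts = dotted_key.split(".")
    node = target
    for part in parts[:-1]:
        # Create intermediate objects on demand.
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def build_event(namespace: str, values: Dict[str, float],
                instance: Optional[str] = None) -> Dict[str, Any]:
    """Group all labelled counter values for one instance into a single event."""
    fields: Dict[str, Any] = {}
    if instance is not None:
        fields["instance"] = instance
    for label, value in values.items():
        set_dotted(fields, label, value)
    return {"windows": {"perfmon": {namespace: fields}}}


# Values taken from the UDPv4 example above (an object without instances).
event = build_event(
    "udpv4",
    {
        "packets.received.per_sec": 18,
        "packets.received.errors": 1,
        "packets.received.no_port_per_sec": 11,
        "packets.sent.per_sec": 12,
        "packets.per_sec": 32,
    },
)
```

Passing `instance="_Total"` together with the `processor` namespace would give the shape of the first example event.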

This would also resolve #6528.

Related Info

@ruflin
Contributor

ruflin commented Mar 19, 2018

I really like this idea of allowing users to build their own events. It also reminds me of #6462, where this could be a potential solution (@jsoriano).

@andrewkroh
Member Author

Related #4944.

@willemdh

willemdh commented Jun 1, 2018

We'd love to see this functionality. The way the perfmon module works at the moment is not very efficient: it generates a huge number of documents, each carrying very little information. For example, try gathering some diskio-related counters on 1k+ Windows servers every 10s:

    - instance_label: "diskio.name"
      measurement_label: "diskio.reads"
      query: '\LogicalDisk(*)\Disk Reads/sec'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.writes"
      query: '\LogicalDisk(*)\Disk Writes/sec'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.read.queue_length"
      query: '\LogicalDisk(*)\Avg. Disk Read Queue Length'
    - instance_label: "diskio.name"
      measurement_label: "diskio.write.queue_length"
      query: '\LogicalDisk(*)\Avg. Disk Write Queue Length'
    - instance_label: "diskio.name"
      measurement_label: "diskio.read.time.pct"
      query: '\LogicalDisk(*)\% Disk Read Time'
    - instance_label: "diskio.name"
      measurement_label: "diskio.write.time.pct"
      query: '\LogicalDisk(*)\% Disk Write Time'
    - instance_label: "diskio.name"
      measurement_label: "diskio.bytes_per_read.avg"
      query: '\LogicalDisk(*)\Avg. Disk Bytes/Read'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.bytes_per_write.avg"
      query: '\LogicalDisk(*)\Avg. Disk Bytes/Write'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.read.bytes_per_sec"
      query: '\LogicalDisk(*)\Disk Read Bytes/sec'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.write.bytes_per_sec"
      query: '\LogicalDisk(*)\Disk Write Bytes/sec'
      format: "long"

Check this graph:

image

This is for only the above perfmon counters on 6 Windows servers.
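
To put the document volume in numbers, here is a back-of-the-envelope Python sketch. The function `docs_per_day` is hypothetical, and the figure of 3 LogicalDisk instances per server (for example C:, D: and _Total) is an assumption made only for illustration; the 10 counters, 1000 servers and 10s period come from the example above.

```python
def docs_per_day(counters: int, instances: int, hosts: int, period_s: int,
                 grouped: bool = False) -> int:
    """Documents indexed per day for a perfmon configuration.

    grouped=False: one document per counter instance (current behaviour).
    grouped=True: one document per instance carrying all of its counters.
    """
    samples_per_day = 86_400 // period_s
    docs_per_sample = instances if grouped else counters * instances
    return docs_per_sample * hosts * samples_per_day


# 10 counters as configured above; 3 instances per host is an assumption.
ungrouped = docs_per_day(counters=10, instances=3, hosts=1000, period_s=10)
grouped = docs_per_day(counters=10, instances=3, hosts=1000, period_s=10, grouped=True)
print(ungrouped, grouped)  # 259200000 25920000
```

Under these assumptions, grouping per instance cuts the document count by a factor equal to the number of counters.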

@willemdh

@ruflin @andrewkroh

Just wanted to add that I have a feeling that the millions of documents the perfmon module generates result in very slow recovery of metricbeat indices. I added the diskio metrics from my previous post on roughly 600 servers and have seen a significant deterioration of performance during recovery. (It could of course be related to other things in my cluster, but I still wanted to mention it; maybe other customers are seeing the same.)

@martinscholz83
Contributor

Hey @andrewkroh, is someone working on this topic?

@ruflin
Contributor

ruflin commented Aug 9, 2018

AFAIK nobody is working on it at the moment.

@martinscholz83
Contributor

Ok. I like the idea of grouping events in a namespace. If that's OK, I would open a PR.

@ruflin
Contributor

ruflin commented Aug 9, 2018

Of course, that would be great.

@martinscholz83
Contributor

@andrewkroh, if you want to collect for multiple instances you want to do it this way?

metricbeat.modules:
- module: windows
  metricsets: [perfmon]
  perfmon.queries:
  - object: '\Processor'
    namespace: processor
    - instance: 0
      counters:
      - name: '% User Time'
        label: time.user.pct      
      - name: '% Interrupt Time'
        label: time.interrupt.pct
    - instance: 1
      counters:
      - name: '% Processor Time'
        label: time.processor.pct

  - object: '\UDPv4'
    namespace: udpv4
    counters:
    - name: 'Datagrams Received/sec'
      label: packets.received.per_sec
    - name: 'Datagrams Received Errors'
      label: packets.received.errors
    - name: 'Datagrams No Port/sec'
      label: packets.received.no_port_per_sec
    - name: 'Datagrams Sent/sec'
      label: packets.sent.per_sec
    - name: 'Datagrams/sec'
      label: packets.per_sec

@willemdh

Maybe while this is being rewritten, we should consider a perfmon ECS object? It would be nice to have some sort of convention for perfmon data, so that everyone uses the same field names.

@ruflin
Contributor

ruflin commented Aug 13, 2018

@willemdh Not sure if ECS should have something specific to perfmon. Perfmon can use ECS fields, but it exposes lots of metrics that I would not expect to be in ECS. This is not only related to perfmon but to metrics in general.

@andrewkroh
Member Author

> @andrewkroh, if you want to collect for multiple instances you want to do it this way?

To avoid duplication of the counters configuration I think instance should be able to accept a single string or a list.

metricbeat.modules:
- module: windows
  metricsets: [perfmon]
  perfmon.queries:
  - object: '\Processor'
    namespace: processor
    instance: [0, 1]       # Allow either a string or []string.
    counters:
    - name: '% User Time'
      label: time.user.pct
    - name: '% Interrupt Time'
      label: time.interrupt.pct
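
A minimal Python sketch of the "single string or list" idea follows, showing how either form could be normalized and then expanded into one PDH path per instance. The names `normalize_instances` and `pdh_paths` are hypothetical, and the path layout simply follows the `\Processor(*)\<counter name>` and `\UDPv4\<counter name>` forms mentioned earlier in this issue.

```python
from typing import List, Optional, Sequence, Union

Instance = Union[None, str, int, Sequence[Union[str, int]]]


def normalize_instances(instance: Instance) -> List[Optional[str]]:
    """Return a list of instance names; [None] means the object has no instance."""
    if instance is None:
        return [None]
    if isinstance(instance, (str, int)):
        # A single value such as '*' or 0 becomes a one-element list.
        return [str(instance)]
    return [str(item) for item in instance]


def pdh_paths(obj: str, counter: str, instance: Instance = None) -> List[str]:
    """Build one PDH counter path per configured instance."""
    paths = []
    for name in normalize_instances(instance):
        if name is None:
            paths.append(f"{obj}\\{counter}")
        else:
            paths.append(f"{obj}({name})\\{counter}")
    return paths


for path in pdh_paths("\\Processor", "% User Time", [0, 1]):
    print(path)
```

With this approach the counters list is written once and reused for every instance, which is the duplication the list form is meant to avoid.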

@Sialagio

Sialagio commented Oct 5, 2018

Hello folks,

+1 on the requirement, but ...

When checking around, we found that system.filesystem does not return all mount points on Windows.
For this reason, we used the perf counter.
The drawback is that events are independent, meaning that we need 2 events to get both the percentage and Free Megabytes.

Three options here:

  • Merge both events using logstash, but that is a really dirty solution.
  • Update system.filesystem to be compatible with all mounts on Windows.
  • Do this implementation to merge events as in system.filesystem and keep using perfmon.

Which would be best?

@elasticmachine
Collaborator

Pinging @elastic/infrastructure

ruflin added a commit to ruflin/beats that referenced this issue Jan 18, 2019
The perfmon metricset is still in Beta. There are a few improvements I would like to see like elastic#6584 before pushing this to GA. These changes could be breaking changes.
jsoriano pushed a commit that referenced this issue Feb 28, 2019
#8688)

This flag will send all perfmon measurements with a matching instance label as part of the same event (i.e. all metrics for C:, Processor X, etc.). This addresses some of the issues raised in #6584.

In most cases enabling this flag considerably reduces the number of events sent by metricbeat.
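
The grouping described in this commit message can be sketched in Python as follows. This is only an illustration of merging measurements that share an instance label, not the actual Metricbeat implementation; `Sample` and `group_by_instance` are hypothetical names and the sample values are made up.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# (instance label, measurement label, value): one tuple per counter sample.
Sample = Tuple[str, str, float]


def group_by_instance(samples: Iterable[Sample]) -> List[Dict[str, Any]]:
    """Merge samples that share an instance label into one event each."""
    grouped: Dict[str, Dict[str, float]] = defaultdict(dict)
    for instance, label, value in samples:
        grouped[instance][label] = value
    return [{"instance": inst, **metrics} for inst, metrics in grouped.items()]


samples = [
    ("C:", "diskio.reads", 12.0),
    ("C:", "diskio.writes", 7.0),
    ("D:", "diskio.reads", 3.0),
    ("D:", "diskio.writes", 1.0),
]
events = group_by_instance(samples)
print(len(samples), "samples ->", len(events), "events")
```

Here four per-counter samples collapse into two events, one per disk.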
jsoriano added a commit that referenced this issue Mar 1, 2019
#11002)

This flag will send all perfmon measurements with a matching instance label as part of the same event (i.e. all metrics for C:, Processor X, etc.). This addresses some of the issues raised in #6584.

In most cases enabling this flag considerably reduces the number of events sent by metricbeat.

Co-Authored-By: Josh Smith <j_smith95@live.com>
@vbohata

vbohata commented May 23, 2019

I think the best approach would be to generate a single event per performance counter value and use predefined field names, so the result could look like this:

{
  "windows": {
    "perfmon": {
      "category": ".NET CLR Exceptions",
      "instance": "??APP_CLR_PROC??",
      "name": "my_counter_name",
      "value": 0.0
    }
  }
}

There are many advantages:

  1. Allows term aggregations per category, name, etc.
  2. Allows easy filtering of results for a known category, name, etc. No need to search for the available fields.
  3. Avoids a possibly huge number of dynamic fields.
  4. With a little modification, it can be part of ECS.
  5. In environments with a mix of metricbeat, third-party, and custom log shippers, it makes it easy to search the combined data via Kibana.
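
A minimal Python sketch of this one-event-per-counter-value shape: it splits a counter path into the fixed `category`, `instance` and `name` fields. The function `counter_event` and its regular expression are hypothetical; the sketch assumes local paths of the form `\Category(instance)\Counter name` and does not handle paths prefixed with a machine name.

```python
import re
from typing import Any, Dict

# \Category(instance)\Counter name, where "(instance)" is optional.
_PDH_PATH = re.compile(
    r"^\\(?P<category>[^\\(]+)(?:\((?P<instance>.*)\))?\\(?P<name>.+)$"
)


def counter_event(path: str, value: float) -> Dict[str, Any]:
    """Turn one counter path and its value into a fixed-field event."""
    match = _PDH_PATH.match(path)
    if match is None:
        raise ValueError(f"unrecognised counter path: {path!r}")
    fields = {"category": match["category"], "name": match["name"], "value": value}
    if match["instance"] is not None:
        fields["instance"] = match["instance"]
    return {"windows": {"perfmon": fields}}


print(counter_event("\\LogicalDisk(C:)\\Disk Reads/sec", 12.0))
print(counter_event("\\UDPv4\\Datagrams/sec", 32.0))
```

Because every event carries the same four field names, the mapping stays fixed no matter how many counters are collected, at the cost of one document per counter value.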

@narph
Contributor

narph commented Apr 21, 2020

Based on multiple requests, we have worked on a new config format and event output that should satisfy most of the options proposed here. I will close the issue for now; if there are any questions, please reopen it and resume the conversation. (Referenced PR: #17596)
