[Metricbeat] Grouping Windows Perfmon Metrics in Events #6584

andrewkroh · 2018-03-16T21:54:01Z

The perfmon metricset generates one event per counter instance. It would be nice to offer more flexibility in grouping related metrics into a single event.

Having a single event per metric was the simplest implementation that would allow similar metrics to be grouped and visualized (e.g. visualize disk write times for each disk instance on the same graph).

It would be nice to be able to group all metrics related to an instance of an object (e.g. all metrics for C:\ or all metrics for processor 0). Here are some examples that show my idea.

metricbeat.modules:
- module: windows
  metricsets: [perfmon]
  perfmon.queries:
  - object: '\Processor'
    namespace: processor
    instance: '*'
    counters:
    - name: '% User Time'
      label: time.user.pct
    - name: '% Processor Time'
      label: time.processor.pct
    - name: '% Interrupt Time'
      label: time.interrupt.pct

  - object: '\UDPv4'
    namespace: udpv4
    counters:
    - name: 'Datagrams Received/sec'
      label: packets.received.per_sec
    - name: 'Datagrams Received Errors'
      label: packets.received.errors
    - name: 'Datagrams No Port/sec'
      label: packets.received.no_port_per_sec
    - name: 'Datagrams Sent/sec'
      label: packets.sent.per_sec
    - name: 'Datagrams/sec'
      label: packets.per_sec

The first query uses a Performance Data Helper (PDH) path of \Processor(*)\<counter name> and would produce an event like this:

{
  "windows": {
    "perfmon": {
      "processor": {
        "instance": "_Total",
        "time": {
          "user": {
            "pct": 1.2
          },
          "processor": {
            "pct": 10.1
          },
          "interrupt": {
            "pct": 0.5
          }
        }
      }
    }
  }
}
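As a rough illustration of how the dotted label values could map onto the nested event above, here is a minimal Python sketch. The helper name nest_labels and its arguments are hypothetical and not part of Metricbeat (which is written in Go); the sketch only shows the label-to-field expansion, not how counters are read.

```python
def nest_labels(namespace, instance, values):
    """Expand dotted labels (e.g. 'time.user.pct') into a nested event dict.

    Hypothetical helper for illustration; not Metricbeat code.
    """
    body = {}
    if instance is not None:
        body["instance"] = instance
    for label, value in values.items():
        node = body
        *parents, leaf = label.split(".")
        # Walk (or create) one nested dict per dotted path segment.
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return {"windows": {"perfmon": {namespace: body}}}


event = nest_labels(
    "processor",
    "_Total",
    {"time.user.pct": 1.2, "time.processor.pct": 10.1, "time.interrupt.pct": 0.5},
)
```

Note that this simple expansion assumes no label is both a leaf and a parent of another label within the same namespace.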

The second query uses a PDH path of \UDPv4\<counter name> (the object has no instances) and would produce an event like this:

{
  "windows": {
    "perfmon": {
      "udpv4": {
        "packets": {
          "received": {
            "per_sec": 18,
            "errors": 1,
            "no_port_per_sec": 11
          },
          "sent": {
            "per_sec": 12
          },
          "per_sec": 32
        }
      }
    }
  }
}
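The two PDH path shapes described above (with and without an instance) can be sketched as follows. This is an assumption-laden illustration in Python: the function name pdh_paths and the dict layout simply mirror the proposed config keys (object, instance, counters) and are not an existing Metricbeat API.

```python
def pdh_paths(query):
    """Build one PDH counter path per configured counter.

    Hypothetical helper: `query` mirrors one entry of the proposed
    perfmon.queries list (keys: object, optional instance, counters).
    """
    obj = query["object"]
    instance = query.get("instance")
    # Objects without instances (e.g. \UDPv4) omit the "(instance)" part.
    prefix = obj if instance is None else f"{obj}({instance})"
    return [f"{prefix}\\{counter['name']}" for counter in query["counters"]]


processor_paths = pdh_paths({
    "object": "\\Processor",
    "instance": "*",
    "counters": [{"name": "% User Time"}, {"name": "% Processor Time"}],
})
udp_paths = pdh_paths({
    "object": "\\UDPv4",
    "counters": [{"name": "Datagrams/sec"}],
})
```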

This would also resolve #6528.

Related Info

Working with Performance Counters: https://technet.microsoft.com/en-us/library/bb734903.aspx?f=255&MSPPError=-2147217396

ruflin · 2018-03-19T06:55:35Z

I really like the idea of letting users build their own events. It also reminds me of #6462, where this could be a potential solution (@jsoriano).

andrewkroh · 2018-04-11T18:50:37Z

Related #4944.

willemdh · 2018-06-01T08:09:32Z

We'd love to see this functionality. The way the perfmon module currently works is not very efficient: it generates a huge number of documents, each carrying very little information. For example, try gathering some diskio-related counters on 1k+ Windows servers every 10s:

    - instance_label: "diskio.name"
      measurement_label: "diskio.reads"
      query: '\LogicalDisk(*)\Disk Reads/sec'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.writes"
      query: '\LogicalDisk(*)\Disk Writes/sec'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.read.queue_length"
      query: '\LogicalDisk(*)\Avg. Disk Read Queue Length'
    - instance_label: "diskio.name"
      measurement_label: "diskio.write.queue_length"
      query: '\LogicalDisk(*)\Avg. Disk Write Queue Length'
    - instance_label: "diskio.name"
      measurement_label: "diskio.read.time.pct"
      query: '\LogicalDisk(*)\% Disk Read Time'
    - instance_label: "diskio.name"
      measurement_label: "diskio.write.time.pct"
      query: '\LogicalDisk(*)\% Disk Write Time'
    - instance_label: "diskio.name"
      measurement_label: "diskio.bytes_per_read.avg"
      query: '\LogicalDisk(*)\Avg. Disk Bytes/Read'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.bytes_per_write.avg"
      query: '\LogicalDisk(*)\Avg. Disk Bytes/Write'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.read.bytes_per_sec"
      query: '\LogicalDisk(*)\Disk Read Bytes/sec'
      format: "long"
    - instance_label: "diskio.name"
      measurement_label: "diskio.write.bytes_per_sec"
      query: '\LogicalDisk(*)\Disk Write Bytes/sec'
      format: "long"

Check this graph:

This is for only the above perfmon counters on 6 Windows servers.
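To put rough numbers on this, the document count with one event per counter instance scales as counters x instances x hosts x samples. The sketch below is a back-of-the-envelope estimate only: the 10 counters, 10s period, and 1000 hosts come from the example above, while the figure of 3 LogicalDisk instances per host is a made-up assumption, and docs_per_day is a hypothetical helper.

```python
def docs_per_day(counters, instances, hosts, period_s, grouped=False):
    """Estimate documents indexed per day.

    grouped=False: one document per counter per instance (current behaviour).
    grouped=True:  one document per instance, holding all its counters.
    """
    samples = 86_400 // period_s
    per_sample = instances if grouped else counters * instances
    return per_sample * hosts * samples


# Assumption: 3 LogicalDisk instances per host (illustrative only).
ungrouped = docs_per_day(counters=10, instances=3, hosts=1000, period_s=10)
grouped = docs_per_day(counters=10, instances=3, hosts=1000, period_s=10, grouped=True)
print(ungrouped)  # 259200000
print(grouped)    # 25920000
```

Under these assumptions, grouping by instance cuts the document count by a factor equal to the number of counters per object.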

willemdh · 2018-07-26T14:41:52Z

@ruflin @andrewkroh

Just wanted to add that I have a feeling the millions of documents the perfmon module generates result in very slow recovery of metricbeat indices. I added the diskio metrics from my previous post on roughly 600 servers and have seen a significant deterioration in performance during recovery. (It could of course be related to other things in my cluster, but I still wanted to mention it; maybe other customers are seeing the same.)

martinscholz83 · 2018-08-09T07:53:19Z

Hey @andrewkroh, is someone working on this topic?

ruflin · 2018-08-09T08:40:43Z

AFAIK nobody is working on it at the moment.

martinscholz83 · 2018-08-09T09:17:53Z

OK. I like the idea of grouping events in a namespace. If that's OK, I'll open a PR.

ruflin · 2018-08-09T09:20:36Z

Of course, that would be great.

martinscholz83 · 2018-08-10T06:42:12Z

@andrewkroh, to collect metrics for multiple instances, would you do it this way?

metricbeat.modules:
- module: windows
  metricsets: [perfmon]
  perfmon.queries:
  - object: '\Processor'
    namespace: processor
    - instance: 0
      counters:
      - name: '% User Time'
        label: time.user.pct      
      - name: '% Interrupt Time'
        label: time.interrupt.pct
    - instance: 1
      counters:
      - name: '% Processor Time'
        label: time.processor.pct

  - object: '\UDPv4'
    namespace: udpv4
    counters:
    - name: 'Datagrams Received/sec'
      label: packets.received.per_sec
    - name: 'Datagrams Received Errors'
      label: packets.received.errors
    - name: 'Datagrams No Port/sec'
      label: packets.received.no_port_per_sec
    - name: 'Datagrams Sent/sec'
      label: packets.sent.per_sec
    - name: 'Datagrams/sec'
      label: packets.per_sec

willemdh · 2018-08-10T07:31:11Z

Maybe while this is being rewritten, we should consider a perfmon ECS object? It would be nice to have some sort of convention for perfmon data, so that everyone uses the same field names.

ruflin · 2018-08-13T07:56:34Z

@willemdh Not sure if ECS should have something specific to perfmon. Perfmon can use ECS fields, but there are lots of perfmon metrics that I would not expect to be in ECS. This is not only related to perfmon but to metrics in general.

andrewkroh · 2018-08-13T16:42:20Z

Quoting @martinscholz83: "@andrewkroh, if you want to collect for multiple instances you want to do it this way?"

To avoid duplication of the counters configuration I think instance should be able to accept a single string or a list.

metricbeat.modules:
- module: windows
  metricsets: [perfmon]
  perfmon.queries:
  - object: '\Processor'
    namespace: processor
    instance: [0, 1]       # Allow either a string or a []string.
    counters:
    - name: '% User Time'
      label: time.user.pct
    - name: '% Interrupt Time'
      label: time.interrupt.pct
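Accepting either a single value or a list for instance usually comes down to normalizing the setting once at config-load time. A minimal sketch of that idea in Python follows; normalize_instances is a hypothetical helper and not how Metricbeat (a Go program) unpacks its config. Values are converted to strings because YAML parses 0 and 1 as integers.

```python
def normalize_instances(value):
    """Return the configured instance setting as a list of strings.

    Hypothetical helper: accepts None, a scalar (e.g. '*' or 0),
    or a list/tuple of scalars.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return [str(v) for v in value]
    return [str(value)]


single = normalize_instances("*")
several = normalize_instances([0, 1])
```

With this shape, the rest of the code only ever has to iterate over a list, and the counters block is written once per object.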

Sialagio · 2018-10-05T09:10:16Z

Hello folks,

+1 on the requirement, but ...

When checking around, we found that system.filesystem does not return all mount points on Windows.
For this reason, we used the perf counters instead.
The drawback is that the events are independent, meaning that we need two events to get both the percentage and the free megabytes.

Three options here:

1. Merge both events in Logstash, but that is a really dirty solution.
2. Update system.filesystem to be compatible with all mounts on Windows.
3. Implement this proposal so that events are merged as in system.filesystem, and keep using perfmon.

Which would be best?

elasticmachine · 2018-11-29T08:58:06Z

Pinging @elastic/infrastructure

The perfmon metricset is still in Beta. There are a few improvements I would like to see like elastic#6584 before pushing this to GA. These changes could be breaking changes.

#8688) This flag will send all perfmon measurements with a matching instance label as part of the same event (i.e. all metrics for C:, Processor X, etc.). This addresses some of the issues raised in #6584. In most cases enabling this flag considerably reduces the number of events sent by metricbeat.

#11002) This flag will send all perfmon measurements with a matching instance label as part of the same event (i.e. all metrics for C:, Processor X, etc.). This addresses some of the issues raised in #6584. In most cases enabling this flag considerably reduces the number of events sent by metricbeat. Co-Authored-By: Josh Smith <j_smith95@live.com>
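The grouping behaviour described for this flag, where measurements sharing an instance label end up in the same event, can be sketched roughly as below. This is an illustration in Python under stated assumptions, not the actual implementation from #8688: the function name, the (instance, label, value) tuple shape, and the output layout are all hypothetical.

```python
from collections import defaultdict


def group_by_instance(measurements):
    """Merge (instance, label, value) measurements into one event per instance.

    Hypothetical sketch of the grouping idea only.
    """
    grouped = defaultdict(dict)
    for instance, label, value in measurements:
        grouped[instance][label] = value
    # One event per instance instead of one per measurement.
    return [{"instance": inst, **fields} for inst, fields in grouped.items()]


events = group_by_instance([
    ("C:", "diskio.reads", 12),
    ("C:", "diskio.writes", 7),
    ("D:", "diskio.reads", 3),
    ("D:", "diskio.writes", 1),
])
```

In this example four measurements collapse into two events, one per disk.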

vbohata · 2019-05-23T23:24:08Z

I think the best approach would be to generate a single event per performance counter value and use predefined field names, so the result could look like this:
{
  "windows": {
    "perfmon": {
      "category": ".NET CLR Exceptions",
      "instance": "??APP_CLR_PROC??",
      "name": "my_counter_name",
      "value": 0.0
    }
  }
}

There are many advantages:

- Allows term aggregations per category, name, and so on.
- Allows easy filtering of results by a known category, name, and so on, with no need to search through the available fields.
- Avoids a potentially huge number of dynamic fields.
- With a little modification, it could become part of ECS.
- In environments with a mix of metricbeat, third-party, and custom log shippers, it makes it easy to search the combined data in Kibana.
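For comparison with the grouped layout, the static layout proposed in this comment could be produced along the following lines. This is only a sketch in Python: to_static_events is a hypothetical name, and the field names (category, instance, name, value) are taken from the example event above.

```python
def to_static_events(category, instance, counters):
    """Emit one fixed-layout event per counter value.

    Hypothetical helper: `counters` maps counter name -> value.
    """
    return [
        {"windows": {"perfmon": {
            "category": category,
            "instance": instance,
            "name": name,
            "value": value,
        }}}
        for name, value in counters.items()
    ]


static_events = to_static_events(
    "LogicalDisk", "C:", {"Disk Reads/sec": 12.0, "Disk Writes/sec": 7.0}
)
# Filtering needs no knowledge of per-counter field names.
reads = [e for e in static_events
         if e["windows"]["perfmon"]["name"] == "Disk Reads/sec"]
```

The trade-off is that this layout keeps one document per counter value, so it does not reduce the document volume discussed earlier in the thread.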

narph · 2020-04-21T14:52:45Z

Based on multiple requests, we have worked on a new config format and event output that should satisfy most of the options proposed here. I will close the issue for now; if there are any questions, please reopen it and resume the conversation. (See PR #17596.)

andrewkroh added enhancement, Metricbeat labels Mar 16, 2018

ruflin mentioned this issue Mar 22, 2018

Feature request: Ability to define multiple paths in HTTP module of metricbeat #6618

Closed

ruflin mentioned this issue Apr 11, 2018

Problem with aggregation of Performance Counters within same category #4944

Closed

ramenjosh mentioned this issue Oct 23, 2018

Add group_measurements_by_instance_label flag to perfmon configuration #8688

Merged

ruflin added the Team:Integrations label Nov 29, 2018

ruflin added module :Windows labels Nov 29, 2018

ruflin added a commit to ruflin/beats that referenced this issue Jan 18, 2019

Making the Windows module GA

b3826c2

The perfmon metricset is still in Beta. There are a few improvements I would like to see like elastic#6584 before pushing this to GA. These changes could be breaking changes.

ruflin mentioned this issue Jan 18, 2019

Making the Windows module GA #10163

Merged

ruflin added a commit that referenced this issue Jan 21, 2019

Making the Windows module GA (#10163)

dbfa3ef

The perfmon metricset is still in Beta. There are a few improvements I would like to see like #6584 before pushing this to GA. These changes could be breaking changes.

ruflin mentioned this issue Jan 21, 2019

Cherry-pick #10163 to 6.x: Making the Windows module GA #10221

Merged

alvarolobato added the [zube]: Ready label Apr 5, 2019

vbohata mentioned this issue May 23, 2019

Perfmon should generate unified static layout events for each performance monitor #12262

Closed

andresrc added [zube]: Backlog and removed [zube]: Ready labels Jul 22, 2019

narph mentioned this issue Apr 8, 2020

Add improved config/event output options to windows/perfmon metricset #17596

Merged

9 tasks

narph mentioned this issue Apr 21, 2020

Cherry-pick #17596 to 7.x: Add improved config/event output options to windows/perfmon metricset #17861

Merged

9 tasks

narph closed this as completed Apr 21, 2020

zube bot added [zube]: Done and removed [zube]: Backlog labels Apr 21, 2020

andresrc removed the [zube]: Done label Apr 27, 2020
