Update the vSphere / vSAN extension with a variable for the polling interval (new feature in vSAN 8 U1, 30 s) #13880

Closed
Muy69 opened this issue Sep 7, 2023 · 7 comments · Fixed by #13890
Labels
feature request Requests for new plugin and for new features to existing plugins

Comments

Muy69 commented Sep 7, 2023

Use Case

In a vSAN ESA cluster (vSAN 8 U1), a polling interval of 30 seconds helps greatly in peak situations, because the 5-minute interval is too coarse to give a good view of the storage data.

https://core.vmware.com/blog/high-resolution-performance-monitoring-vsan-8-u1

Expected behavior

Some kind of variable or flag to indicate that vSAN 8 U1 or above is in use.

Actual behavior

Currently a hard-coded 300 s interval is set (endpoint.go, line 248):

"vsan": {
	name:             "vsan",
	vcName:           "ClusterComputeResource",
	pKey:             "clustername",
	parentTag:        "dcname",
	enabled:          anythingEnabled(parent.VSANMetricExclude),
	realTime:         false,
	sampling:         300,
	objects:          make(objectMap),
	filters:          newFilterOrPanic(parent.VSANMetricInclude, parent.VSANMetricExclude),
	paths:            parent.VSANClusterInclude,
	simple:           parent.VSANMetricSkipVerify,
	include:          parent.VSANMetricInclude,
	collectInstances: false,
	getObjects:       getClusters,
	parent:           "datacenter",

Additional info

If you need testing, I would like to offer my help.

vSAN 8 U1 (vSAN ESA)

@Muy69 Muy69 added the feature request Requests for new plugin and for new features to existing plugins label Sep 7, 2023

srebhan commented Sep 8, 2023

@Muy69 please test the binary in PR #13890, available as soon as CI has finished its tests successfully. Let me know if this solves your issue!

@powersj powersj added the waiting for response waiting for response from contributor label Sep 8, 2023

Muy69 commented Sep 10, 2023

@srebhan I'm having a problem starting Telegraf. The configuration and the error output are below.

# Read metrics from VMware vCenter

[[inputs.vsphere]]
  ## List of vCenter URLs to be monitored. These three lines must be uncommented
   interval = "30s"
   vcenters = [ "" ]
   username = ""
   password = ""
   timeout = "29s"
   insecure_skip_verify =true

   # Exclude all historical metrics

   datastore_metric_exclude = ["*"]
   datacenter_metric_exclude = ["*"]
   cluster_metric_exclude = ["*"]
   resourcepool_metric_exclude = ["*"]

   vsan_metric_include = [ "summary.*" ]
   vsan_metric_exclude = [ ]
   vsan_cluster_include = ["/*/host/WLC-120"]
   #vsan_interval = "30s"

   collect_concurrency = 4
   discover_concurrency = 4

  ## The Historical Interval value must match EXACTLY the interval in the daily
  # "Interval Duration" found on the VCenter server under Configure > General > Statistics > Statistic intervals
   historical_interval = "60s"

[[inputs.vsphere]]

  interval = "60s"
  vcenters = [ "" ]
  username = ""
  password = ""
  timeout = "59s"
  insecure_skip_verify = true

  vm_metric_exclude = ["*"] # Exclude realtime metrics
  host_metric_exclude = ["*"] # Exclude realtime metrics

  vsan_metric_include = [ "performance.*" ]
  vsan_metric_exclude = [ ]
  vsan_cluster_include = ["/*/host/WLC-120" ]
  vsan_interval = "30s"

  #max_query_metrics = 256
  discover_concurrency = 4
  collect_concurrency = 4

  ## The Historical Interval value must match EXACTLY the interval in the daily
  # "Interval Duration" found on the VCenter server under Configure > General > Statistics > Statistic intervals
  historical_interval = "60s"
 Process: 2067 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=203/EXEC)
   Main PID: 2067 (code=exited, status=203/EXEC)
        CPU: 2ms

Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Main process exited, code=exited, status=203/EXEC
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Sep 10 18:36:20 cidtsttele01 systemd[1]: Failed to start Telegraf.
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Scheduled restart job, restart counter is at 5.
Sep 10 18:36:20 cidtsttele01 systemd[1]: Stopped Telegraf.
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Start request repeated too quickly.
Sep 10 18:36:20 cidtsttele01 systemd[1]: telegraf.service: Failed with result 'exit-code'.
Sep 10 18:36:20 cidtsttele01 systemd[1]: Failed to start Telegraf.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Sep 10, 2023

Muy69 commented Sep 15, 2023

The config changes don't work as intended: the interval can be set, but the metrics still reflect only a 300-second interval.

powersj commented Sep 15, 2023

@Muy69 please provide the actual telegraf logs. If you are running it as a service then you need to use something similar to the below:

journalctl --no-pager --unit telegraf

Muy69 commented Sep 15, 2023

Telegraf version: 1.28.1-1

# journalctl --no-pager --unit telegraf

Sep 15 12:33:14 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:14Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cmmds-workload: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:15 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:15Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:15 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:15Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: host-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:33:15 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:15Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cluster-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:33:18 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:19 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:19Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-host: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:20 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:20Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:21 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:21Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:21 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:21Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:22 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:22Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: computeCluster-remotedomclient: ServerFaultCode: A specified parameter was not correct: computeCluster-remotedomclient:.
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: ddh-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:33:23 cidtsttele02 telegraf[4857]: 2023-09-15T12:33:23Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: slab-memory: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:08 cidtsttele02 telegraf[4857]: 2023-09-15T12:34:08Z I! [inputs.vsphere] Stopping plugin
Sep 15 12:34:08 cidtsttele02 systemd[1]: Stopping Telegraf...
Sep 15 12:34:08 cidtsttele02 telegraf[4857]: 2023-09-15T12:34:08Z I! [agent] Hang on, flushing any cached metrics before shutdown
Sep 15 12:34:08 cidtsttele02 telegraf[4857]: 2023-09-15T12:34:08Z I! [agent] Stopping running outputs
Sep 15 12:34:08 cidtsttele02 systemd[1]: telegraf.service: Deactivated successfully.
Sep 15 12:34:08 cidtsttele02 systemd[1]: Stopped Telegraf.
Sep 15 12:34:08 cidtsttele02 systemd[1]: telegraf.service: Consumed 13min 26.291s CPU time.
Sep 15 12:34:08 cidtsttele02 systemd[1]: Starting Telegraf...
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loading config: /etc/telegraf/telegraf.conf
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z W! DeprecationWarning: Option "force_discover_on_init" of plugin "inputs.vsphere" deprecated since version 1.14.0 and will be removed in 2.0.0: option is ignored
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z E! Unable to open /var/log/telegraf/telegraf.log (open /var/log/telegraf/telegraf.log: permission denied), using stderr
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Starting Telegraf 1.28.1 brought to you by InfluxData the makers of InfluxDB
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 5 secret-stores
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded inputs: cpu vsphere
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded aggregators:
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded processors:
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded secretstores:
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Loaded outputs: influxdb
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! Tags enabled: host=cidtsttele02
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z W! Deprecated inputs: 0 and 1 options
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"cidtsttele02", Flush Interval:10s
Sep 15 12:34:08 cidtsttele02 systemd[1]: Started Telegraf.
Sep 15 12:34:08 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:08Z I! [inputs.vsphere] Starting plugin
Sep 15 12:34:09 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:09Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cluster-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:34:09 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:09Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: ddh-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:11 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:11Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:11 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:11Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: computeCluster-remotedomclient: ServerFaultCode: A specified parameter was not correct: computeCluster-remotedomclient:.
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:12 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:12Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:13 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:13Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:15 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:15Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:15 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:15Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: host-pmem: ServerFaultCode: The operation is not supported on the object..
Sep 15 12:34:15 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:15Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:16 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:16Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: cmmds-workload: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:16 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:16Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:16 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:16Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:17 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:17Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-host: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:17 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:17Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: clom-disk: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: heap-memory: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Error querying performance data for WLC-120: slab-memory: ServerFaultCode: A specified parameter was not correct: entityRefId.
Sep 15 12:34:18 cidtsttele02 telegraf[13476]: 2023-09-15T12:34:18Z E! [inputs.vsphere] [vSAN] Failed to parse a timestamp: 0001-01-01 00:00:00 +0000 UTC. Skipping

# Read metrics from VMware vCenter

[[inputs.vsphere]]
interval = "120s"
vcenters = [ "https
username = "
password = "
timeout = "110s"

insecure_skip_verify = true
#force_discover_on_init = true

vm_metric_exclude = ["*"] # Exclude realtime metrics
host_metric_exclude = ["*"] # Exclude realtime metrics
datastore_metric_exclude = ["*"]
datacenter_metric_exclude = ["*"]
cluster_metric_exclude = ["*"]
resource_pool_metric_exclude = ["*"]

vsan_metric_include = [ "performance.*" ]

#vsan_metric_skip_verify = true
vsan_metric_exclude = [ ]

vsan_cluster_include = ["/*/host/WLC-120" ]
vsan_interval = "30s"

#max_query_metrics = 256
discover_concurrency = 5
collect_concurrency = 5

historical_interval = "60s"

Muy69 commented Sep 15, 2023

Could it be that there is another entry with a hard-coded 300-second interval?

vsphere.go (line 184):

func init() {
	inputs.Add("vsphere", func() telegraf.Input {
		return &VSphere{
			DatacenterInclude:           []string{"/*"},
			ClusterInclude:              []string{"/*/host/**"},
			HostInstances:               true,
			HostInclude:                 []string{"/*/host/**"},
			ResourcePoolInclude:         []string{"/*/host/**"},
			VMInstances:                 true,
			VMInclude:                   []string{"/*/vm/**"},
			DatastoreInclude:            []string{"/*/datastore/**"},
			VSANMetricExclude:           []string{"*"},
			VSANClusterInclude:          []string{"/*/host/**"},
			Separator:                   "_",
			CustomAttributeExclude:      []string{"*"},
			UseIntSamples:               true,
			MaxQueryObjects:             256,
			MaxQueryMetrics:             256,
			CollectConcurrency:          1,
			DiscoverConcurrency:         1,
			MetricLookback:              3,
			ForceDiscoverOnInit:         true,
			ObjectDiscoveryInterval:     config.Duration(time.Second * 300),
			Timeout:                     config.Duration(time.Second * 60),
			HistoricalInterval:          config.Duration(time.Second * 300),
			VSANInterval:                config.Duration(time.Second * 300),
			DisconnectedServersBehavior: "error",
			HTTPProxy:                   proxy.HTTPProxy{UseSystemProxy: true},

Muy69 commented Sep 19, 2023

Is that information sufficient, @powersj?

The current govmomi now supports vSphere up to 8 U1c.
vmware/govmomi#3193
