[inputs.vsphere] Resolved issue 4790 (Resource whitelisting) #5165

Merged: 39 commits, merged Feb 12, 2019
3408955 Initial implementation of Finder (prydin, Nov 10, 2018)
3468e5e Refactored to load all properties in one pass (prydin, Nov 10, 2018)
2b343f2 Fully implemented but not completely tested (prydin, Nov 12, 2018)
47cca7c PR candidate 1 (prydin, Nov 13, 2018)
01e4ac9 Removed excessive logging (prydin, Nov 13, 2018)
3124ad6 Added comments (prydin, Nov 14, 2018)
db694e6 Scale and performance improvements (prydin, Nov 16, 2018)
5fdabdb Use timestamp of latest sample as start point for next round (prydin, Nov 18, 2018)
6c4cba0 * Improved collection concurrency (one goroutine per object type) (prydin, Nov 27, 2018)
26e1536 Added hard 100000 metric query limit (prydin, Nov 28, 2018)
aaa6754 Moved timeout logic to client.go (prydin, Nov 30, 2018)
e9956ca Removed WorkerPool and added ThrottledExecutor instead (prydin, Dec 4, 2018)
94c6fb6 Changed cluster_instances default value to false, since true causes p… (prydin, Dec 5, 2018)
646c596 Fixed broken test cases (prydin, Dec 5, 2018)
2de5e4b Merge remote-tracking branch 'upstream/master' into prydin-scale-impr… (prydin, Dec 6, 2018)
9ab5b94 Reverted accidental change to wavefront.go (prydin, Dec 6, 2018)
466b139 Added check for value indices (prydin, Dec 7, 2018)
bd3fe0d More robust panic handling (prydin, Dec 10, 2018)
3ede8cc Added panic_handler.go (prydin, Dec 10, 2018)
957762d Reverted to govmomi 0.18.0 (prydin, Dec 10, 2018)
f563cd8 Exclude tests requiring VPX simulator on 32-bit arch (prydin, Dec 11, 2018)
5522836 Merged changes from prydin-scale-improvement (prydin, Dec 11, 2018)
f71b466 Finalized merge from prydin-scalability (prydin, Dec 13, 2018)
60b4f17 Changed handling of late samples (prydin, Dec 19, 2018)
3e8c058 Align all timestamps to interval boundary (prydin, Dec 19, 2018)
6547068 Added documentation for inventory paths (prydin, Dec 19, 2018)
4442920 Changed logtags from [input.vsphere] to [inputs.vsphere] (prydin, Dec 19, 2018)
1bb9eae Fixed broken test case (prydin, Dec 19, 2018)
dfdd0ee Fixed 32-bit test issue (bug in vSphere simulator) (prydin, Dec 19, 2018)
180a7bf Added cancel-handler to ThrottledExecutor, removed unnecessary warnin… (prydin, Dec 21, 2018)
0edee16 Merged from prydin-scale-improvement (prydin, Dec 21, 2018)
a0ca647 Fixed test case issues (prydin, Dec 21, 2018)
1eb24a3 Back-ported timestamping fixes from pontus-issue-4790 (prydin, Dec 22, 2018)
6819c3a Fixed test issues (prydin, Dec 28, 2018)
bc185cd Merged from upstream (prydin, Feb 5, 2019)
cbf93d2 Fixed some post-merge issues (prydin, Feb 12, 2019)
72c3d89 Added Wavefront SDK back to Gopkg.lock (prydin, Feb 12, 2019)
ef83845 Removed trailing spaces (prydin, Feb 12, 2019)
dc7054f Removed trailing spaces (prydin, Feb 12, 2019)
plugins/inputs/vsphere/README.md: 47 additions, 0 deletions
@@ -27,6 +27,7 @@ vm_metric_exclude = [ "*" ]

## VMs
## Typical VM metrics (if omitted or empty, all metrics are collected)
# vm_include = [ "/*/vm/**"] # Inventory path to VMs to collect (by default all are collected)
vm_metric_include = [
"cpu.demand.average",
"cpu.idle.summation",
@@ -68,6 +69,7 @@ vm_metric_exclude = [ "*" ]

## Hosts
## Typical host metrics (if omitted or empty, all metrics are collected)
# host_include = [ "/*/host/**"] # Inventory path to hosts to collect (by default all are collected)
host_metric_include = [
"cpu.coreUtilization.average",
"cpu.costop.summation",
@@ -120,16 +122,19 @@ vm_metric_exclude = [ "*" ]
# host_instances = true ## true by default

## Clusters
# cluster_include = [ "/*/host/**"] # Inventory path to clusters to collect (by default all are collected)
# cluster_metric_include = [] ## if omitted or empty, all metrics are collected
# cluster_metric_exclude = [] ## Nothing excluded by default
# cluster_instances = false ## false by default

## Datastores
# datastore_include = [ "/*/datastore/**"] # Inventory path to datastores to collect (by default all are collected)
# datastore_metric_include = [] ## if omitted or empty, all metrics are collected
# datastore_metric_exclude = [] ## Nothing excluded by default
# datastore_instances = false ## false by default

## Datacenters
# datacenter_include = [ "/*"] # Inventory path to datacenters to collect (by default all are collected)
datacenter_metric_include = [] ## if omitted or empty, all metrics are collected
datacenter_metric_exclude = [ "*" ] ## Datacenters are not collected by default.
# datacenter_instances = false ## false by default
@@ -196,6 +201,48 @@ For setting up concurrency, modify `collect_concurrency` and `discover_concurren
# discover_concurrency = 1
```

### Inventory Paths
Resources to be monitored can be selected using Inventory Paths. This treats the vSphere inventory as a tree structure similar
to a file system. A vSphere inventory has a structure similar to this:

```
<root>
+-DC0 # Virtual datacenter
+-datastore # Datastore folder (created by system)
| +-Datastore1
+-host # Host folder (created by system)
| +-Cluster1
| | +-Host1
| | | +-VM1
| | | +-VM2
| | | +-hadoop1
| +-Host2 # Dummy cluster created for non-clustered host
| | +-Host2
| | | +-VM3
| | | +-VM4
+-vm # VM folder (created by system)
| +-VM1
| +-VM2
| +-Folder1
| | +-hadoop1
| | +-NestedFolder1
| | | +-VM3
| | | +-VM4
```
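To make the file-system analogy concrete, here is a minimal sketch, assuming a hypothetical `node` type (a name plus children) and a hypothetical `inventoryPaths` helper, that rebuilds the example tree and lists every object's inventory path. It only illustrates how paths are formed; the plugin itself discovers objects through the vSphere API, not from a structure like this.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified stand-in for a vSphere inventory
// object: just a name and its children. The real plugin works with
// govmomi managed objects, not this type.
type node struct {
	name     string
	children []*node
}

// n is a small helper for building the example tree.
func n(name string, children ...*node) *node {
	return &node{name: name, children: children}
}

// inventoryPaths walks the tree depth-first and returns the UNIX-style
// inventory path of every object below the (unnamed) root.
func inventoryPaths(root *node) []string {
	var paths []string
	var walk func(prefix string, nd *node)
	walk = func(prefix string, nd *node) {
		for _, c := range nd.children {
			p := prefix + "/" + c.name
			paths = append(paths, p)
			walk(p, c)
		}
	}
	walk("", root)
	return paths
}

// exampleInventory mirrors the example tree shown above.
func exampleInventory() *node {
	return n("",
		n("DC0",
			n("datastore", n("Datastore1")),
			n("host",
				n("Cluster1", n("Host1", n("VM1"), n("VM2"), n("hadoop1"))),
				n("Host2", n("Host2", n("VM3"), n("VM4"))),
			),
			n("vm",
				n("VM1"), n("VM2"),
				n("Folder1", n("hadoop1"), n("NestedFolder1", n("VM3"), n("VM4"))),
			),
		),
	)
}

func main() {
	paths := inventoryPaths(exampleInventory())
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

VM1 appears twice in the output, as `/DC0/host/Cluster1/Host1/VM1` and `/DC0/vm/VM1`, which is the property the "Multiple paths to VMs" discussion relies on.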

#### Using Inventory Paths
Using familiar UNIX-style paths, one could select, e.g., VM2 with the path `/DC0/vm/VM2`.

Often, we want to select a group of resources, such as all the VMs in a folder. We could use the path `/DC0/vm/Folder1/*` for that.

Another possibility is to select objects using a partial name, such as `/DC0/vm/Folder1/hadoop*`, which yields all VMs in Folder1 whose names start with "hadoop".
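Patterns without `**` behave like shell globs applied to the whole path. One quick way to try such a pattern is Go's standard `path.Match`, whose `*` never matches a `/`, so each wildcard stays within one level of the tree. This is an illustrative check against the example tree, assuming a hypothetical `selected` helper; it is not the plugin's own matching code.

```go
package main

import (
	"fmt"
	"path"
)

// selected is a hypothetical helper: it reports whether an inventory path
// matches a pattern that uses only single-level wildcards. path.Match's
// "*" matches any run of non-"/" characters, so it never descends into
// a subfolder.
func selected(pattern, inventoryPath string) bool {
	ok, err := path.Match(pattern, inventoryPath)
	return err == nil && ok
}

func main() {
	fmt.Println(selected("/DC0/vm/VM2", "/DC0/vm/VM2"))                             // true: exact path
	fmt.Println(selected("/DC0/vm/Folder1/*", "/DC0/vm/Folder1/hadoop1"))           // true: anything directly in Folder1
	fmt.Println(selected("/DC0/vm/Folder1/*", "/DC0/vm/Folder1/NestedFolder1/VM3")) // false: "*" does not cross "/"
	fmt.Println(selected("/DC0/vm/Folder1/hadoop*", "/DC0/vm/Folder1/hadoop1"))     // true: partial name
}
```

A pure string match knows nothing about object types: `/DC0/vm/Folder1/*` also matches the folder `/DC0/vm/Folder1/NestedFolder1` itself. This sketch makes no attempt to filter by type.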

Finally, because folders can be nested arbitrarily deep, we need a "recursive wildcard" for traversing multiple folder levels. We use the `**` symbol for that. To find a VM with a name starting with "hadoop" in any folder, we could use the following path: `/DC0/vm/**/hadoop*`
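A recursive wildcard cannot be expressed with `path.Match` alone. The sketch below, assuming a hypothetical `matchInventoryPath` helper, splits the pattern and the path on `/`, lets a `**` segment absorb zero or more whole segments by backtracking, and matches every other segment with `path.Match`. Whether the plugin's `**` also matches zero folders is an assumption made here; this is not its actual implementation.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchSegments matches pattern segments against path segments.
// A "**" segment matches zero or more whole segments; every other segment
// is matched with path.Match, so "*" and "hadoop*" stay within one level.
func matchSegments(pat, segs []string) bool {
	if len(pat) == 0 {
		return len(segs) == 0
	}
	if pat[0] == "**" {
		// Let "**" swallow 0, 1, 2, ... segments and try the rest.
		for i := 0; i <= len(segs); i++ {
			if matchSegments(pat[1:], segs[i:]) {
				return true
			}
		}
		return false
	}
	if len(segs) == 0 {
		return false
	}
	ok, err := path.Match(pat[0], segs[0])
	if err != nil || !ok {
		return false
	}
	return matchSegments(pat[1:], segs[1:])
}

// matchInventoryPath is a hypothetical helper that compares an inventory
// path with a pattern segment by segment.
func matchInventoryPath(pattern, inventoryPath string) bool {
	split := func(s string) []string {
		return strings.Split(strings.Trim(s, "/"), "/")
	}
	return matchSegments(split(pattern), split(inventoryPath))
}

func main() {
	fmt.Println(matchInventoryPath("/DC0/vm/**/hadoop*", "/DC0/vm/Folder1/hadoop1"))           // true
	fmt.Println(matchInventoryPath("/DC0/vm/**/hadoop*", "/DC0/vm/Folder1/NestedFolder1/VM3")) // false
	fmt.Println(matchInventoryPath("/DC0/vm/**/VM1", "/DC0/vm/VM1"))                          // true: "**" matched zero folders
}
```

The backtracking cost can grow quickly when a pattern contains several `**` segments, but inventory paths are only a handful of levels deep, so a simple recursive matcher is adequate for a sketch.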

#### Multiple paths to VMs
As the example tree above shows, VMs appear both in their own folder under the datacenter and under the hosts. This is useful when you want to select VMs on a specific host. For example, `/DC0/host/Cluster1/Host1/hadoop*` selects all VMs with a name starting with "hadoop" that are running on Host1.

We can extend this to the cluster level: `/DC0/host/Cluster1/*/hadoop*`. This selects any VM matching "hadoop*" on any host in Cluster1.

## Performance Considerations

### Realtime vs. historical metrics
plugins/inputs/vsphere/client.go: 9 additions, 9 deletions
@@ -74,7 +74,7 @@ func (cf *ClientFactory) GetClient(ctx context.Context) (*Client, error) {
ctx1, cancel1 := context.WithTimeout(ctx, cf.parent.Timeout.Duration)
defer cancel1()
if _, err := methods.GetCurrentTime(ctx1, cf.client.Client); err != nil {
-log.Printf("I! [input.vsphere]: Client session seems to have time out. Reauthenticating!")
+log.Printf("I! [inputs.vsphere]: Client session seems to have time out. Reauthenticating!")
ctx2, cancel2 := context.WithTimeout(ctx, cf.parent.Timeout.Duration)
defer cancel2()
if cf.client.Client.SessionManager.Login(ctx2, url.UserPassword(cf.parent.Username, cf.parent.Password)) != nil {
@@ -102,7 +102,7 @@ func NewClient(ctx context.Context, u *url.URL, vs *VSphere) (*Client, error) {
u.User = url.UserPassword(vs.Username, vs.Password)
}

-log.Printf("D! [input.vsphere]: Creating client: %s", u.Host)
+log.Printf("D! [inputs.vsphere]: Creating client: %s", u.Host)
soapClient := soap.NewClient(u, tlsCfg.InsecureSkipVerify)

// Add certificate if we have it. Use it to log us in.
@@ -173,9 +173,9 @@ func NewClient(ctx context.Context, u *url.URL, vs *VSphere) (*Client, error) {
if err != nil {
return nil, err
}
-log.Printf("D! [input.vsphere] vCenter says max_query_metrics should be %d", n)
+log.Printf("D! [inputs.vsphere] vCenter says max_query_metrics should be %d", n)
if n < vs.MaxQueryMetrics {
-log.Printf("W! [input.vsphere] Configured max_query_metrics is %d, but server limits it to %d. Reducing.", vs.MaxQueryMetrics, n)
+log.Printf("W! [inputs.vsphere] Configured max_query_metrics is %d, but server limits it to %d. Reducing.", vs.MaxQueryMetrics, n)
vs.MaxQueryMetrics = n
}
return client, nil
@@ -199,7 +199,7 @@ func (c *Client) close() {
defer cancel()
if c.Client != nil {
if err := c.Client.Logout(ctx); err != nil {
-log.Printf("E! [input.vsphere]: Error during logout: %s", err)
+log.Printf("E! [inputs.vsphere]: Error during logout: %s", err)
}
}
})
@@ -228,7 +228,7 @@ func (c *Client) GetMaxQueryMetrics(ctx context.Context) (int, error) {
if s, ok := res[0].GetOptionValue().Value.(string); ok {
v, err := strconv.Atoi(s)
if err == nil {
-log.Printf("D! [input.vsphere] vCenter maxQueryMetrics is defined: %d", v)
+log.Printf("D! [inputs.vsphere] vCenter maxQueryMetrics is defined: %d", v)
if v == -1 {
// Whatever the server says, we never ask for more metrics than this.
return absoluteMaxMetrics, nil
@@ -239,17 +239,17 @@ func (c *Client) GetMaxQueryMetrics(ctx context.Context) (int, error) {
// Fall through version-based inference if value isn't usable
}
} else {
-log.Println("D! [input.vsphere] Option query for maxQueryMetrics failed. Using default")
+log.Println("D! [inputs.vsphere] Option query for maxQueryMetrics failed. Using default")
}

// No usable maxQueryMetrics setting. Infer based on version
ver := c.Client.Client.ServiceContent.About.Version
parts := strings.Split(ver, ".")
if len(parts) < 2 {
-log.Printf("W! [input.vsphere] vCenter returned an invalid version string: %s. Using default query size=64", ver)
+log.Printf("W! [inputs.vsphere] vCenter returned an invalid version string: %s. Using default query size=64", ver)
return 64, nil
}
-log.Printf("D! [input.vsphere] vCenter version is: %s", ver)
+log.Printf("D! [inputs.vsphere] vCenter version is: %s", ver)
major, err := strconv.Atoi(parts[0])
if err != nil {
return 0, err
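`GetMaxQueryMetrics` first tries the vCenter `maxQueryMetrics` option and, when that value is unusable, falls back to inferring a limit from the version string; `NewClient` then lowers `max_query_metrics` if the server's limit is smaller than the configured one. The sketch below restates those visible steps as small pure functions. It is a hypothetical simplification: the function names are invented, the value 100000 for `absoluteMaxMetrics` is assumed from the commit "Added hard 100000 metric query limit", treating positive option values as usable is an assumption because that part of the hunk is collapsed, and the mapping from version to limit is left out because the diff cuts off before it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// absoluteMaxMetrics is an ASSUMED hard cap taken from the commit message
// "Added hard 100000 metric query limit"; its definition is not in this diff.
const absoluteMaxMetrics = 100000

// maxFromOption interprets the vCenter maxQueryMetrics option value.
// ok == false means "unusable, fall back to version-based inference".
// Accepting positive values as-is is an assumption of this sketch.
func maxFromOption(s string) (n int, ok bool) {
	v, err := strconv.Atoi(s)
	if err != nil {
		return 0, false
	}
	if v == -1 {
		// -1: whatever the server says, never ask for more than the hard cap.
		return absoluteMaxMetrics, true
	}
	if v > 0 {
		return v, true
	}
	return 0, false
}

// parseMajor mirrors the visible part of the version fallback: the version
// string needs at least two dot-separated parts, and the first must be an
// integer. (The plugin itself returns a default of 64 for too-short strings.)
func parseMajor(ver string) (int, error) {
	parts := strings.Split(ver, ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("invalid version string %q", ver)
	}
	return strconv.Atoi(parts[0])
}

// clampMaxQueryMetrics keeps the configured value unless the server's limit
// is lower, matching the "Reducing." branch in NewClient.
func clampMaxQueryMetrics(configured, server int) int {
	if server < configured {
		return server
	}
	return configured
}

func main() {
	fmt.Println(maxFromOption("-1"))           // 100000 true
	fmt.Println(maxFromOption("not-a-number")) // 0 false
	fmt.Println(parseMajor("6.7.0"))           // 6 <nil>
	fmt.Println(clampMaxQueryMetrics(256, 64)) // 64
}
```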