Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove hardcoded 300s and add error messages when data collection is missing #10142

Merged
merged 53 commits into from
Jan 24, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
aed6cc6
Never default to a qualifier when none of them are set. (#9148)
ph Nov 19, 2018
07c4fa5
Merge branch 'master' of github.com:kaiyan-sheng/beats
kaiyan-sheng Nov 27, 2018
ff52365
Merge branch 'master' of github.com:kaiyan-sheng/beats
kaiyan-sheng Dec 10, 2018
5f9f575
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 10, 2018
09d8b97
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 10, 2018
141277e
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 13, 2018
276a1de
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 14, 2018
19a705a
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 17, 2018
e54a566
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 18, 2018
edd4488
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 26, 2018
b0d2e22
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 27, 2018
378b4f9
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 28, 2018
1996739
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Dec 28, 2018
1e05766
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 2, 2019
33c130c
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 2, 2019
513bbd0
Merge branch 'master' of github.com:elastic/beats
kaiyan-sheng Jan 3, 2019
c618fd0
Merge branch 'master' of github.com:elastic/beats
kaiyan-sheng Jan 3, 2019
40c8368
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 3, 2019
042c89b
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 3, 2019
97393f4
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 4, 2019
5b300c8
Merge branch 'master' of github.com:elastic/beats
kaiyan-sheng Jan 4, 2019
be995ca
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 7, 2019
b9461c2
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 9, 2019
89f5158
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 10, 2019
2db435e
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 11, 2019
1f3911b
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 14, 2019
3244062
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 14, 2019
89fcbbf
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 14, 2019
71b5c64
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 14, 2019
19cda55
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 14, 2019
c08f7fc
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 15, 2019
dcd95f8
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 16, 2019
f6fa0c2
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 16, 2019
af20179
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 18, 2019
bf32666
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 18, 2019
9386146
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 21, 2019
b7ff595
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 22, 2019
fbbf49d
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 22, 2019
d5e086d
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 23, 2019
9aed507
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 24, 2019
2419bbf
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 24, 2019
1e7e5f4
Merge remote-tracking branch 'upstream/master'
kaiyan-sheng Jan 24, 2019
187fef0
Check instance state and instance launch time
kaiyan-sheng Jan 17, 2019
2a4c226
Remove hardcoded 300s and add error messages
kaiyan-sheng Jan 18, 2019
7ae4fbd
Add logger instance in metricset instead recreating new ones
kaiyan-sheng Jan 18, 2019
04a35a1
Remove punctuation at the end of error string
kaiyan-sheng Jan 18, 2019
68d2282
Fix unit test
kaiyan-sheng Jan 18, 2019
f507fba
Rename ec2Logger to logger
kaiyan-sheng Jan 18, 2019
cef59de
Add check on period to be multiple of 60s
kaiyan-sheng Jan 21, 2019
0fbd8fe
Change to info msg for period setting
kaiyan-sheng Jan 23, 2019
eef2359
Improve info/error messages
kaiyan-sheng Jan 23, 2019
cff6bd8
Remove else statement
kaiyan-sheng Jan 23, 2019
54c589a
Update metricbeat/docs/modules/aws.asciidoc
kaiyan-sheng Jan 24, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 13 additions & 8 deletions metricbeat/docs/modules/aws.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,9 @@ This file is generated! See scripts/docs_collector.py

beta[]

This module periodically fetches monitoring metrics from AWS Cloudwatch using GetMetricData API. Note: extra AWS charges on
GetMetricData API requests will be generated by this module.
This module periodically fetches monitoring metrics from AWS Cloudwatch using
https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricData.html[GetMetricData API] for running
EC2 instances. Note: extra AWS charges on GetMetricData API requests will be generated by this module.

The default metricset is `ec2`.

Expand All @@ -30,7 +31,7 @@ see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html[Te
`sts get-session-token` AWS CLI can be used to generate temporary credentials. For example. with MFA-enabled:

----
aws> sts get-session-token --serial-number arn:aws:iam::1234:mfa/test@elastic.co --token-code 456789 --duration-seconds 129600
aws> sts get-session-token --serial-number arn:aws:iam::1234:mfa/your-email@example.com --token-code 456789 --duration-seconds 129600
----

Specific permissions needs to be added into the IAM user's policy to allow Metricbeat collecting AWS monitoring metrics. Please
Expand All @@ -43,11 +44,12 @@ $0.01/1000 metrics requested using GetMetricData. Please see https://aws.amazon.
for more details. To avoid unnecessary charges, `period` is preferred to be set to `300s` or multiples of `300s`, such as
`600s` and `900s`.

For more granular monitoring data you can enable detailed monitoring on the instance to get metrics every 1 minute. With this
detailed monitoring enabled, `period` in aws module configuration can be any number larger than `60s`. Since AWS sends metric
data to CloudWatch in 1-minute periods, setting metricbeat module `period` less than `60s` will cause extra API requests which
means extra charges on AWS. To avoid unnecessary charges, `period` is preferred to be set to `60s` or multiples of `60s`,
such as `120s` and `180s`.
For more granular monitoring data you can enable detailed monitoring on the instance to get metrics every 1 minute. Please see
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html[Enabling Detailed Monitoring] for instructions
on how to enable detailed monitoring. With detailed monitoring enabled, `period` in aws module configuration can be any number
larger than `60s`. Since AWS sends metric data to CloudWatch in 1-minute periods, setting metricbeat module `period` less
than `60s` will cause extra API requests which means extra charges on AWS. To avoid unnecessary charges, `period` is
preferred to be set to `60s` or multiples of `60s`, such as `120s` and `180s`.

Here is an example of aws metricbeat module configuration:

Expand All @@ -63,6 +65,9 @@ metricbeat.modules:
default_region: '${AWS_REGION:us-west-1}'
----

This module only collects metrics for EC2 instances that are in `running` state and exist more than 10 minutes to make sure
there are monitoring metrics exist in Cloudwatch already.

The AWS module comes with a predefined dashboard. For example:

image::./images/metricbeat-aws-ec2-overview.png[]
Expand Down
21 changes: 13 additions & 8 deletions x-pack/metricbeat/module/aws/_meta/docs.asciidoc
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
This module periodically fetches monitoring metrics from AWS Cloudwatch using GetMetricData API. Note: extra AWS charges on
GetMetricData API requests will be generated by this module.
This module periodically fetches monitoring metrics from AWS Cloudwatch using
https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricData.html[GetMetricData API] for running
EC2 instances. Note: extra AWS charges on GetMetricData API requests will be generated by this module.

The default metricset is `ec2`.

Expand All @@ -21,7 +22,7 @@ see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html[Te
`sts get-session-token` AWS CLI can be used to generate temporary credentials. For example. with MFA-enabled:

----
aws> sts get-session-token --serial-number arn:aws:iam::1234:mfa/test@elastic.co --token-code 456789 --duration-seconds 129600
aws> sts get-session-token --serial-number arn:aws:iam::1234:mfa/your-email@example.com --token-code 456789 --duration-seconds 129600
----

Specific permissions needs to be added into the IAM user's policy to allow Metricbeat collecting AWS monitoring metrics. Please
Expand All @@ -34,11 +35,12 @@ $0.01/1000 metrics requested using GetMetricData. Please see https://aws.amazon.
for more details. To avoid unnecessary charges, `period` is preferred to be set to `300s` or multiples of `300s`, such as
`600s` and `900s`.

For more granular monitoring data you can enable detailed monitoring on the instance to get metrics every 1 minute. With this
detailed monitoring enabled, `period` in aws module configuration can be any number larger than `60s`. Since AWS sends metric
data to CloudWatch in 1-minute periods, setting metricbeat module `period` less than `60s` will cause extra API requests which
means extra charges on AWS. To avoid unnecessary charges, `period` is preferred to be set to `60s` or multiples of `60s`,
such as `120s` and `180s`.
For more granular monitoring data you can enable detailed monitoring on the instance to get metrics every 1 minute. Please see
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html[Enabling Detailed Monitoring] for instructions
on how to enable detailed monitoring. With detailed monitoring enabled, `period` in aws module configuration can be any number
larger than `60s`. Since AWS sends metric data to CloudWatch in 1-minute periods, setting metricbeat module `period` less
than `60s` will cause extra API requests which means extra charges on AWS. To avoid unnecessary charges, `period` is
preferred to be set to `60s` or multiples of `60s`, such as `120s` and `180s`.

Here is an example of aws metricbeat module configuration:

Expand All @@ -54,6 +56,9 @@ metricbeat.modules:
default_region: '${AWS_REGION:us-west-1}'
----

This module only collects metrics for EC2 instances that are in `running` state and exist more than 10 minutes to make sure
there are monitoring metrics exist in Cloudwatch already.

The AWS module comes with a predefined dashboard. For example:

image::./images/metricbeat-aws-ec2-overview.png[]
148 changes: 79 additions & 69 deletions x-pack/metricbeat/module/aws/ec2/ec2.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,9 +42,12 @@ func init() {
// interface methods except for Fetch.
type MetricSet struct {
*aws.MetricSet
moduleConfig *aws.Config
awsConfig *awssdk.Config
regionsList []string
moduleConfig *aws.Config
awsConfig *awssdk.Config
regionsList []string
durationString string
periodInSec int
logger *logp.Logger
}

// metricIDNameMap is a translating map between createMetricDataQuery id
Expand Down Expand Up @@ -72,6 +75,7 @@ var metricIDNameMap = map[string][]string{
// any MetricSet specific configuration options if there are any.
func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
cfgwarn.Beta("The aws ec2 metricset is beta.")
ec2Logger := logp.NewLogger(aws.ModuleName)

moduleConfig := aws.Config{}
if err := base.Module().UnpackConfig(&moduleConfig); err != nil {
Expand All @@ -80,7 +84,7 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {

if moduleConfig.Period == "" {
err := errors.New("period is not set in AWS module config")
logp.Error(err)
ec2Logger.Error(err)
}

metricSet, err := aws.NewMetricSet(base)
Expand All @@ -103,19 +107,38 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
}

awsConfig.Region = moduleConfig.DefaultRegion

svcEC2 := ec2.New(awsConfig)
regionsList, err := getRegions(svcEC2)
if err != nil {
err = errors.Wrap(err, "getRegions failed")
logp.Error(err)
ec2Logger.Error(err.Error())
}

// Calculate duration based on period
durationString, periodSec, err := convertPeriodToDuration(moduleConfig.Period)
if err != nil {
ec2Logger.Error(err.Error())
return nil, err
}

// Check if period is set to be multiple of 60s or 300s
remainder300 := periodSec % 300
remainder60 := periodSec % 60
if remainder300 != 0 || remainder60 != 0 {
err := errors.New("period needs to be set to 60s (or a multiple of 60s) if detailed monitoring is " +
"enabled for EC2 instances or set to 300s (or a multiple of 300s) if EC2 instances has basic monitoring. " +
"To avoid data missing or extra costs, please make sure period is set correctly in config.yml")
ec2Logger.Info(err)
}

return &MetricSet{
MetricSet: metricSet,
moduleConfig: &moduleConfig,
awsConfig: &awsConfig,
regionsList: regionsList,
MetricSet: metricSet,
moduleConfig: &moduleConfig,
awsConfig: &awsConfig,
regionsList: regionsList,
durationString: durationString,
periodInSec: periodSec,
logger: ec2Logger,
}, nil
}

Expand All @@ -129,32 +152,37 @@ func (m *MetricSet) Fetch(report mb.ReporterV2) {
instanceIDs, instancesOutputs, err := getInstancesPerRegion(svcEC2)
if err != nil {
err = errors.Wrap(err, "getInstancesPerRegion failed, skipping region "+regionName)
logp.Error(err)
m.logger.Errorf(err.Error())
report.Error(err)
continue
}

svcCloudwatch := cloudwatch.New(*m.awsConfig)
for _, instanceID := range instanceIDs {
//Calculate duration based on period
detailedMonitoring := instancesOutputs[instanceID].Monitoring.State
durationString, periodSec := convertPeriodToDuration(m.moduleConfig.Period, detailedMonitoring)
init := true
getMetricDataOutput := &cloudwatch.GetMetricDataOutput{NextToken: nil}
for init || getMetricDataOutput.NextToken != nil {
init = false
output, err := getMetricDataPerRegion(durationString, periodSec, instanceID, getMetricDataOutput.NextToken, svcCloudwatch)
output, err := getMetricDataPerRegion(m.durationString, m.periodInSec, instanceID, getMetricDataOutput.NextToken, svcCloudwatch)
if err != nil {
err = errors.Wrap(err, "getMetricDataPerRegion failed, skipping region "+regionName+" for instance "+instanceID)
logp.Error(err)
m.logger.Error(err.Error())
report.Error(err)
continue
}
getMetricDataOutput.MetricDataResults = append(getMetricDataOutput.MetricDataResults, output.MetricDataResults...)
}
event, err := createCloudWatchEvents(getMetricDataOutput, instanceID, instancesOutputs[instanceID], regionName)

event, info, err := createCloudWatchEvents(getMetricDataOutput, instanceID, instancesOutputs[instanceID], regionName)
if info != "" {
m.logger.Info(info)
}

if err != nil {
report.Error(err)
m.logger.Error(err.Error())
event.Error = err
report.Event(event)
continue
}
report.Event(event)
}
Expand All @@ -166,7 +194,7 @@ func getRegions(svc ec2iface.EC2API) (regionsList []string, err error) {
req := svc.DescribeRegionsRequest(input)
output, err := req.Send()
if err != nil {
logp.Error(errors.Wrap(err, "Failed DescribeRegions"))
err = errors.Wrap(err, "Failed DescribeRegions")
return
}
for _, region := range output.Regions {
Expand All @@ -175,29 +203,31 @@ func getRegions(svc ec2iface.EC2API) (regionsList []string, err error) {
return
}

func createCloudWatchEvents(getMetricDataOutput *cloudwatch.GetMetricDataOutput, instanceID string, instanceOutput ec2.Instance, regionName string) (event mb.Event, err error) {
machineType, err := instanceOutput.InstanceType.MarshalValue()
if err != nil {
err = errors.Wrap(err, "instance.InstanceType.MarshalValue failed")
logp.Error(err)
}

func createCloudWatchEvents(getMetricDataOutput *cloudwatch.GetMetricDataOutput, instanceID string, instanceOutput ec2.Instance, regionName string) (event mb.Event, info string, err error) {
event.Service = metricsetName
event.RootFields = common.MapStr{}
mapOfRootFieldsResults := make(map[string]interface{})
mapOfRootFieldsResults["service.name"] = metricsetName
mapOfRootFieldsResults["cloud.provider"] = metricsetName
mapOfRootFieldsResults["cloud.instance.id"] = instanceID

machineType, err := instanceOutput.InstanceType.MarshalValue()
if err != nil {
err = errors.Wrap(err, "instance.InstanceType.MarshalValue failed")
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have this cleaner and more testable we createCloudWatchEvents should probably return and error and logging happens outside if needed. But let's get this PR in and we can still clean it up later.

}

mapOfRootFieldsResults["cloud.machine.type"] = machineType
mapOfRootFieldsResults["cloud.availability_zone"] = *instanceOutput.Placement.AvailabilityZone
mapOfRootFieldsResults["cloud.image.id"] = *instanceOutput.ImageId
mapOfRootFieldsResults["cloud.region"] = regionName

resultRootFields, err := eventMapping(mapOfRootFieldsResults, schemaRootFields)
if err != nil {
err = errors.Wrap(err, "Error trying to apply schema in AWS EC2 metricbeat module.")
logp.Error(err)
err = errors.Wrap(err, "Error trying to apply schema schemaRootFields in AWS EC2 metricbeat module.")
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

}
event.RootFields = resultRootFields

mapOfMetricSetFieldResults := make(map[string]interface{})
for _, output := range getMetricDataOutput.MetricDataResults {
Expand All @@ -208,12 +238,17 @@ func createCloudWatchEvents(getMetricDataOutput *cloudwatch.GetMetricDataOutput,
mapOfMetricSetFieldResults[metricKey[0]] = fmt.Sprint(output.Values[0])
}

if len(mapOfMetricSetFieldResults) <= 3 {
info = "Missing Cloudwatch data for instance " + instanceID + ". This is expected for a new instance during the " +
"first data collection. If this shows up multiple times, please recheck the period setting in config."
return
}

resultMetricSetFields, err := eventMapping(mapOfMetricSetFieldResults, schemaMetricSetFields)
if err != nil {
err = errors.Wrap(err, "Error trying to apply schema in AWS EC2 metricbeat module.")
logp.Error(err)
err = errors.Wrap(err, "Error trying to apply schema schemaMetricSetFields in AWS EC2 metricbeat module.")
return
}
event.RootFields = resultRootFields
event.MetricSetFields = resultMetricSetFields
return
}
Expand All @@ -236,69 +271,44 @@ func getInstancesPerRegion(svc ec2iface.EC2API) (instanceIDs []string, instances
req := svc.DescribeInstancesRequest(describeInstanceInput)
output, err := req.Send()
if err != nil {
logp.Error(errors.Wrap(err, "Error DescribeInstances"))
err = errors.Wrap(err, "Error DescribeInstances")
return nil, nil, err
}

for _, reservation := range output.Reservations {
for _, instance := range reservation.Instances {
instanceID := *instance.InstanceId
instanceIDs = append(instanceIDs, instanceID)
instancesOutputs[instanceID] = instance
instanceIDs = append(instanceIDs, *instance.InstanceId)
instancesOutputs[*instance.InstanceId] = instance
}
}
}
return
}

func convertPeriodToDuration(period string, detailedMonitoring ec2.MonitoringState) (duration string, periodInSeconds int) {
func convertPeriodToDuration(period string) (string, int, error) {
// Amazon EC2 sends metrics to Amazon CloudWatch with 5-minute default frequency.
// If detailed monitoring is enabled, then data will be available in 1-minute period.
// Set starttime double the default frequency earlier than the endtime in order to make sure
// GetMetricDataRequest gets the latest data point for each metric.
numberPeriod, err := strconv.Atoi(period[0 : len(period)-1])
if err != nil {
logp.Error(errors.Wrap(err, "Error converting string to int. Use default duration instead."))
// If failed converting string to int, then set default duration to "-600s" with basic monitoring and "-120s" with
// detailed monitoring.
numberPeriod = 300
if detailedMonitoring == ec2.MonitoringStateEnabled {
numberPeriod = 60
}
duration = "-" + strconv.Itoa(numberPeriod*2) + "s"
periodInSeconds = numberPeriod
return duration, periodInSeconds
return "", 0, err
}

unitPeriod := period[len(period)-1:]
// if detailed monitoring is enabled, then period can be larger or equal than 1min.
// if detailed monitoring is disabled, then period can be larger or equal than 5min.
switch unitPeriod {
case "s":
if detailedMonitoring == ec2.MonitoringStateDisabled && numberPeriod < 300 {
numberPeriod = 300
} else if detailedMonitoring == ec2.MonitoringStateEnabled && numberPeriod < 60 {
numberPeriod = 60
}
duration = "-" + strconv.Itoa(numberPeriod*2) + unitPeriod
periodInSeconds = numberPeriod
duration := "-" + strconv.Itoa(numberPeriod*2) + unitPeriod
return duration, numberPeriod, nil
case "m":
if detailedMonitoring == ec2.MonitoringStateDisabled && numberPeriod < 5 {
numberPeriod = 5
} else if detailedMonitoring == ec2.MonitoringStateEnabled && numberPeriod < 1 {
numberPeriod = 1
}
duration = "-" + strconv.Itoa(numberPeriod*2) + unitPeriod
periodInSeconds = numberPeriod * 60
duration := "-" + strconv.Itoa(numberPeriod*2) + unitPeriod
periodInSec := numberPeriod * 60
return duration, periodInSec, nil
default:
numberPeriod = 300
if detailedMonitoring == ec2.MonitoringStateEnabled {
numberPeriod = 60
}
duration = "-" + strconv.Itoa(numberPeriod*2) + "s"
periodInSeconds = numberPeriod
err = errors.New("invalid period in config. Please reset period in config")
duration := "-" + strconv.Itoa(numberPeriod*2) + "s"
return duration, numberPeriod, err
}
return
}

func getMetricDataPerRegion(durationString string, periodInSec int, instanceID string, nextToken *string, svc cloudwatchiface.CloudWatchAPI) (*cloudwatch.GetMetricDataOutput, error) {
Expand Down
Loading