-
Notifications
You must be signed in to change notification settings - Fork 327
Initial metrics spike for gathering metrics about Waypoint operations #3440
Conversation
@@ -374,7 +375,9 @@ This command will bootstrap the server and setup a CLI context. | |||
telemetryOptions = append(telemetryOptions, telemetry.WithDatadogExporter( | |||
datadog.Options{ | |||
TraceAddr: c.flagTelemetryDatadogTraceAddr, | |||
StatsAddr: c.flagTelemetryDatadogTraceAddr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I simply re-use the current trace address, but we could easily add a separate flag for stats.
@@ -874,3 +880,50 @@ func (s *Service) runnerVerifyToken( | |||
|
|||
return nil | |||
} | |||
|
|||
func operationString(job *pb.Job) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we seem to do this in more than one place in Waypoint, so I'm open to suggestions to where we could place this and use it. Maybe /pkg/server/ptypes/job.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CLI client needs something like this too, so maybe pkg/server/...
isn't the right place. Maybe it could go some where in internal/...
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the one upper case change that makes my eye twitch.
Co-authored-by: Evan Phoenix <evan@hashicorp.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noice! ✨ Looks pretty good to me 😄 I had a few minor comments.
@@ -874,3 +880,50 @@ func (s *Service) runnerVerifyToken( | |||
|
|||
return nil | |||
} | |||
|
|||
func operationString(job *pb.Job) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CLI client needs something like this too, so maybe pkg/server/...
isn't the right place. Maybe it could go some where in internal/...
? 🤔
case *pb.Job_StopTask: | ||
return "stop_task" | ||
case *pb.Job_WatchTask: | ||
return "watch_task" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making these PascalCase so they match the proto go definition? Like WatchTask
, ConfigSync
, Poll
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I've seen internally (in other HashiCorp projects that use OpenCensus), lowercase with _
is the common style, so I chose this to be consistent
EDIT: clarify "where" internally I have seen this style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh ok! Makes sense from that standpoint 👍🏻 Sounds good.
This PR is a spiritual continuation of #2402 where we initially introduce telemetry. Here we build off of the inclusion of the DataDog exporter and enable collecting basic metrics for our operations.
Example of collected operation metrics:
There are some caveats here, in that the operation times are reported as gauges and not distributions (see DataDog/opencensus-go-exporter-datadog#17). This pattern is however consistent with some of our internal (private) packages from other projects.
Additional metrics can be collected by adding new OpenCensus Views and Stats in
metrics/stats.go
and exposed to Waypoint as public methods in themetrics
package.Note that at least for now, all the metric collection is done server side, and does not communicate with or inspect runners. Because runners may be ran elsewhere than the server, additional DataDog setup would/could be required. A naive approach there would be to assume datadog is running where the runners are and simply pass on the stats address and instruct the runner to start an exporter, but for the sake of simplicity here we only do server side.