Skip to content
This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Initial metrics spike for gathering metrics about Waypoint operations #3440

Merged
merged 6 commits into from
Jun 21, 2022

Conversation

catsby
Copy link
Contributor

@catsby catsby commented Jun 8, 2022

This PR is a spiritual continuation of #2402 where we initially introduce telemetry. Here we build off of the inclusion of the DataDog exporter and enable collecting basic metrics for our operations.

Example of collected operation metrics:

metrics

There are some caveats here, in that the operation times are reported as gauges and not distributions (see DataDog/opencensus-go-exporter-datadog#17). This pattern is however consistent with some of our internal (private) packages from other projects.

Additional metrics can be collected by adding new OpenCensus Views and Stats in metrics/stats.go and exposed to Waypoint as public methods in the metrics package.

Note that at least for now, all the metric collection is done server side, and does not communicate with or inspect runners. Because runners may be ran elsewhere than the server, additional DataDog setup would/could be required. A naive approach there would be to assume datadog is running where the runners are and simply pass on the stats address and instruct the runner to start an exporter, but for the sake of simplicity here we only do server side.

@github-actions github-actions bot added the core label Jun 8, 2022
@catsby catsby requested a review from a team June 8, 2022 22:24
@@ -374,7 +375,9 @@ This command will bootstrap the server and setup a CLI context.
telemetryOptions = append(telemetryOptions, telemetry.WithDatadogExporter(
datadog.Options{
TraceAddr: c.flagTelemetryDatadogTraceAddr,
StatsAddr: c.flagTelemetryDatadogTraceAddr,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I simply re-use the current trace address, but we could easily add a separate flag for stats.

@@ -874,3 +880,50 @@ func (s *Service) runnerVerifyToken(

return nil
}

func operationString(job *pb.Job) string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we seem to do this in more than one place in Waypoint, so I'm open to suggestions to where we could place this and use it. Maybe /pkg/server/ptypes/job.go?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI client needs something like this too, so maybe pkg/server/... isn't the right place. Maybe it could go some where in internal/...? 🤔

Copy link
Contributor

@evanphx evanphx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one upper case change that makes my eye twitch.

pkg/server/singleprocess/service_runner.go Outdated Show resolved Hide resolved
Co-authored-by: Evan Phoenix <evan@hashicorp.com>
Copy link
Member

@briancain briancain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noice! ✨ Looks pretty good to me 😄 I had a few minor comments.

@@ -874,3 +880,50 @@ func (s *Service) runnerVerifyToken(

return nil
}

func operationString(job *pb.Job) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI client needs something like this too, so maybe pkg/server/... isn't the right place. Maybe it could go some where in internal/...? 🤔

case *pb.Job_StopTask:
return "stop_task"
case *pb.Job_WatchTask:
return "watch_task"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about making these PascalCase so they match the proto go definition? Like WatchTask, ConfigSync, Poll, etc.

Copy link
Contributor Author

@catsby catsby Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I've seen internally (in other HashiCorp projects that use OpenCensus), lowercase with _ is the common style, so I chose this to be consistent

EDIT: clarify "where" internally I have seen this style

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh ok! Makes sense from that standpoint 👍🏻 Sounds good.

pkg/server/singleprocess/service_runner.go Show resolved Hide resolved
@catsby catsby merged commit 7b83979 into main Jun 21, 2022
@catsby catsby deleted the simple-metrics branch June 21, 2022 20:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants