This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Server Telemetry #2402

Merged
merged 43 commits into from
Oct 12, 2021

Conversation

@izaaklauer izaaklauer commented Oct 1, 2021

This PR enables server telemetry, which creates OpenCensus traces and timing/count stats for each gRPC request.

Closes #2364

Traces in Datadog:

[Screenshot: Screen Shot 2021-10-01 at 9 45 15 AM]

Traces in Jaeger:

[Image: jeager-traces]

Stats in Datadog:

[Screenshot: Screen Shot 2021-10-01 at 9 47 44 AM]

What changed

  • Instruments the Waypoint server with the ocgrpc stats handler.
  • Adds a new telemetry package, responsible for configuring OpenCensus and its exporters.
  • Adds new flags to the waypoint server run command for selecting and configuring either an OpenCensus agent exporter or a direct Datadog exporter.
  • Adds a new telemetry top-level directory that contains a docker-compose file with example infrastructure (an oc-agent, an oc-collector, a Datadog agent, and Jaeger), and an example of how to use it.

How to verify

Check out the telemetry top-level directory, and try running the example!

@catsby catsby left a comment

Looks good so far! I had some minor thoughts on code structuring and maybe using an interface, but most of the feedback is around capitalizing names in the docs like OpenCensus and gRPC, etc.

internal/server/server.go (outdated)
Comment on lines 122 to 128
EnableOpenCensusExporter bool
OpenCensusExporterOptions []ocagent.ExporterOption

EnableDatadogExporter bool
DatadogExporterOptions datadog.Options

EnableZpages bool

Do we anticipate many more telemetry providers? If so, I feel like the Enable<name> bool here and the if t.Enabled<name> check above could be placed behind an interface, with each implementation in its own file, like telemetry/datadog.go or so. Then, when parsing the flags, WithDatadogExporter and the like could append a type from those implementations, and the telemetry.Run code would just use the common interface methods for setup and teardown.

Just a thought for consideration and discussion.

Elaborating / clarifying:

Instead of these bools and options, it would be more like:

// internal/telemetry/telemetry.go

// Exporter is implemented once per telemetry provider.
type Exporter interface {
	Options() ocagent.ExporterOption // returns the options
	Closer() func()                  // returns a function that closes things down
}

type telemetry struct {
	[...]

	Exporters []Exporter
	[...]
}

func Run(opts ...Option) error {
	[...]
	var t telemetry
	for _, opt := range opts {
		opt(&t)
	}

	for _, e := range t.Exporters {
		exporter, err := ocagent.NewExporter(e.Options())
		if err != nil {
			return status.Errorf(codes.InvalidArgument, "failed to initialize exporter: %s", err)
		}
		octrace.RegisterExporter(exporter)
		ocview.RegisterExporter(exporter)
		closeFuncs = append(closeFuncs, e.Closer())
	}
	[...]
}

Again, just a thought for discussion.
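
For anyone following this thread, the interface idea can be tried out without any OpenCensus dependencies. The following is a self-contained sketch under stated assumptions, not the code that was merged: the `Exporter` method set (`Name`, `Start`, `Close`), the `WithExporter` option, and the `recordingExporter` type are hypothetical names chosen for illustration, and `Run` here returns a teardown function rather than matching the signature in the PR.

```go
package main

import "fmt"

// Exporter is a hypothetical common interface that each telemetry
// provider would implement in its own file.
type Exporter interface {
	Name() string
	Start() error // setup
	Close() error // teardown
}

type telemetry struct {
	exporters []Exporter
}

// Option configures telemetry, following the functional options pattern.
type Option func(*telemetry)

// WithExporter appends an exporter implementation to the configuration.
func WithExporter(e Exporter) Option {
	return func(t *telemetry) { t.exporters = append(t.exporters, e) }
}

// Run starts every configured exporter. It returns a function that closes
// the started exporters in reverse order and reports the first close error.
func Run(opts ...Option) (func() error, error) {
	var t telemetry
	for _, opt := range opts {
		opt(&t)
	}

	var started []Exporter
	closeAll := func() error {
		var first error
		for i := len(started) - 1; i >= 0; i-- {
			if err := started[i].Close(); err != nil && first == nil {
				first = err
			}
		}
		return first
	}

	for _, e := range t.exporters {
		if err := e.Start(); err != nil {
			_ = closeAll() // tear down whatever was already started
			return nil, fmt.Errorf("failed to initialize exporter %q: %w", e.Name(), err)
		}
		started = append(started, e)
	}
	return closeAll, nil
}

// recordingExporter is a fake provider that logs its lifecycle calls.
type recordingExporter struct {
	name string
	log  *[]string
}

func (e recordingExporter) Name() string { return e.name }
func (e recordingExporter) Start() error {
	*e.log = append(*e.log, "start "+e.name)
	return nil
}
func (e recordingExporter) Close() error {
	*e.log = append(*e.log, "close "+e.name)
	return nil
}

func main() {
	var log []string
	closeAll, err := Run(
		WithExporter(recordingExporter{"ocagent", &log}),
		WithExporter(recordingExporter{"datadog", &log}),
	)
	if err != nil {
		panic(err)
	}
	if err := closeAll(); err != nil {
		panic(err)
	}
	fmt.Println(log)
}
```

With this shape, adding a provider means adding one type that satisfies `Exporter`, and `Run` needs no per-provider branches.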

@krantzinator krantzinator left a comment

A few comments to start

@evanphx evanphx left a comment

A few things to go along with what has already been said.

@briancain briancain left a comment

Looks pretty good to me! Most of my comments/review suggestions are just making sure we stay consistent when we refer to different tools in logs or doc strings. Feel free to apply those in a big commit since they're all related!

@briancain briancain left a comment

Looks good! 👍🏻

@krantzinator krantzinator left a comment

Just a few more capitalizations and doc nits! Excellent work on documentation overall, this is great.

Co-authored-by: Rae Krantz <8461333+krantzinator@users.noreply.github.com>
izaaklauer and others added 2 commits October 12, 2021 15:44
Development

Successfully merging this pull request may close these issues.

Emit grpc server metrics and traces with opencensus
6 participants