Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch otel-cli to use a custom otlp client #205

Merged
merged 31 commits into from
Jun 13, 2023
Merged

switch otel-cli to use a custom otlp client #205

merged 31 commits into from
Jun 13, 2023

Conversation

tobert
Copy link
Collaborator

@tobert tobert commented May 12, 2023

When this PR is ready, it will offer new grpc and http/protobuf OTLP clients in a package, that will be integrated into otel-cli to replace the OTel collector's exporters that are in use now.

Goals:

  • support grpc, http/protobuf, and maybe file:// (see implement server spool to disk & replay #203) exporters, using grpc and http directly
    • iow, an exporter is code that takes a list of spans and sends them across the wire, with http or grpc
    • this should be as simple as possible in otel-cli
  • synchronous is fine
  • use the standard library or grpc directly wherever possible
  • refactor plumbing code to use new clients
    • something like err := otlpClient.SendSpan(protoSpan)
  • simple retries & timeouts
  • remove dependency on otel collector code
  • pass envvars through to children of exec
    • OTEL_EXPORTER_OTLP_ENDPOINT and so on, will no longer get double-processed by otel-cli and the backend code
    • otel-cli makes a few affordances for the shell environment, that the exporter doesn't like
      *(btw I'm not sold on file:// endpoints idea, but it sure does seem useful for some niche cases where I'd like to make traces, like early boot)
  • must be compatible with any standards compliant upstream OTLP server
    • otel-cli imports the upstream protobuf and semantic conventions so things shouldn't be too hard to line up and maintain
  • lots of testing, of course

Non-goals:

  • metrics, logs. just traces for now
  • for the love of Gopher don't copy exactly what I did in otlpserver. otlpserver.CliEvent is an abomination and I feel bad about it and intend to refactor it out over time.

Tradeoffs:

  • will need to keep a closer eye on protocol changes
  • "opentelemetry-collector works fine" is true, and it has a ton of features. I just can't guarantee configuration state when it reads envvars behind my back. deleting envvars before the exporter reads them blocks them from otel-cli exec children as well, and that's turned out to be an unpleasant enough UX to try a lighter client

@tobert
Copy link
Collaborator Author

tobert commented May 13, 2023

Having read what I just wrote, it occurs to me I could stash the env data in Config and then pass it to os.Cmd regardless of deleting envvars in the process 🤦🏻‍♀️ This is definitely simpler in the short term 😆

Amy Tobey added 16 commits May 30, 2023 12:05
Moves otel-cli to use gRPC directly instead of using the wrapper code
from the OTel collector.

Along the way, it made sense to change span-generating commands to start
the connection independent of sending the span, so connection can happen
in the background while other work is being done.

Also in testing things out manually, I noticed that --timeout on a
process that exceeds the timeout is broken, and things can hang. Put
some hax in to make it work for now, and will come back later to fix it
right, probably by adding --kill-timeout or something like that to make
it explicit the process otel-cli runs may be killed.
Looks like first pass works, but needs some work on error handling.

Also add the new grpc file because I forgot to add on last commit. This
is all getting squashed at the end anyways.
Timeouts are still a bit weird, working on it. I implemented a super
simple retry for both http and grpc. It does some things differently but
seems to work.

Also lots of error handling & docs to clean up still.
I like the idea but it's a lot of work to untangle it from the config,
for not much gain. Code is simpler bundled in otelcli so let's keep it
there.
There were some workarounds for getting errors out of the OTel stack.
Those are no longer necessary. Renamed the OtelError field to just Error
bc that's how it's used now. It's still a string because marshaling to
the error type isn't cool.
The success case wasn't clear. It now reads the body and closes it too.

Still need to do something with the response but that will come later.
@tobert
Copy link
Collaborator Author

tobert commented May 31, 2023

As of commit bce673c, the binary is about 300kb smaller than before :) That wasn't a goal really but it's nice to see dependency reduction show up in a measurable way.

Amy Tobey added 11 commits June 6, 2023 15:59
This work needs to be done, but my approach was awful, and
it doesn't belong in this PR.
go vet started showing errors on copied protobuf spans. It wasn't really
a problem in otel-cli but go vet is right, copying a mutex is pretty
much always a bad idea.
add a test file, just a start
interim step, needs more work...
While filling in gRPC error handling, I added server-side recommended
delay time. This required refactoring the retryFunc signature to tunnel
that wait through to the retry function. This ended up touching http as
well because they share the retry func.
@tobert tobert marked this pull request as ready for review June 8, 2023 21:06
Amy Tobey added 3 commits June 8, 2023 17:06
Really need to expand the test framework to test this but it's a big
lift for another PR.
@tobert tobert merged commit 71b9cf1 into main Jun 13, 2023
@tobert tobert deleted the custom-otlp-client branch June 13, 2023 16:19
tobert added a commit that referenced this pull request Jun 28, 2023
While working on #227 a few gaps in testing were exposed, especially for gRPC and HTTP headers.

This PR adds functional tests for gRPC and HTTP header passing. Along the way I found the bug
in gRPC headers, introduce in #205. When I wrote the gRPC client I assumed headers would be
passed in with grpcOpts. I looked at the OTel collector, and it is passing them via grpc metadata.
So I fixed that.

Also added a test for the new ErrorList functionality. Just a single test for now but it's set up to add more.

Squashed commitlog follows:

* add a test for SaveError / GetErrorList

Not complete, but works. Set up as a table test, with just one simple
test for now.

* pass time into SaveError so it's easier to test

Also expanded test to compare error lists.

* test headers, fix implementation of grpc headers

There was an egregious testing gap. Mea culpa.

Adds tests for headers through grpc and http.

To do that, header support had to be added to the servers, so I did
that.

That provided the data needed to augment the test harness to check
headers. Added that.

Added functional tests that exercise the headers end-to-end.

Noticed grpc.WaitForReady() while I was perusing otel collector and
grpc docs, added a TODO to look into it later. Might be useful for
otel-cli status.

* actually assert against the new Headers values

also added * support to checkHeaders
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant