-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move error tracking to context values, add multiple canaries to otel-cli status #227
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
While working on context plumbing I noticed the CLI tools don't stop the connection explicitly. I added that and started getting crashes because the client was nil in non-recording mode. The null client fixes those cases properly.
Simplified some code that didn't need to be so defensive.
@mterhar I got it this far for connection errors.
{
"config": {
"endpoint": "localhost:44444",
"traces_endpoint": "",
"protocol": "",
"timeout": "1s",
"otlp_headers": {},
"insecure": false,
"otlp_blocking": false,
"tls_ca_cert": "",
"tls_client_key": "",
"tls_client_cert": "",
"tls_no_verify": false,
"service_name": "otel-cli",
"span_name": "todo-generate-default-span-names",
"span_kind": "client",
"span_attributes": {},
"span_status_code": "unset",
"span_status_description": "",
"force_span_id": "",
"force_trace_id": "",
"traceparent_carrier_file": "",
"traceparent_ignore_env": false,
"traceparent_print": false,
"traceparent_print_export": false,
"traceparent_required": false,
"background_parent_poll_ms": 10,
"background_socket_directory": "",
"background_wait": false,
"background_skip_parent_pid_check": false,
"span_start_time": "now",
"span_end_time": "now",
"event_name": "todo-generate-default-event-names",
"event_time": "now",
"config_file": "",
"verbose": false,
"fail": false
},
"span_data": {
"is_sampled": "true",
"span_id": "f293c03186f0b7ae",
"trace_id": "32cc67787a086322c6b1a90b51bda1e6"
},
"env": {
"PATH": "/bin"
},
"diagnostics": {
"cli_args": [
"status",
"--endpoint",
"localhost:44444"
],
"is_recording": true,
"config_file_loaded": false,
"number_of_args": 3,
"detected_localhost": true,
"insecure_skip_verify": false,
"parsed_timeout_ms": 1000,
"endpoint": "grpc://localhost:44444",
"endpoint_source": "general",
"error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\"",
"exec_exit_code": 0,
"retries": 4
},
"errors": [
{
"Timestamp": "2023-06-23T15:21:09.476353318-04:00",
"Error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\""
},
{
"Timestamp": "2023-06-23T15:21:09.876525485-04:00",
"Error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\""
},
{
"Timestamp": "2023-06-23T15:21:09.876534878-04:00",
"Error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\""
}
]
} |
Something is broken in otlpserver so it's disconnecting after the first span, which is a whole different bug to fix. Handily, it demonstrates the usecase for keepalive testing.
This was wrong, but fine until I needed to send more than one span.
Remove duplicate deadline and especially the deferred cancel. This was one of a couple places where otlpclient prevented itself from working for more than one span. Also re-enable closing the grpc connection in Stop() now that other issues are solved.
Adds a first take on otel-cli status --keepalive.
Previously otel-cli status didn't call ParseCliTimeout() at all so the diagnostic wasn't set and had the zero value. Now that it's being called, the default value of 1s gets parsed so the test needs to specify that.
tobert
changed the title
plumb ctx all the way through the OTLP clients
move error tracking to context values, add otel-cli status --keepalive
Jun 23, 2023
Marked a couple things deprecated in status output that will be a lot of tedious work to refactor out, and can wait.
still more tweaking to do but this feels on the right track
tobert
changed the title
move error tracking to context values, add otel-cli status --keepalive
move error tracking to context values, add otel-cli status --canary-count
Jun 26, 2023
tobert
changed the title
move error tracking to context values, add otel-cli status --canary-count
move error tracking to context values, add multiple canaries to otel-cli status
Jun 26, 2023
Default canary count bumped to 1 to keep old behavior.
This was referenced Jun 26, 2023
tobert
added a commit
that referenced
this pull request
Jun 28, 2023
While working on #227 a few gaps in testing were exposed, especially for gRPC and HTTP headers. This PR adds functional tests for gRPC and HTTP header passing. Along the way I found the bug in gRPC headers, introduce in #205. When I wrote the gRPC client I assumed headers would be passed in with grpcOpts. I looked at the OTel collector, and it is passing them via grpc metadata. So I fixed that. Also added a test for the new ErrorList functionality. Just a single test for now but it's set up to add more. Squashed commitlog follows: * add a test for SaveError / GetErrorList Not complete, but works. Set up as a table test, with just one simple test for now. * pass time into SaveError so it's easier to test Also expanded test to compare error lists. * test headers, fix implementation of grpc headers There was an egregious testing gap. Mea culpa. Adds tests for headers through grpc and http. To do that, header support had to be added to the servers, so I did that. That provided the data needed to augment the test harness to check headers. Added that. Added functional tests that exercise the headers end-to-end. Noticed grpc.WaitForReady() while I was perusing otel collector and grpc docs, added a TODO to look into it later. Might be useful for otel-cli status. * actually assert against the new Headers values also added * support to checkHeaders
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
#226 is asking for something I've wanted to do for a while. The first step in making this possible is to plumb context through the OTLP clients, so that there's a place to tunnel errors in and out so they can be displayed. This also presents a path to eliminate the final remaining global in
otlpclient.Diag
and replace it with context tunneling.This PR also contains the addition of
otlpclient/otlp_client_null.go
that implements a non-recording client. While doing the context work I noticed I forgot to have CLI commands stop their clients. After adding that, otel-cli started panicing on nil pointers because the client was nil. NullClient fixes this.Added
otel-cli status --canary-count 1 --canary-interval 10ms
functionality for sending multiple canaries.