Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move error tracking to context values, add multiple canaries to otel-cli status #227

Merged
merged 21 commits into from
Jun 28, 2023

Conversation

tobert
Copy link
Collaborator

@tobert tobert commented Jun 23, 2023

#226 is asking for something I've wanted to do for a while. The first step in making this possible is to plumb context through the OTLP clients, so that there's a place to tunnel errors in and out so they can be displayed. This also presents a path to eliminate the final remaining global in otlpclient.Diag and replace it with context tunneling.

This PR also contains the addition of otlpclient/otlp_client_null.go that implements a non-recording client. While doing the context work I noticed I forgot to have CLI commands stop their clients. After adding that, otel-cli started panicing on nil pointers because the client was nil. NullClient fixes this.

Added otel-cli status --canary-count 1 --canary-interval 10ms functionality for sending multiple canaries.

Amy Tobey added 6 commits June 23, 2023 14:22
While working on context plumbing I noticed the CLI tools don't stop the
connection explicitly. I added that and started getting crashes because
the client was nil in non-recording mode. The null client fixes those
cases properly.
Simplified some code that didn't need to be so defensive.
@tobert
Copy link
Collaborator Author

tobert commented Jun 23, 2023

@mterhar I got it this far for connection errors.

atobey@workbox ~/src/otel-cli $ go build && env -i PATH=/bin ./otel-cli status --endpoint localhost:44444

{
    "config": {
        "endpoint": "localhost:44444",
        "traces_endpoint": "",
        "protocol": "",
        "timeout": "1s",
        "otlp_headers": {},
        "insecure": false,
        "otlp_blocking": false,
        "tls_ca_cert": "",
        "tls_client_key": "",
        "tls_client_cert": "",
        "tls_no_verify": false,
        "service_name": "otel-cli",
        "span_name": "todo-generate-default-span-names",
        "span_kind": "client",
        "span_attributes": {},
        "span_status_code": "unset",
        "span_status_description": "",
        "force_span_id": "",
        "force_trace_id": "",
        "traceparent_carrier_file": "",
        "traceparent_ignore_env": false,
        "traceparent_print": false,
        "traceparent_print_export": false,
        "traceparent_required": false,
        "background_parent_poll_ms": 10,
        "background_socket_directory": "",
        "background_wait": false,
        "background_skip_parent_pid_check": false,
        "span_start_time": "now",
        "span_end_time": "now",
        "event_name": "todo-generate-default-event-names",
        "event_time": "now",
        "config_file": "",
        "verbose": false,
        "fail": false
    },
    "span_data": {
        "is_sampled": "true",
        "span_id": "f293c03186f0b7ae",
        "trace_id": "32cc67787a086322c6b1a90b51bda1e6"
    },
    "env": {
        "PATH": "/bin"
    },
    "diagnostics": {
        "cli_args": [
            "status",
            "--endpoint",
            "localhost:44444"
        ],
        "is_recording": true,
        "config_file_loaded": false,
        "number_of_args": 3,
        "detected_localhost": true,
        "insecure_skip_verify": false,
        "parsed_timeout_ms": 1000,
        "endpoint": "grpc://localhost:44444",
        "endpoint_source": "general",
        "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\"",
        "exec_exit_code": 0,
        "retries": 4
    },
    "errors": [
        {
            "Timestamp": "2023-06-23T15:21:09.476353318-04:00",
            "Error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\""
        },
        {
            "Timestamp": "2023-06-23T15:21:09.876525485-04:00",
            "Error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\""
        },
        {
            "Timestamp": "2023-06-23T15:21:09.876534878-04:00",
            "Error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:44444: connect: connection refused\""
        }
    ]
}

Amy Tobey added 6 commits June 23, 2023 16:09
Something is broken in otlpserver so it's disconnecting after the first
span, which is a whole different bug to fix. Handily, it demonstrates
the usecase for keepalive testing.
This was wrong, but fine until I needed to send more than one span.
Remove duplicate deadline and especially the deferred cancel. This was
one of a couple places where otlpclient prevented itself from working
for more than one span.

Also re-enable closing the grpc connection in Stop() now that other
issues are solved.
Adds a first take on otel-cli status --keepalive.
Previously otel-cli status didn't call ParseCliTimeout() at all so the
diagnostic wasn't set and had the zero value. Now that it's being
called, the default value of 1s gets parsed so the test needs to specify
that.
@tobert tobert changed the title plumb ctx all the way through the OTLP clients move error tracking to context values, add otel-cli status --keepalive Jun 23, 2023
Amy Tobey added 5 commits June 26, 2023 10:33
Marked a couple things deprecated in status output that will be
a lot of tedious work to refactor out, and can wait.
still more tweaking to do but this feels on the right track
@tobert tobert changed the title move error tracking to context values, add otel-cli status --keepalive move error tracking to context values, add otel-cli status --canary-count Jun 26, 2023
@tobert tobert changed the title move error tracking to context values, add otel-cli status --canary-count move error tracking to context values, add multiple canaries to otel-cli status Jun 26, 2023
@tobert tobert deleted the grpc-info branch June 28, 2023 18:47
tobert added a commit that referenced this pull request Jun 28, 2023
While working on #227 a few gaps in testing were exposed, especially for gRPC and HTTP headers.

This PR adds functional tests for gRPC and HTTP header passing. Along the way I found the bug
in gRPC headers, introduce in #205. When I wrote the gRPC client I assumed headers would be
passed in with grpcOpts. I looked at the OTel collector, and it is passing them via grpc metadata.
So I fixed that.

Also added a test for the new ErrorList functionality. Just a single test for now but it's set up to add more.

Squashed commitlog follows:

* add a test for SaveError / GetErrorList

Not complete, but works. Set up as a table test, with just one simple
test for now.

* pass time into SaveError so it's easier to test

Also expanded test to compare error lists.

* test headers, fix implementation of grpc headers

There was an egregious testing gap. Mea culpa.

Adds tests for headers through grpc and http.

To do that, header support had to be added to the servers, so I did
that.

That provided the data needed to augment the test harness to check
headers. Added that.

Added functional tests that exercise the headers end-to-end.

Noticed grpc.WaitForReady() while I was perusing otel collector and
grpc docs, added a TODO to look into it later. Might be useful for
otel-cli status.

* actually assert against the new Headers values

also added * support to checkHeaders
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant