Enable better Envoy debugging #2792

mattmoor · 2020-08-13T22:59:58Z

Please describe the problem you have

The original motivation for this is to be able to trace a simple 503 response back to the Envoy that served it, since that Envoy's logs generally contain a treasure trove of information about what happened.

The bad response typically carries a Server:[envoy] header, and the Slack thread was about whether this is configurable. @jpeach dug up this in the Envoy API: https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto#envoy-v3-api-field-extensions-filters-network-http-connection-manager-v3-httpconnectionmanager-server-name

Ideally this would serve the Envoy pod name, so that its logs can be quickly accessed.

It would be next-level cool if this could also contain checksums of the configuration the Envoy is programmed with, which could be cross-correlated with the Contour logs to get the full Envoy configuration at the point of failure 😇

I'm thinking of something like a User-Agent string with key-value pairs.

Server=[envoy-zxhkjh; EDS={hash}; RDS={hash}; ...]
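A minimal sketch of how the value of the proposed header could be assembled, written in Go since that is what Contour is written in. The helper names configHash and serverHeader, the 12-character truncated SHA-256 fingerprint, and the sample resources are all hypothetical. Nothing here is an existing Contour or Envoy API. Resources and keys are sorted so that identical configuration always yields an identical header.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// configHash returns a short fingerprint of one xDS resource set.
// Assumption: each resource is already serialized deterministically.
// Resources are sorted so their order does not affect the hash.
func configHash(resources []string) string {
	sorted := append([]string(nil), resources...)
	sort.Strings(sorted)
	h := sha256.New()
	for _, r := range sorted {
		h.Write([]byte(r))
		h.Write([]byte{0}) // separator so ["ab","c"] and ["a","bc"] differ
	}
	return hex.EncodeToString(h.Sum(nil))[:12]
}

// serverHeader renders the proposed "pod; KEY=hash; ..." value.
// Keys are sorted so the output is stable across map iteration order.
func serverHeader(pod string, hashes map[string]string) string {
	keys := make([]string, 0, len(hashes))
	for k := range hashes {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := []string{pod}
	for _, k := range keys {
		parts = append(parts, k+"="+hashes[k])
	}
	return strings.Join(parts, "; ")
}

func main() {
	hashes := map[string]string{
		"EDS": configHash([]string{"cluster-a: 10.0.0.1:8080"}),
		"RDS": configHash([]string{"route: / -> cluster-a"}),
	}
	fmt.Println(serverHeader("envoy-zxhkjh", hashes))
}
```

Truncating the digest keeps the header short, at the cost of a higher chance that two different configurations share a fingerprint.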

Generally Contour's logs are painfully terse today, and the above would be extremely helpful.


jpeach · 2020-08-19T05:31:59Z

On the checksum approach specifically, wouldn't you need to correlate those checksums with something? That said, I'm not sure that there is a good something for this to correlate with. EDS can be hashed, but you'd want to also know the actual state that was associated with the hash.

xref #2021

mattmoor · 2020-08-19T14:06:36Z

Yeah, the idea was that Contour would need to log some stuff to correlate this with:

could be cross-correlated with the Contour logs
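A minimal sketch of the correlation step discussed above, assuming Contour kept the full configuration behind every hash it hands to Envoy and logged each pairing. The snapshotLog type, its record and lookup methods, and the log line format are hypothetical illustrations, not existing Contour code. A hash read from a response header could then be resolved to the exact state that produced it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"log"
	"sync"
)

// snapshotLog is a hypothetical store mapping each advertised hash to
// the full serialized configuration it was computed from.
type snapshotLog struct {
	mu     sync.Mutex
	byHash map[string]string
}

func newSnapshotLog() *snapshotLog {
	return &snapshotLog{byHash: make(map[string]string)}
}

// record hashes the configuration, remembers it, logs the pairing,
// and returns the hash that would be advertised to Envoy.
func (s *snapshotLog) record(kind, config string) string {
	sum := sha256.Sum256([]byte(config))
	hash := hex.EncodeToString(sum[:])[:12]
	s.mu.Lock()
	s.byHash[hash] = config
	s.mu.Unlock()
	log.Printf("programmed %s config hash=%s", kind, hash)
	return hash
}

// lookup returns the configuration recorded for hash, if any.
func (s *snapshotLog) lookup(hash string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cfg, ok := s.byHash[hash]
	return cfg, ok
}

func main() {
	s := newSnapshotLog()
	h := s.record("RDS", "route: / -> cluster-a")
	if cfg, ok := s.lookup(h); ok {
		fmt.Println(cfg)
	}
}
```

An in-memory map grows without bound and is lost on restart; the log line is what would let the pairing survive for later debugging.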

github-actions · 2024-01-11T00:20:26Z

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

After 60d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

Mark this Issue as fresh by commenting
Close this Issue
Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions · 2024-02-15T00:19:14Z

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

After 60d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

Mark this Issue as fresh by commenting
Close this Issue
Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

stevesloka added the area/operational Issues or PRs about making Contour easier to operate as a production service. label Aug 31, 2020

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 15, 2024
