Enable better Envoy debugging #2792

Closed
mattmoor opened this issue Aug 13, 2020 · 4 comments
Labels
area/operational Issues or PRs about making Contour easier to operate as a production service. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.


@mattmoor
Contributor

Please describe the problem you have

Original Slack thread

The original motivation for this is to be able to trace a plain 503 response back to the Envoy that served it, since that Envoy's logs generally contain a treasure trove of information about what happened.

In the bad response, there is typically a Server:[envoy] header, and the Slack thread is about whether this is configurable. @jpeach dug up this field in the Envoy API: https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto#envoy-v3-api-field-extensions-filters-network-http-connection-manager-v3-httpconnectionmanager-server-name

Ideally this would serve the Envoy pod name, so that its logs can be quickly accessed.


It would be next-level cool if this could contain checksum information for the configuration the Envoy is programmed with that could be cross-correlated with the Contour logs to get the full Envoy configuration at the point of failure 😇

I'm thinking something like a User-Agent string with key-value pairs:

Server=[envoy-zxhkjh; EDS={hash}; RDS={hash}; ...]

Generally Contour's logs are painfully terse today, and the above would be extremely helpful.

@jpeach
Contributor

jpeach commented Aug 19, 2020

On the checksum approach specifically, wouldn't you need to correlate those checksums with something? That said, I'm not sure there is anything suitable for them to correlate with. EDS can be hashed, but you'd also want to know the actual state associated with the hash.

xref #2021

@mattmoor
Contributor Author

Yeah, the idea was that Contour would need to log some stuff to correlate this with:

could be cross-correlated with the Contour logs
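A minimal sketch of the Contour side of that correlation, assuming Contour records each configuration it sends under the same short hash that would appear in the `Server` header. The `configIndex` type and its methods are hypothetical; the in-memory map only stands in for whatever durable record (such as the logs themselves) would actually be used.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"log"
	"sync"
)

// configIndex is a hypothetical record of every configuration version
// Contour has sent, keyed by its short hash.
type configIndex struct {
	mu     sync.Mutex
	byHash map[string][]byte
}

func newConfigIndex() *configIndex {
	return &configIndex{byHash: make(map[string][]byte)}
}

// Record stores a copy of cfg, logs its type and hash so the pair can be
// grepped for later, and returns the hash.
func (c *configIndex) Record(typ string, cfg []byte) string {
	sum := sha256.Sum256(cfg)
	h := hex.EncodeToString(sum[:4])

	c.mu.Lock()
	defer c.mu.Unlock()
	// Copy so later mutation of cfg by the caller does not alter the record.
	c.byHash[h] = append([]byte(nil), cfg...)
	log.Printf("sent %s config hash=%s bytes=%d", typ, h, len(cfg))
	return h
}

// Lookup returns the configuration previously recorded under hash, if any.
func (c *configIndex) Lookup(hash string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	cfg, ok := c.byHash[hash]
	return cfg, ok
}

func main() {
	idx := newConfigIndex()
	h := idx.Record("EDS", []byte("endpoints-v7"))

	// Later, a 503 carried "EDS=<h>" in its Server header.
	if cfg, ok := idx.Lookup(h); ok {
		fmt.Printf("EDS config at failure: %s\n", cfg)
	}
}
```

As written the map grows without bound and is lost on restart; logging the full serialized configuration at each change would make the logs alone sufficient, at the cost of log volume.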

@stevesloka stevesloka added the area/operational Issues or PRs about making Contour easier to operate as a production service. label Aug 31, 2020

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2024

@github-actions github-actions bot closed this as not planned Feb 15, 2024

3 participants