Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: sane defaults for envoy #1375

Closed
glerchundi opened this issue Aug 22, 2019 · 9 comments
Closed

config: sane defaults for envoy #1375

glerchundi opened this issue Aug 22, 2019 · 9 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@glerchundi
Copy link
Contributor

I came across this PR from @PiotrSikora and thought it would be interesting to corroborate if those 'best practices' are being followed in Contour.

@davecheney
Copy link
Contributor

Thank you for this. There is some very useful information here.

Scheduling for rc1 to review and see what if any changes we should take from this document.

@davecheney davecheney added this to the 1.0.0-rc.1 milestone Aug 22, 2019
@davecheney davecheney added kind/feature Categorizes issue or PR as related to a new feature. blocked/needs-product Categorizes the issue or PR as blocked because it needs a decision from product management. labels Aug 22, 2019
@ravilr
Copy link

ravilr commented Aug 22, 2019

Also, this envoyproxy/envoy#6323 . Can we configure tcp keepalive options to tear down half open connections for both Contour grpc management server upstream cluster and also (optionally may be) for all K8s service upstream clusters in envoy config programmed by Contour.

@davecheney davecheney modified the milestones: 1.0.0-rc.1, 1.0.0-rc.2 Sep 29, 2019
@davecheney davecheney added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 14, 2019
@youngnick youngnick self-assigned this Oct 20, 2019
@youngnick
Copy link
Member

youngnick commented Oct 21, 2019

Building a todo list for this one:

For the TCP keepalive settings, that is 100% worth doing, I've made separate issue (#1744) for that one, as it might possibly be related to another issue.

TCP Proxies

HTTP Proxies should additionally configure

  • connection maanager use_remote_address to true (see header_sanitizing for details)
  • connect_timeout 15s (already set to 250ms (!)) This one was left as 250ms because this is the timeout from the Envoy to the backend service. ("upstream" service in Envoy terminology)
  • per_connection_buffer_limit_byes to 32 KiB
  • HTTP/2 maximum concurrent streams limit to 100
  • HTTP/2 initial stream window size limit to 64KiB
  • HTTP/2 initial connection window size limit to 1MiB

YAML example:

overload_manager:
  refresh_interval: 0.25s
  resource_monitors:
  - name: "envoy.resource_monitors.fixed_heap"
    config:
      # TODO: Tune for your system.
      max_heap_size_bytes: 2147483648 # 2 GiB
  actions:
  - name: "envoy.overload_actions.shrink_heap"
    triggers:
    - name: "envoy.resource_monitors.fixed_heap"
      threshold:
        value: 0.95
  - name: "envoy.overload_actions.stop_accepting_requests"
    triggers:
    - name: "envoy.resource_monitors.fixed_heap"
      threshold:
        value: 0.98

static_resources:
  listeners:
  - address:
      socket_address: { address: 127.0.0.1, port_value: 443 }
    listener_filters:
    - name: "envoy.listener.tls_inspector"
      typed_config: {}
    per_connection_buffer_limit_bytes: 32768 # 32 KiB
    filter_chains:
    - filter_chain_match:
        server_names: ["example.com", "www.example.com"]
      tls_context:
        common_tls_context:
          tls_certificates:
          - certificate_chain: { filename: "example_com_cert.pem" }
            private_key: { filename: "example_com_key.pem" }
      filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.config.filter.network.http_connection_manager.v2.HttpConnectionManager
          stat_prefix: ingress_http
          use_remote_address: true
          common_http_protocol_options:
            idle_timeout: 840s
          http2_protocol_options:
            max_concurrent_streams: 100
            initial_stream_window_size: 65536 # 64 KiB
            initial_connection_window_size: 1048576 # 1 MiB
          route_config:
            virtual_hosts:
            - name: default
              domains: "*"
              routes:
              - match: { prefix: "/" }
                route: { cluster: service_foo }
  clusters:
    name: service_foo
    connect_timeout: 15s
    per_connection_buffer_limit_bytes: 32768 # 32 KiB
    hosts:
      socket_address:
        address: 127.0.0.1
        port_value: 8080
    http2_protocol_options:
      max_concurrent_streams: 100
      initial_stream_window_size: 65536 # 64 KiB
      initial_connection_window_size: 1048576 # 1 MiB

@youngnick
Copy link
Member

Because two of the tasks are tracked in #1407 and #1408, I'll mark those changes as done here.

jpeach added a commit to jpeach/contour that referenced this issue Oct 23, 2019
Many tests need to specify default fields for envoy `Cluster` structs
so that struct equality tests can succeed. Specifying these defaults
in many places is fragile and tends to make it harder to see the
field values that actually matter to the tests.  Consolidating the
defaults into a handful of helpers reduces the overally clutter.
We still need a couple of duplicates of the helper so that we can
avoid package dependency cycles.

This updates projectcontour#1375.

Signed-off-by: James Peach <jpeach@vmware.com>
jpeach added a commit to jpeach/contour that referenced this issue Oct 24, 2019
Many tests need to specify default fields for envoy `Cluster` structs
so that struct equality tests can succeed. Specifying these defaults
in many places is fragile and tends to make it harder to see the
field values that actually matter to the tests.  Consolidating the
defaults into a handful of helpers reduces the overally clutter.
We still need a couple of duplicates of the helper so that we can
avoid package dependency cycles.

This updates projectcontour#1375.

Signed-off-by: James Peach <jpeach@vmware.com>
jpeach added a commit that referenced this issue Oct 24, 2019
Many tests need to specify default fields for envoy `Cluster` structs
so that struct equality tests can succeed. Specifying these defaults
in many places is fragile and tends to make it harder to see the
field values that actually matter to the tests.  Consolidating the
defaults into a handful of helpers reduces the overally clutter.
We still need a couple of duplicates of the helper so that we can
avoid package dependency cycles.

This updates #1375.

Signed-off-by: James Peach <jpeach@vmware.com>
@youngnick
Copy link
Member

Now that I've created all the sub-issues for each change, I'm going to assign this to the backlog so that we can still use it for tracking.

@mattalberts
Copy link

@youngnick @davecheney

I've been demo'ing out a series of H2 options, receive buffer sizing, and a number of different controls. I've seen fairly dramatic improvement in both awkward memory exhaustion (#499) and cpu's stuck spinning (here, after an issue, the cpu utilization will jump by 1, gdb suggests the worker thread is stuck in a non-recoverable spin-loop).

I wanted to check-in and see where you were with changes before simply submitting a PR (I can also break this out into a separate issue with graphs and other data).

@davecheney
Copy link
Contributor

@mattalberts all the status is in the ticket (sorry, that's not meant to sound snarky, just an reaffirmation that we don't have a secret discussion backchannel).

Any information you can provide for #499 would be greatly received.

@skriss
Copy link
Member

skriss commented Jan 4, 2022

@projectcontour/maintainers looks like all the TODOs were addressed here, OK to close this out? Any further changes could be tracked in more specific issues.

@sunjayBhatia
Copy link
Member

sounds good to me 👍🏽

@skriss skriss closed this as completed Jan 4, 2022
@skriss skriss removed this from the Backlog milestone Jan 4, 2022
@skriss skriss removed the blocked/needs-product Categorizes the issue or PR as blocked because it needs a decision from product management. label Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

No branches or pull requests

7 participants