This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: apply large ipv4 neigh GC settings to nodes of all sizes #2732

Merged: 1 commit into Azure:master on Feb 18, 2020

Conversation

@xizha162 (Contributor) commented Feb 15, 2020

Reason for Change:

We should apply the large IPv4 neigh GC settings to all nodes, not just nodes with more than 8 cores. We have seen many cases where CoreDNS works only intermittently because of this issue: CoreDNS may not be running on a large node, yet it still has to handle connections from many other pods. (An illustrative sketch of the sysctls involved follows the Notes section below.)

Issue Fixed:

Requirements:

Notes:
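
For context, the "IPv4 neigh GC settings" referenced above are the Linux kernel's ARP/neighbor table garbage-collection thresholds. The exact values applied by this PR are not visible in this conversation, so the snippet below is only an illustrative sketch with assumed values, in roughly the sysctl.d form a node-provisioning script would use:

# Illustrative sysctl.d snippet; the values are assumptions, not taken from this PR's diff
# gc_thresh1: below this many cached entries, the GC never prunes the table
# gc_thresh2: soft limit; the table may exceed this only briefly before pruning kicks in
# gc_thresh3: hard limit; once reached, new entries are dropped and resolution starts to fail
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 16384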

@acs-bot added the size/S label Feb 15, 2020
@codecov (bot) commented Feb 15, 2020

Codecov Report

Merging #2732 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2732      +/-   ##
==========================================
+ Coverage   72.33%   72.33%   +<.01%     
==========================================
  Files         137      137              
  Lines       25317    25317              
==========================================
+ Hits        18313    18314       +1     
+ Misses       5948     5947       -1     
  Partials     1056     1056

@mboersma (Member) left a comment


Makes sense to me based on your description, but I don't know why this tuning was restricted to big boxes only. Let's get more eyes on this to be safe.

@jackfrancis (Member) commented

These configurations, as far as I understand, set the allowable number of ARP entries to cache before garbage collection kicks in. I assume that ARP cache garbage collection operates according to a single criterion: "if there are more entries in the ARP cache than are allowed, delete the n oldest entries until the cache entry count drops below the maximum allowed number".

In practice, does this mean that the maximum cache size is the effective ARP entry count in general? I.e., assuming a vm is continually thrashing at the maximum count, does this in practice mean additional latency for layer 2 operations (e.g., establishing a MAC <--> IP mapping), which bubbles up to higher abstraction layers and affects workloads?

Also, about the original change to only increase the ARP cache entries for "8 core vms": was that to ensure we don't impose memory requirements on vms that don't have a lot of memory headroom? Otherwise I don't know why we'd set this value based on the vm size, since the amount of ARP activity should have nothing to do with the vm size, but instead everything to do with the scale and behavior of the cluster it is participating in.

Hope that makes sense; I just want to clearly document this change and proceed with a purposeful understanding of why we're making it.

@xizha162 (Contributor, Author) commented

@jackfrancis, regarding your questions: for #1, yes, you are correct. Hub-like pods (CoreDNS, for example) that communicate with MANY other pods need a much larger ARP cache on the node to store all those MAC <-> IP mappings. When ARP cache thrashing happens, you will see intermittent DNS resolution failures due to ARP cache overflow errors (a quick way to check for this on a node is sketched after this comment).

For #2, yes, there will be some memory overhead with larger values for these settings; that is why I wanted to be safe and apply them only to large nodes initially. However, from CRI we noticed that a customer can hit this issue with a 4-core node and 1,500+ pods. After applying the settings, their issue was gone, so we think it is safe to apply the settings to all nodes now.
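
As a hedged aside for anyone debugging the symptom described above (intermittent DNS failures from ARP cache overflow), these standard Linux commands can show neighbor-table pressure on a node; the exact kernel log wording varies by kernel version:

# Current thresholds in effect on the node
sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3
# Number of IPv4 neighbor entries currently cached
ip -4 neigh show | wc -l
# Kernel complaints logged when the hard limit (gc_thresh3) is hit
dmesg | grep -i "table overflow"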

@jackfrancis (Member) commented

Can we determine the memory overhead? For clusters with hundreds of nodes, it's possible that 8k entries is still not enough. I wonder if we can set a follow-up item to ensure we aren't being unnecessarily conservative. It would be nice to document exactly how much memory overhead we're adding to the kubelet runtime.
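
For a rough sense of scale (back-of-the-envelope only, not measured): a kernel neighbor entry costs on the order of a few hundred bytes, so even the larger thresholds should translate to single-digit megabytes per node, e.g.:

# Assuming roughly 400 bytes per neighbor entry (actual struct size varies by kernel build)
#  8192 entries x ~400 bytes ≈ 3.3 MB
# 16384 entries x ~400 bytes ≈ 6.6 MB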

@jackfrancis (Member) left a comment


/lgtm

@acs-bot added the lgtm label Feb 18, 2020
@jackfrancis merged commit e870f47 into Azure:master Feb 18, 2020
@acs-bot commented Feb 18, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma, xizhamsft

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jackfrancis,mboersma]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@xizha162 (Contributor, Author) commented

@jackfrancis, yes, I will make sure to include a follow-up work item in the large-node work.
