This repository has been archived by the owner on May 12, 2021. It is now read-only.

config: add an option to control memory size calculation #1823

Closed
wants to merge 1 commit into from

Conversation

@liwei (Member) commented Jun 22, 2019

The sandbox memory overhead was not taken into account when
calculating the sandbox memory quota size, which causes the following
issues:

  • the scheduler can't get accurate memory usage information
  • the memory size reported in the container does not match the user-specified
    memory limit, which confuses the user

Before the pod overhead patch [1] gets merged, let's add an option
to control memory size calculation to work around these issues.

Fixes: #1822
Signed-off-by: Li Wei wei@hyper.sh

@bergwolf (Member)

How about using the new option to control both CPU and memory calculation, i.e. whether the initial guest CPU/memory is counted in the quota?

@bergwolf (Member)

BTW, Travis failed at:

pkg/katautils/config.go:84:17: struct of size 296 bytes could be of size 288 bytes (maligned)
type hypervisor struct {
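
For context, the maligned linter flags structs whose field ordering forces the compiler to insert avoidable padding; grouping fields of the same size usually removes the waste. A minimal, hypothetical illustration of the idea (not the actual hypervisor struct from pkg/katautils/config.go):

package main

import (
	"fmt"
	"unsafe"
)

// badLayout interleaves 8-byte and 1-byte fields, so the compiler inserts
// 7 bytes of padding after each bool to keep the uint64 fields aligned.
type badLayout struct {
	A uint64
	B bool
	C uint64
	D bool
}

// goodLayout groups the small fields at the end, so only the final
// round-up to the struct's alignment remains.
type goodLayout struct {
	A uint64
	C uint64
	B bool
	D bool
}

func main() {
	fmt.Println(unsafe.Sizeof(badLayout{}))  // 32 on 64-bit platforms
	fmt.Println(unsafe.Sizeof(goodLayout{})) // 24 on 64-bit platforms
}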

@liwei (Member, Author) commented Jun 24, 2019

> How about using the new option to control both CPU and memory calculation, i.e. whether the initial guest CPU/memory is counted in the quota?

Sounds good, I will update the PR.

The vCPU/memory overhead of the sandbox is not taken into account when
calculating the container resource quota, which causes the following
issues:

- the scheduler can't get accurate resource usage information
- the CPU/memory size in the container does not match what the user specified,
  which confuses users

Before the pod overhead patch [1] gets merged, let's add an option
to control resource calculation to work around these issues.

[1]: kubernetes/kubernetes#79247

Fixes: #1822
Signed-off-by: Li Wei <wei@hyper.sh>
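
To make the intent of the option concrete, below is a minimal sketch of the kind of gating it adds. The type, field, and function names are illustrative assumptions (including the corrected OverheadInQuota spelling), not the actual kata-runtime implementation.

package main

import "fmt"

// hypervisorConfig mirrors the knobs discussed in this PR; names are
// illustrative only.
type hypervisorConfig struct {
	MemorySize      uint32 // default guest memory in MiB (default_memory)
	OverheadInQuota bool   // is guest cpu/memory counted in the container quota?
}

// sandboxMemoryMiB sizes the sandbox memory from the sum of the containers'
// memory limits (0 means no limit was specified).
func sandboxMemoryMiB(cfg hypervisorConfig, containerLimitMiB uint32) uint32 {
	if containerLimitMiB == 0 {
		// No limit requested: fall back to the configured default.
		return cfg.MemorySize
	}
	if cfg.OverheadInQuota {
		// The guest's own memory must fit inside the user-specified
		// quota, so the VM is not grown beyond the requested limit.
		return containerLimitMiB
	}
	// Default behaviour: the guest memory is added on top of the limit,
	// which is why host accounting exceeds the user's quota.
	return cfg.MemorySize + containerLimitMiB
}

func main() {
	cfg := hypervisorConfig{MemorySize: 2048, OverheadInQuota: false}
	fmt.Println(sandboxMemoryMiB(cfg, 1024)) // 3072: overhead added on top
	cfg.OverheadInQuota = true
	fmt.Println(sandboxMemoryMiB(cfg, 1024)) // 1024: overhead counted in quota
}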
@liwei (Member, Author) commented Jun 24, 2019

PR updated, @bergwolf PTAL

@grahamwhaley (Contributor)

Please ensure any/all documentation relating to resource calculations is also updated, as that is one of the first places users check, or where we send them for answers.

@Pennyzct (Contributor)

/test

@codecov bot commented Jun 24, 2019

Codecov Report

Merging #1823 into master will increase coverage by 2.02%.
The diff coverage is 53.84%.

@@            Coverage Diff             @@
##           master    #1823      +/-   ##
==========================================
+ Coverage    52.5%   54.52%   +2.02%     
==========================================
  Files         108      106       -2     
  Lines       13951    13059     -892     
==========================================
- Hits         7325     7121     -204     
+ Misses       5756     5090     -666     
+ Partials      870      848      -22

@@ -78,6 +79,9 @@ default_memory = @DEFMEMSZ@
# Default 0
#memory_offset = 0

# Specify whether cpu/memory used by guest is counted in container resource quota.
#overhead_in_quota = false
Member:

Resource quota is a Kubernetes admission controller -- is that what you're referring to in this name?

Member Author:

No, not really. Naming is the hardest part; do you have any suggestions?

@@ -171,6 +171,9 @@ type HypervisorConfig struct {
// DefaultMem specifies default memory size in MiB for the VM.
MemorySize uint32

// OverhaedInQuota specifies whether cpu/memory used by guest is counted in container quota.
Member:

s/haed/head

@jcvenegas (Member) left a comment:

Hey @liwei, thanks for the PR. This looks good, just a few nits.

@@ -65,6 +65,7 @@ default_bridges = @DEFBRIDGES@
# Default memory size in MiB for SB/VM.
# If unspecified then it will be set @DEFMEMSZ@ MiB.
default_memory = @DEFMEMSZ@

Member:

Nit: blank line.

@@ -78,6 +79,9 @@ default_memory = @DEFMEMSZ@
# Default 0
#memory_offset = 0

# Specify whether cpu/memory used by guest is counted in container resource quota.
Member:

Could you explain here when or how this should be enabled?

Member Author:

Per @grahamwhaley's comment, I'm planning to update the documentation to explain this, WDYT?

Member:

ack

Member:

Let's add it in the documentation. I wanted to add a bit more information here, but fitting all the context into a one-line summary may be difficult.

@@ -89,6 +89,9 @@ default_memory = @DEFMEMSZ@
# Default 0
#memory_offset = 0

# Specify whether cpu/memory used by guest is counted in container resource quota.
#overhead_in_quota = false
Member:

@jodh-intel we should really unify this into one config and customization? Or something like that.

Contributor:

Hi @jcvenegas - slightly confused - are you suggesting we have separate options for CPU and memory?

@liwei (Member, Author) commented Jun 25, 2019

> Please ensure any/all documentation relating to resource calculations is also updated, as that is one of the first places users check, or where we send them for answers.

Hi @grahamwhaley, after some searching I found that cpu-constraints.md may be a good place to explain this config option. I'm planning to rename that document to "resource-constraints" and explain both memory and CPU constraints in it, WDYT?

Do you know if there are other documents that need an update?

@grahamwhaley (Contributor)

> Hi @grahamwhaley, after some searching I found that cpu-constraints.md may be a good place to explain this config option. I'm planning to rename that document to "resource-constraints" and explain both memory and CPU constraints in it, WDYT?
>
> Do you know if there are other documents that need an update?

That looks like a good place to me. @jcvenegas worked in that area recently, so may have some useful input?

@egernst (Member) left a comment:

I’d rather gate this until the docs PR is ready to be merged as well.

@egernst added the do-not-merge (PR has problems or depends on another) and enhancement (Improvement to an existing feature) labels on Jun 25, 2019
@egernst
Copy link
Member

egernst commented Jun 25, 2019

Thinking through this more...

I generally like the change; I just want to look at it holistically in the stack, since we are introducing "one more configuration" and not entirely fixing the issue.

Problem statements

System tracking, stability

Stability is most important, and anything we can do to address this is wise.

Having said that, the guest overheads for memory (and CPU, if there are any) are only a (small?) piece of the sandbox runtime overhead story. The host-side overhead of running the VMM, the vCPUs, the IO threads, and any other 'runtime infra' is likely much larger. This change will only slightly improve things. There are still further gaps in how these are managed wrt cgroups today, and @jcvenegas is working to fix this (ie, today we don't place everything in cgroups for fear of being OOMd; we need this to be configurable so we can "do the right thing" and manage everything in the pod/sandbox cgroup on the host if the creator of the cgroup sizes it appropriately for a sandboxed runtime, ie, accounts for the overhead).

While it does improve stability slightly (and stability is most important), if enabled it does result in unexpected behavior for the workload. If the workload requires all of the memory it requests, things will start getting OOMd in the worst case.

User Confusion

If end users are confused, I'd argue this is an issue in our documentation/education, not in our code.

Names, and usage of the default hypervisor parameters

Early on I was skeptical of utilizing the hypervisor default values as an overhead, as opposed to a default value when no resource requirements are specified for the sandbox. In the end, today we treat it as both a default value and a guest overhead. I think this is problematic, since an appropriately sized guest overhead (that is, the overhead of running the kernel and system services in the guest, from a CPU and memory perspective) would be too small to run both a best-effort workload and minimal services.

  • Let's say, though, that we only have users who "do the right thing" -- that is, we focus on burstable and guaranteed QoS. If that is the case, it would be appropriate to size the 'defaults' to match the intent you are showing here, which is overhead. We can make sure this happens, in a k8s context, by setting default resources for all pods.

Thinking through this, I wonder if it makes sense to have a separate "default" for burstable pods (à la containers with no resource requirements specified) and a guest overhead (we'd need to calculate a default value for running just the kernel and agent).

So, before introducing the configuration in this PR, I'd like to consider clarifying, and perhaps separating out, default handling and guest overheads.
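
A rough sketch of the separation being suggested here: one value used purely as a fallback for pods that specify no resources, plus a distinct guest-overhead value that is always reserved. All names are hypothetical; this is not an existing kata-runtime API.

package main

import "fmt"

// memoryDefaults splits today's single default_memory knob into the two
// meanings discussed in this thread.
type memoryDefaults struct {
	DefaultWorkloadMiB uint32 // used only when the pod specifies no limits
	GuestOverheadMiB   uint32 // kernel + agent + guest system services
}

// vmMemoryMiB always reserves the guest overhead and sizes the workload part
// from the user's limit, falling back to the default when none is given.
func vmMemoryMiB(d memoryDefaults, requestedMiB uint32) uint32 {
	workload := requestedMiB
	if workload == 0 {
		workload = d.DefaultWorkloadMiB
	}
	return d.GuestOverheadMiB + workload
}

func main() {
	d := memoryDefaults{DefaultWorkloadMiB: 1792, GuestOverheadMiB: 256}
	fmt.Println(vmMemoryMiB(d, 0))    // best-effort pod: 2048
	fmt.Println(vmMemoryMiB(d, 1024)) // guaranteed pod: 1280
}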

@liwei (Member, Author) commented Jul 1, 2019

Hi @egernst,

I agree that "the guest overheads are only a piece of the sandbox overhead". For the accuracy of resource limitation, placing everything in the cgroup as @jcvenegas is doing looks good to me. We should state the sandbox overhead in our documentation and let users consider the overhead when setting up the container's resource limits to avoid OOM.

As for the separation of default and overhead, I have no strong preference, but do we need that complexity? It's hard for us to get a perfect solution that satisfies everyone, so how about treating the resource setup in the Kata configuration as the default when no container resource limit is set, and obeying the resource limit if it was set by the user (set the resource limit on the host-side cgroup)? If the user's setting is too small to launch the Kata guest, just let it fail.

So, what do you think? @egernst @bergwolf @gnawux

@egernst (Member) commented Aug 1, 2019

(Sorry for the slow reply - I'm hoping we can have a PR in the next several days to clarify all of this in the documentation.)

@raravena80 (Member)

@liwei any updates on this PR? Thx!

@raravena80 (Member)

@liwei any updates?

Your weekly Kata herder.

@liwei closed this on Jan 3, 2020
@yichengliu58

Hi @liwei, any updates about this PR?
I've seen there is a wiki page explaining resource constraints (https://github.com/kata-containers/documentation/wiki/UserGuide#memory), but its content is too brief to cover the details. Are there any newly updated docs? Thanks so much.

egernst pushed a commit to egernst/runtime that referenced this pull request Feb 9, 2021
This updates grpc-go vendor package to v1.11.3 release, to fix server.Stop()
handling so that server.Serve() does not wait blindly.

Labels: do-not-merge (PR has problems or depends on another), enhancement (Improvement to an existing feature)
Projects: none yet
Successfully merging this pull request may close these issues: Memory usage doesn't match memory limit in container spec
9 participants