hypervisor/qemu: add memory hotplug support #470
Conversation
To include vm factory related commits. Full list:

54caf78 (mine/templating, templating) qmp: add hotplug memory
e66a9b4 qemu: add appendMemoryKnobs helper
8aeca15 qmp: add migrate set arguments
a03d496 qmp: add set migration capabilities
0ace417 qemu: allow to set migration incoming
723bc5f qemu: allow to create a stopped guest
283d7df qemu: add file backed memory device support

Signed-off-by: Peng Tao <bergwolf@gmail.com>
I got the following warning after upgrading the dep tool:

Warning: the following project(s) have [[constraint]] stanzas in Gopkg.toml:
✗ github.com/hashicorp/yamux
However, these projects are not direct dependencies of the current project: they are not imported in any .go files, nor are they in the 'required' list in Gopkg.toml. Dep only applies [[constraint]] rules to direct dependencies, so these rules will have no effect. Either import/require packages from these projects so that they become direct dependencies, or convert each [[constraint]] to an [[override]] to enforce rules on these projects, if they happen to be transitive dependencies.

So let's convert the yamux constraint to an override. In the meanwhile, update the yamux vendor. Full commit list:

4c2fe0d (origin/b-consul-3040) Dont output keepalive error when the session is closed
f21aae5 Make sure to drain the timer channel on defer, and a clarifying comment
601ccd8 Make receive window update logic a bit cleaner
02d320c Uses timer pool in sendNoWait, like in waitForSendErr
cf433c5 window update unit test for partial read; benchmark large buffer
ca8dfd0 improve memory utilization in receive buffer, fix flow control
683f491 Fix race around read and write deadlines in Stream (#52)
40b86b2 Add public session CloseChan method (#44)

Note that commit 4c2fe0d might also help kata-containers/agent/issues/231.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Codecov Report

@@            Coverage Diff             @@
##           master     #470      +/-   ##
==========================================
- Coverage    64.1%   64.05%    -0.06%
==========================================
  Files          87       87
  Lines        8885     8925       +40
==========================================
+ Hits         5696     5717       +21
- Misses       2577     2594       +17
- Partials      612      614        +2

Continue to review full report at Codecov.
virtcontainers/qemu.go (outdated)

	}

	hostMemMb := uint64(float64(hostMemKb / 1024))
	return uint64(float64(hostMemKb / 1024)), nil
This doesn't look right to me - should this be float64(hostMemKb)? Or can we just drop the float64()?
Yes, indeed. I was following the old code but you are right that float64 is unnecessary here. I'll fix it. Thanks!
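For reference, a minimal sketch of the simplified conversion being discussed; the helper name hostMemoryMB and the standalone form are illustrative, not the actual qemu.go code:

```go
package main

import "fmt"

// hostMemoryMB converts a memory size reported in KiB (for example the
// value parsed from /proc/meminfo) into MiB. Plain integer division is
// enough, so the float64 round-trip can simply be dropped.
func hostMemoryMB(hostMemKb uint64) uint64 {
	return hostMemKb / 1024
}

func main() {
	fmt.Println(hostMemoryMB(16384256)) // prints 16000
}
```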
virtcontainers/qemu.go (outdated)

@@ -837,6 +852,61 @@ func (q *qemu) hotplugRemoveCPUs(amount uint32) (uint32, error) {
	return amount, q.sandbox.storage.storeHypervisorState(q.sandbox.id, q.state)
}

func (q *qemu) hotplugMemory(slot uint32, memMB int, op operation) error {
	if memMB <= 0 {
		return errors.New("cannot hotplug zero or negative MB memory")
Could you make the memMB a uint64 to avoid having to add this check? Same question for the function below.
Sure. But I'm keeping the one in q.hotplugAddMemory since govmm requires an int, and a sanity check is better done early than late.
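A minimal sketch of the agreed shape, assuming the downstream govmm call still takes an int; the plug callback and function name are stand-ins, not the real runtime API:

```go
package main

import "fmt"

// hotplugMemoryMB takes the amount as a uint64, so a negative value is
// impossible by construction and the old "memMB <= 0" check reduces to
// a zero check. The conversion to int happens only at the boundary
// where the QMP layer (govmm) needs it.
func hotplugMemoryMB(memMB uint64, plug func(amountMB int) error) error {
	if memMB == 0 {
		return fmt.Errorf("cannot hotplug 0 MiB of memory")
	}
	return plug(int(memMB))
}

func main() {
	err := hotplugMemoryMB(512, func(amountMB int) error {
		fmt.Printf("would hotplug %d MiB via QMP\n", amountMB)
		return nil
	})
	fmt.Println("err:", err)
}
```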
Branch updated from 220f231 to 2e2a89b.
@jodh-intel PR updated. PTAL.
@jcvenegas PTAL!
virtcontainers/qemu.go (outdated)

	return q.hotplugAddMemory(slot, int(memMB))
}

func (q *qemu) hotplugAddMemory(slot uint32, amount int) error {
The QMP function takes an ID, so I wonder if, for clarity, it would be better to name this id, or, if you want to keep slot, to make clear that we are indeed adding a slot.
The idea is to convert the slot number to an ID; either we do it in this function, or we need to do it somewhere else.
I am fine with either case. I ask because a reader might think it is the slot number where the memory will be inserted in the VM. That may not always be true: QEMU inserts it into the first empty slot.
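A small illustration of the slot-to-ID conversion being discussed; the "mem%d" format follows the object_add example mentioned later in the thread, but treat the exact naming as an assumption. As noted above, QEMU itself decides which physical slot the DIMM ends up in:

```go
package main

import "fmt"

// memoryDeviceID turns a runtime-side counter into the object ID passed
// to the QMP hotplug command ("mem0", "mem1", ...). The counter only
// guarantees a unique ID; QEMU places the DIMM in the first empty slot
// when the device is actually inserted.
func memoryDeviceID(slot uint32) string {
	return fmt.Sprintf("mem%d", slot)
}

func main() {
	fmt.Println(memoryDeviceID(0)) // mem0
	fmt.Println(memoryDeviceID(3)) // mem3
}
```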
	}

	// calculate current memory
	currentMemory := int(q.config.DefaultMemSz)
This configuration is the one that comes from kata-containers.toml. If that is the case, it may change whenever the user edits the value, and the "current" memory won't be the same memory that we used to boot.
If so, we have a few options here:
- When we start the VM, save the memory that was used to start it.
- Query QMP to get the current memory in the guest.
I think saving the amount of memory used when the VM was started is the best option here. Or we could simply save the current amount of memory plugged into the VM (I am only thinking of this because we cannot unplug).
This configuration is the one that comes from kata-containers.toml. If that is the case, it may change whenever the user edits the value, and the "current" memory won't be the same memory that we used to boot.

Sorry, I don't quite follow. Do you mean the sandbox config we used to boot can change later, when someone connects to the sandbox again? That doesn't sound like a solid solution. I think we only use kata-containers.toml when creating the sandbox, and all later accesses to the sandbox fetch the sandbox config from persistent metadata via the sandbox ID. Am I missing something?
@bergwolf yes, well, I am considering a future case where someone wants to update the memory in a container and we want to increase the memory; in that case the config file may have been modified.
Another case is creating a container inside a pod: the pod may be created with one config file, the file may then be updated with a different default memory, and a container is then created inside the pod.
But I agree the current use case does not have any issue with that.
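A sketch of the bookkeeping option discussed above (saving the boot-time memory and the total hotplugged so far, instead of re-reading the config); the type and field names are illustrative, not the actual qemu.go layout:

```go
package main

import "fmt"

// guestMemState records the memory the VM booted with plus everything
// hotplugged since, so the current total no longer depends on
// re-reading a configuration file that may have changed after boot.
type guestMemState struct {
	BootMemoryMB       int // memory the VM was started with
	HotpluggedMemoryMB int // total added via memory hotplug
}

func (s *guestMemState) currentMemoryMB() int {
	return s.BootMemoryMB + s.HotpluggedMemoryMB
}

func main() {
	s := &guestMemState{BootMemoryMB: 2048}
	s.HotpluggedMemoryMB += 1024
	fmt.Println(s.currentMemoryMB()) // prints 3072
}
```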
@bergwolf I think this is the initial implementation to make VM templating work, right? Do we expect to add more changes to this PR, or do you only want to add this for now?
virtcontainers/qemu.go (outdated)

	// rule out extremely large parameter
	if memMB > math.MaxInt32 {
		return fmt.Errorf("cannot hotplug %d MB memory: size too large", memMB)
Nit: the kata config file asks for memory in MiB; we may want to follow the same unit.
OK, I'll fix the error message. The actual code does take memory in MiB.
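For clarity, a sketch of the guard with the MiB wording; checkHotplugSize is an illustrative helper, not the function in the patch:

```go
package main

import (
	"fmt"
	"math"
)

// checkHotplugSize rejects requests that cannot be represented as an
// int on 32-bit platforms before they reach the QMP layer. The error
// message uses MiB to match the unit used by the configuration file.
func checkHotplugSize(memMB uint64) error {
	if memMB > math.MaxInt32 {
		return fmt.Errorf("cannot hotplug %d MiB memory: size too large", memMB)
	}
	return nil
}

func main() {
	fmt.Println(checkHotplugSize(1024))              // <nil>
	fmt.Println(checkHotplugSize(math.MaxInt32 + 1)) // error
}
```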
"fmt" | ||
"time" | ||
) | ||
|
I am curious how re-vendoring govmm ended up adding those two files, github.com/docker/go-units/duration.go and docker/go-units/ulimit.go.
Indirect dependency? These were added by dep anyway.
	HotpluggedVCPUs []CPUDevice
	UUID             string
	HotpluggedVCPUs  []CPUDevice
	HotpluggedMemory int
Why aren't you using a memoryDevice type instead here? Also, this would force this structure to be public.
Why would you want memoryDevice to be public? Are you suggesting adding a sandbox.HotplugMemory API?
I did not change this because memoryDevice.slot is internal knowledge. Even if we have sandbox.HotplugMemory, we should not expose it. @sboeuf let me know if you think otherwise.
Sorry, what I meant is that we could have something like HotpluggedMemory []MemoryDevice, the same way we have HotpluggedVCPUs []CPUDevice.
No, I don't think we need that. We only need to track how much memory has been hotplugged. Unlike CPU devices, we do not support memory hot unplug, nor would we need to unplug memory in the same chunks as it was plugged, even if we support memory hot unplug in the future.
Fair enough!
@bergwolf Sorry, I don't quite understand why you put the "slot" field in struct MemoryDevice rather than struct QemuState, since user input will not involve the "slot" field. When we hotplug a memory device, we use a QMP command like "object_add ... mem1 ...", in which "1" is the memory slot number. In my opinion, the "slot" field should be stored in QemuState (hypervisor.json) so that we can get the memory slot number when we need to hotplug memory.
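A self-contained sketch of the state layout the thread converges on: one integer in QemuState is enough because memory is never unplugged, and a slot/ID counter could also live there, as suggested in the last comment. Field names such as MemSlot and the memoryDevice definition are assumptions for illustration, not the exact upstream code:

```go
package main

import "fmt"

// CPUDevice mirrors the existing per-vCPU record kept in QemuState.
type CPUDevice struct {
	ID string
}

// QemuState is the persisted hypervisor state (hypervisor.json above).
// A single HotpluggedMemory counter suffices since memory is never
// unplugged; MemSlot tracks the next slot/ID number for an object_add.
type QemuState struct {
	UUID             string
	HotpluggedVCPUs  []CPUDevice
	HotpluggedMemory int    // MiB added since boot
	MemSlot          uint32 // next slot/ID number to use
}

// memoryDevice is the internal hotplug request built by the runtime.
type memoryDevice struct {
	slot   uint32
	sizeMB int
}

func main() {
	state := QemuState{HotpluggedMemory: 1024, MemSlot: 1}
	dev := memoryDevice{slot: state.MemSlot, sizeMB: 512}
	state.HotpluggedMemory += dev.sizeMB
	state.MemSlot++
	fmt.Printf("hotplugged total: %d MiB, next slot: %d\n",
		state.HotpluggedMemory, state.MemSlot)
}
```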
virtcontainers/qemu.go (outdated)

@@ -837,6 +853,64 @@ func (q *qemu) hotplugRemoveCPUs(amount uint32) (uint32, error) {
	return amount, q.sandbox.storage.storeHypervisorState(q.sandbox.id, q.state)
}

func (q *qemu) hotplugMemory(slot uint32, memMB uint64, op operation) error {
You could use a memoryDevice instead of the slot uint32, memMB uint64 parameters.
Sure, I'll change it.
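The suggested reshaping, sketched with stand-in types; operation, the qemu struct and hotplugAddMemory here are simplified stubs rather than the real virtcontainers code:

```go
package main

import (
	"errors"
	"fmt"
)

type operation int

const (
	addDevice operation = iota
	removeDevice
)

// memoryDevice bundles everything the hotplug path needs, so the entry
// point takes one parameter instead of (slot uint32, memMB uint64).
type memoryDevice struct {
	slot   uint32
	sizeMB int
}

// qemu is a minimal stand-in for the real hypervisor type.
type qemu struct{}

func (q *qemu) hotplugAddMemory(memDev *memoryDevice) error {
	// The real code would issue a QMP object-add here; this just logs.
	fmt.Printf("object-add pc-dimm mem%d, %d MiB\n", memDev.slot, memDev.sizeMB)
	return nil
}

func (q *qemu) hotplugMemory(memDev *memoryDevice, op operation) error {
	if op != addDevice {
		// Memory hot-unplug is not supported.
		return errors.New("memory hot unplug is not supported")
	}
	return q.hotplugAddMemory(memDev)
}

func main() {
	q := &qemu{}
	_ = q.hotplugMemory(&memoryDevice{slot: 0, sizeMB: 256}, addDevice)
}
```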
@@ -169,6 +174,7 @@ func (q *qemu) init(sandbox *Sandbox) error {
		return err
	}

	q.vmConfig = sandbox.config.VMConfig
The memory coming through the Resources structure in sandbox.config.VMConfig is part of some legacy code that should be removed IMO. We need a unified way to provide the resources, and that's using the HypervisorConfig that we already have through q.config. It is a bit odd to have CPU and memory resources provided through two different structures.
Yes, indeed it is weird but I'd expect the clean up to be handled in a separate PR.
Why don't you do that directly as part of this PR (separate commit of course)? Do you think that's a lot of work?
As discussed in the arch meeting, it will break on-disk metadata compatibility, so I prefer to defer it to a standalone PR at least.
As discussed during the Arch Committee meeting, this rework will be done in a separate PR to prevent API breakage.
	}

	// calculate current memory
	currentMemory := int(q.config.DefaultMemSz)
I think saving the amount of memory used when the VM was started is the best option here. Or we could simply save the current amount of memory plugged into the VM (I am only thinking of this because we cannot unplug).
This is all I need for the vm factory. Feel free to add more PRs based on it to support memory hotplug in other use cases. I assume that you want to work on that, right?
So that we can add more memory to an existing guest.

Fixes: kata-containers#469

Signed-off-by: Peng Tao <bergwolf@gmail.com>
@jcvenegas @sboeuf PR updated. PTAL.
As discussed in #303, I'm splitting the memory hotplug part out of the vm factory PR to get more eyes on it. The govmm part of the change is already merged, so we just need to change qemu to call the APIs properly.
However, because this is just part of the vm factory PR, there is no actual caller for the internal memory device hotplug API yet. That said, I did test it roughly with the following hack and saw the QMP hotplug commands succeed: