`deflate_on_oom` doesn't seem to work as expected/documented #4324

simonis · 2023-12-14T16:28:44Z

After reading the Ballooning documentation, my understanding of deflate_on_oom is that if the parameter is set to true, the balloon device will be deflated automatically when a process in the guest requires memory pages that cannot otherwise be provided:

deflate_on_oom: if this is set to true and a guest process wants to allocate some memory which would make the guest enter an out-of-memory state, the kernel will take some pages from the balloon and give them to said process

However, if I run Firecracker with e.g. 2 vCPUs, 1gb of memory and a balloon device of 900mb:

{
    "target_pages": 230400,
    "actual_pages": 230400,
    "target_mib": 900,
    "actual_mib": 900,
    "swap_in": 0,
    "swap_out": 0,
    "major_faults": 92,
    "minor_faults": 3103,
    "free_memory": 66572288,
    "total_memory": 84398080,
    "available_memory": 0,
    "disk_caches": 151552,
    "hugetlb_allocations": 0,
    "hugetlb_failures": 0
}
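As a quick cross-check of the statistics above, the page counts and MiB values agree if the balloon is accounted in 4 KiB pages. The sketch below assumes that page size; `pages_to_mib` is a hypothetical helper, not part of Firecracker.

```python
# Assumption: the balloon statistics count 4 KiB pages.
BALLOON_PAGE_SIZE = 4096


def pages_to_mib(pages: int, page_size: int = BALLOON_PAGE_SIZE) -> float:
    """Convert a balloon page count into MiB."""
    return pages * page_size / (1024 * 1024)


# Values copied from the statistics shown above.
stats = {"target_pages": 230400, "target_mib": 900}

# The converted page count should match the reported MiB figure.
print(pages_to_mib(stats["target_pages"]) == stats["target_mib"])
```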

..and then try to start a Java process in the guest with -Xms800m -Xmx800m (i.e. with a heap size of 800mb) the Java process in the guest will hang, Firecracker will use ~200% CPU time but the actual size occupied by the ballooning device in the guest will not change and remain at 900mb:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2552550 xxxxxxxx  20   0 1063092  82992  82096 R  99,9   0,3   3:44.76 fc_vcpu 1
2552544 xxxxxxxx  20   0 1063092  82992  82096 R  90,9   0,3   3:12.17 firecracker
2552549 xxxxxxxx  20   0 1063092  82992  82096 S  25,0   0,3   0:43.47 fc_vcpu 0

Once I reset the target size of the ballooning device to 100mb, the Java process will become unblocked and start.
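For anyone reproducing this, here is a minimal sketch of the request bodies involved. It assumes the Firecracker HTTP API's `PUT /balloon` (initial configuration) and `PATCH /balloon` (target update) endpoints with the `amount_mib`, `deflate_on_oom` and `stats_polling_interval_s` fields. The helper names and the polling interval are illustrative, and the code only builds JSON bodies rather than talking to the API socket.

```python
import json


def balloon_config(amount_mib: int, deflate_on_oom: bool,
                   stats_polling_interval_s: int = 1) -> str:
    """Body for the initial balloon configuration (assumed: PUT /balloon)."""
    return json.dumps({
        "amount_mib": amount_mib,
        "deflate_on_oom": deflate_on_oom,
        # Illustrative value; the issue does not say which interval was used.
        "stats_polling_interval_s": stats_polling_interval_s,
    })


def balloon_update(amount_mib: int) -> str:
    """Body for changing the balloon target (assumed: PATCH /balloon)."""
    return json.dumps({"amount_mib": amount_mib})


# Setup described above: a 900 MiB balloon with deflate_on_oom enabled...
print(balloon_config(900, True))
# ...and the manual reset to 100 MiB that unblocked the Java process.
print(balloon_update(100))
```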

However, from the documentation of the deflate_on_oom option I would have expected that the guest kernel would deflate the ballooning device automatically, if deflate_on_oom=true?

If I run the same experiment with deflate_on_oom=false, I instantly get an out-of-memory error when I try to start the Java process:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ce000000, 279576576, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 279576576 bytes for committing reserved memory.

which is what I would have expected.

Also, if I increase (i.e. inflate) the balloon to 900m again after I started the Java process, I start getting warnings from the ballooning driver (as documented):

[  282.580254] virtio_balloon virtio0: Out of puff! Can't get 1 pages

..but the CPU usage again goes up to almost ~200%. Is this expected? I mean, the warnings are OK, but I wouldn't expect that Firecracker will burn all its CPU shares while trying to inflate the balloon?

So to summarize, is the described behavior with deflate_on_oom=true a bug in the implementation or have I misunderstood the behavior of the ballooning device in the event of low memory in the guest?

PS: I've used the following kernel and FC versions for the experiments:
Guest kernel: 5.19.8
Host kernel : 6.5.7 (Ubuntu 20.04)
Firecracker : 1.5.1 and 1.6.0-dev (from today, 036d9906)


bchalios · 2023-12-18T11:27:13Z

Hi Volker,

Thanks for reporting this. We will take a look and reproduce it, but in the meantime I'd like to point out that this configuration:

Guest kernel: 5.19.8
Host kernel : 6.5.7 (Ubuntu 20.04)

is not supported. Would you be able to try and reproduce with a supported set of host/guest kernels? What we test with is guest x host = [4.14, 5.10] x [4.14, 5.10, 6.1] (guest 6.1 might work too).

bchalios · 2023-12-18T11:36:14Z

Also, to answer your question:

So to summarize, is the described behavior with deflate_on_oom=true a bug in the implementation or have I misunderstood the behavior of the ballooning device in the event of low memory in the guest?

This should work, and we have tests that indicate it does work, i.e. the balloon gets deflated. However, we do not track the CPU time consumed to achieve this.

pb8o · 2024-01-08T10:29:33Z

Hi @simonis, is the answer that @bchalios provided enough, does it resolve your issue, or is there anything else to investigate?

simonis · 2024-01-27T19:18:10Z

Sorry for the late answer @bchalios, @pb8o. I finally managed to run my experiments on a "supported" platform, but unfortunately the results are exactly the same.

Host: 6.1.72-96.166.amzn2023.x86_64
Guest: 6.1.74 (with the config from microvm-kernel-ci-x86_64-6.1.config plus CONFIG_IP_PNP=y)
Firecracker: v1.6.0 and v1.7.0-dev (from today, 49db07b3)

So to summarize the problem: when I start Firecracker with a large ballooning device and deflate_on_oom: true and then try to start a process in the guest which requires memory reserved by the balloon, the guest seems to hang and the Firecracker threads on the host will run at 100% CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                           
 298704 ec2-user  20   0 1058260  85544  84964 R  57.9   0.0   5:45.06 fc_vcpu 1                                         
 298703 ec2-user  20   0 1058260  85544  84964 R  56.6   0.0   5:42.44 fc_vcpu 0                                         
 298698 ec2-user  20   0 1058260  85544  84964 R  26.2   0.0   2:33.69 firecracker

The guest itself is not really deadlocked, just extremely slow. I can ssh into it and see the following:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                             
  139 root      20   0 3150612  36748     32 S 112.5   3.6  13:00.63 jshell                                              
   44 root      20   0       0      0      0 R 100.0   0.0  10:13.17 kswapd0

The Java process I've started (i.e. jshell) is starving because it doesn't get enough memory. But it doesn't run into a hard OOM like when I'm running with deflate_on_oom=false. kswapd0 is running at 100% within the guest.

Querying the balloon metrics from the guest shows that the balloon slightly deflates itself, but this happens extremely slowly. E.g. initially we have something like this:

{
    "target_pages": 230400,
    "actual_pages": 228864,
    "target_mib": 900,
    "actual_mib": 894,
    "swap_in": 0,
    "swap_out": 0,
    "major_faults": 7030729,
    "minor_faults": 14225986,
    "free_memory": 49930240,
    "total_memory": 1033064448,
    "available_memory": 0,
    "disk_caches": 655360,
    "hugetlb_allocations": 0,
    "hugetlb_failures": 0
}

And after about 45 minutes we get to:

{
    "target_pages": 230400,
    "actual_pages": 180992,
    "target_mib": 900,
    "actual_mib": 707,
    "swap_in": 0,
    "swap_out": 0,
    "major_faults": 18425420,
    "minor_faults": 37668071,
    "free_memory": 50409472,
    "total_memory": 1033064448,
    "available_memory": 0,
    "disk_caches": 806912,
    "hugetlb_allocations": 0,
    "hugetlb_failures": 0
}

If I wait about 60 minutes, jshell finally starts up and begins to be usable.

So ballooning is indeed "kind of" working, but it is not usable in practice. I would expect the ballooning device to deflate much more promptly in this case.

simonis · 2024-01-27T19:52:24Z

I did one more run to confirm the behavior and collect more numbers:

| time | target_mib | actual_mib | free_memory | available_memory |
|---|---|---|---|---|
| 17:05:44 | 900 | 900 | 63778816 | 0 |
| 17:24:26 | 900 | 893 | 50581504 | 0 |
| 17:36:34 | 900 | 879 | 51023872 | 0 |
| 17:44:46 | 900 | 870 | 54579200 | 0 |
| 17:55:46 | 900 | 835 | 67063808 | 0 |
| 18:06:14 | 900 | 797 | 50536448 | 0 |
| 18:13:16 | 900 | 637 | 55500800 | 0 |
| jshell exit | 900 | 637 | 321597440 | 251809792 |
| 18:25:12 | 900 | 637 | 338468864 | 270401536 |
| 18:41:25 | 900 | 637 | 338210816 | 270143488 |

As you can see, it takes more than an hour until jshell becomes responsive (somewhere between 18:06 and 18:13). It also looks like the deflation starts extremely slowly but gets faster as time goes on.
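To make the speed-up visible, the deflation rate per interval can be computed from the timestamped `actual_mib` samples above. This is only a sketch over the numbers reported in this comment; the helper name `deflation_rates` is made up for illustration, and all samples are assumed to fall on the same day.

```python
from datetime import datetime

# (time, actual_mib) samples copied from the measurements above,
# up to the point where jshell became responsive.
samples = [
    ("17:05:44", 900),
    ("17:24:26", 893),
    ("17:36:34", 879),
    ("17:44:46", 870),
    ("17:55:46", 835),
    ("18:06:14", 797),
    ("18:13:16", 637),
]


def deflation_rates(samples):
    """Return (start, end, MiB released per minute) for consecutive samples."""
    rates = []
    for (t0, mib0), (t1, mib1) in zip(samples, samples[1:]):
        start = datetime.strptime(t0, "%H:%M:%S")
        end = datetime.strptime(t1, "%H:%M:%S")
        minutes = (end - start).total_seconds() / 60
        rates.append((t0, t1, (mib0 - mib1) / minutes))
    return rates


for t0, t1, rate in deflation_rates(samples):
    print(f"{t0} -> {t1}: {rate:.2f} MiB/min")
```

With these samples, the first interval comes out below 0.5 MiB/min and the last one above 20 MiB/min, which matches the impression that most of the deflation happens shortly before jshell becomes responsive.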

The other interesting observation is that after jshell exits, the balloon size doesn't inflate again, although its actual size is way below its target size and there's plenty of free memory. I would have expected that the balloon will automatically and continuously inflate if its size is below the target size and free memory is available. But nothing happens, there's no CPU usage in the Firecracker threads, neither in the host nor in the guest.

PS: these results were collected on a c5.metal instance.

JackThomson2 · 2024-11-12T12:09:37Z

Hi @simonis, I've been taking a look at this.

Deflate on OOM is indeed a slow process. It is managed by the balloon driver in the guest kernel, not by Firecracker itself, and the driver seems to try to release as little memory as possible on each deflate.

The balloon will indeed not re-inflate after being deflated on OOM if it had already reached its target size; this appears to be by design in the driver. However, if the balloon has not yet reached its target size, it will continue trying to inflate even after being deflated.

The high CPU usage while trying to reach its target size again also appears to be by design: the driver will aggressively try to allocate memory to reach its target size.

Overall, it appears the driver intends the balloon to be inflated for shorter periods of time, to free up memory before an operation such as VM migration.

Hope this helps

roypat added the Status: Awaiting author label on Dec 20, 2023

zulinx86 added the Good first issue label and removed the Status: Awaiting author label on Mar 11, 2024

bchalios added the Type: Bug label and removed the Good first issue label on Apr 12, 2024

roypat added the Status: Parked label on Aug 12, 2024

roypat assigned JackThomson2 on Nov 13, 2024

roypat added the Status: Awaiting author label and removed the Status: Parked label on Nov 13, 2024
