Dataplane part of `forwarder-vpp` leaks #1120

denis-tingaikin · 2024-05-28T13:33:49Z

forwarder-vpp_goroutineprofiles_20240527074108.tar.gz
forwarder-vpp_memprofiles_20240527074153.tar.gz

TODO

Find potential prpblem places
Create an issue for each problem
Estimate each problem
Provide a list of issues that can be delivered till 7 Jun
Provide fixes from previous point

denis-tingaikin · 2024-05-30T08:27:54Z

logs from fwd.tar.gz

Provided by @szvincze

NikitaSkrynnik · 2024-06-03T06:26:12Z

Add CloseOnError chain elements to forwarder and nsmgr - 3h
Test forwarder with the new elements - 3h
Change timeout in forwarder - 3h
Test forwarder with the new timeout - 3h
Check if all chain elements cleanup themselves on errors from next - 3h

NikitaSkrynnik · 2024-06-04T14:10:44Z

Current state

It looks like vpp reuses indices for interfaces. It somehow affects the deletion of tap interfaces on Closes. Therefore, NSM shows no created tap interfaces but vpp still have them

Next steps

Make manual management of interface indices - 3h
Test manual management of interface indices - 3h
Test without deleting files - 7h
Investigate what causes the problem: govpp or vpp - 12h
- Find a place where to add logs - 2h
- Add logs to vpp - 3h
- Try to reproduce the problem - 7h
Find out if govpp doesn't delete some intefaces - 3h
Test with Closes - 7h

Total: 35h
The estimation may be increased

denis-tingaikin · 2024-06-04T14:17:21Z

/cc @szvincze

NikitaSkrynnik · 2024-06-25T10:10:30Z

Current status

forwarder-vpp has two controlplane problems and one dataplane problem.

Controlplane problems:

timeout chain element doesn't call Close and doesn't delete vpp intefaces
Even when forwarder-vpp calls Close some vxlan intefaces are not deleted anyway
Issue: Controlplane part of forwarder-vpp leaks #1129

Dataplane problems:

When forwarder-vpp calls Close some tap interfaces are not deleted, just switched to state: down. After investigation I found out that the problem is inside vpp

elajkat · 2024-07-10T09:14:01Z

Hi, @NikitaSkrynnik :Could you please help our investigation with the Dataplane Prblems you mentioned above and point to VPP? I loaded VPP (v23.10-rc0 local build) API with vxlan tunnel creates, tap creates, ACL xConnect create in batch for ~24 hours and was not able to reproduce such issue. I tried to use the same go API calls that are used by the cmd-forwarder-vpp.
If you have any suggestion how to try to reproduce the issue, or what direction could be followed please don't hesitate to add it here, thanks in advance

NikitaSkrynnik · 2024-07-11T08:30:35Z

Hi, @elajkat! Unfortunately we didn't have enough time to reproduce the problem on VPP. You can try to reproduce it using NSM and check VPP which runs inside a forwader. Here are the steps:

Deploy a basic NSM setup
Start scaling clients and endpoints (should be repeated at least 10 times)
2.1. Scale the clients and endpoints up to 20 pods each
2.2. Wait for 40 seconds
2.3. Scale the clients and endpoints down to 0
2.4. Wait for 20 seconds
2.5. Scale to 20 again
2.6. ...

After repeated scaling up/down I can observe several tap interfaces with state:DOWN in VPP inside the forwarder. The forwarder issues a request for deletion through govpp, but interfaces are still there with state:DOWN

elajkat · 2024-07-12T16:10:09Z

Hi, with the suggested scale up - scale down process I got some hanging interfaces, but those are in up state, i.e.:

=== pod/forwarder-vpp-vpf55 ===
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)     Counter          Count     
host-eth0                         1      up     1500/1500/1500/1500 rx packets                671059
                                                                    rx bytes               335612875
                                                                    tx packets                 14067
                                                                    tx bytes                 1526406
                                                                    drops                     657122
                                                                    ip4                       342905
                                                                    ip6                           79
local0                            0     down          0/0/0/0       
tap25                             42     up     1446/1446/1446/1446 rx packets                    12
                                                                    rx bytes                     936
                                                                    drops                         12
                                                                    ip6                           12
vxlan_tunnel14                    41     up     1446/1446/1446/1446 rx packets                    55
                                                                    rx bytes                    3362
                                                                    tx packets                    50
                                                                    tx bytes                    4860
                                                                    drops                          5

Next week I check this in detail with VPP API trace and forwarder logs etc. and try reproduce this without forwarder with pure VPP and API calls.

edwarnicke · 2024-07-16T14:41:57Z

You may find the vpp api trace helpful for capturing the sequence of API calls that leads to the issue for handoff to VPP folks for reproduction.

rubasov · 2024-07-17T11:52:33Z

I also played around with a slightly modified reproduction. I was able to reliably reproduce the leftover tun/vxlan_tunnel interfaces. At this time I only focused on identifying the suggested vpp bug. However I found no indication in vpp's api trace, that we have a vpp bug here. Here's what I did and saw.

The environment:

kind with 4 workers (but the repro seems to work with 2 workers too)
spire single cluster
nsm basic
registry-k8s: 2 replicas
nsm kernel2ethernet2kernel, slightly modified: we have wrapped the alpine pod in a deployment, so we can scale it

The reproduction loop:

while not having a leftover tap and/or vxlan_tunnel interface
in each forwarder: start vpp api tracing with an empty buffer (so that at the end we shall see only the api trace of the last up-and-down scale)
scale up: nse-kernel, alpine
scale down (to 0 replicas): alpine, nse-kernel
after all scale operations completed:
in each forwarder: check for leftover tap and/or vxlan_tunnel interfaces
if we have some:
print affected pods and interfaces
grep vpp api trace dumps for relevant tap_delete messages
grep and count vpp api trace dumps for vxlan_tunnel create/delete messages and see if we have fewer deletes than creates

In the above repro steps it slightly bothered me (due to my weak kubernetes-fu) that we do not wait for the completion of a scale operation (other than sleeping for a hardcoded period) before we start a new scale operation. Therefore I replaced the sleeps with kubectl wait commands (and a final sleep just to make sure) as you can see below:

kubectl -n ns-kernel2ethernet2kernel scale deployment nse-kernel --replicas=5
kubectl -n ns-kernel2ethernet2kernel wait --timeout=1m --for=condition=Available=True deployment.apps/nse-kernel

kubectl -n ns-kernel2ethernet2kernel scale deployment alpine --replicas=10
kubectl -n ns-kernel2ethernet2kernel wait --timeout=1m --for=condition=Available=True deployment.apps/alpine

kubectl -n ns-kernel2ethernet2kernel scale deployment alpine --replicas=0
kubectl -n ns-kernel2ethernet2kernel wait --timeout=1m --for delete pod --selector=app=alpine

kubectl -n ns-kernel2ethernet2kernel scale deployment nse-kernel --replicas=0
kubectl -n ns-kernel2ethernet2kernel wait --timeout=1m --for delete pod --selector=app=nse-kernel

sleep 5

(note: Please correct me if you think these kubectl wait commands to be incorrect.)

With this method I could reliably produce leftover interfaces. Usually already in the 1st or 2nd iteration of the reproduction loop we have some. Always the same number of taps as vxlan_tunnels. Most of the time with the same suffix (like tap3 and vxlan_tunnel3), but sometimes with different suffixes (like tap0 and vxlan_tunnel2). These interfaces are always up.

Out of 20 runs leaving interfaces around, I have seen 20 in which the vpp api trace contains no indication that anybody ever tried to delete the leftover interfaces:

For tap interfaces I could grep for the tap_delete message with a concrete sw_if_index, e.g.:

$ kubectl exec forwarder-vpp-r2mz8 -n nsm-system -- vppctl api trace dump | egrep --no-group-separator -A1 tap_delete | egrep -w 'sw_if_index: 2'

For vxlan_tunnel interfaces unfortunately the vpp api trace contains only identifiers that are harder to pair with sw_if_index, therefore I only counted the number of vxlan_tunnel creates and deletes - looking for having fewer deletes than creates:

$ kubectl exec forwarder-vpp-2jfxd -n nsm-system -- vppctl api trace dump | agrep -d ^vl_api vxlan_add_del_tunnel | egrep is_add | sort -r | uniq -c

The difference between creates and deletes were the number of vxlan_tunnel interfaces left around in all 20 test runs.

In summary I could not confirm the presence of a vpp bug - or at least my reproduction method does not reproduce it.

By the way the fact that the leftover interfaces are always present in tap+vxlan_tunnel pairs also makes me think its cause is more likely to be in nsm than vpp.

If anybody is interested I'm happy to share the scipts I used. Let me know if you have other ideas to refine the reproduction steps.

Let me know what you think!

elajkat · 2024-07-17T13:35:37Z

Thanks @rubasov.
I actually used a similar setup like @rubasov and I am able to run the same with some well placed kubectl waits and an extra sleep 10 after scale up (alpine 40, nse 5) and after scale down (alpine 0, nse 0) and I have no leftover interfaces.

denis-tingaikin · 2024-07-19T09:22:09Z

Hello @rubasov, thank you for testing it.

We're planning to continue working on this issue soon. As I recall, the main issue is revealed only to leaked vpp infaces with state up. At this moment, it could be helpful if you shared the scripts, and then we could verify steps to reproduce.

UPD: meant state: DOWN

Modify the kernel2ethernet2kernel case: * Remove the antiaffinity, so we'll have k2e2k mixed with k2k. * Wrap the alpine pod in a deployment, so we can scale it. * Extend NSM_CIDR_PREFIX, so we have enough IPs to scale. Signed-off-by: Bence Romsics <bence.romsics@gmail.com>

rubasov · 2024-07-23T07:57:44Z

Hi @denis-tingaikin,

I made some modifications and now I believe I have a better reproduction - hopefully reproducing every bug symptom Nikita described. I'm attaching the scripts I used.

My earlier approach (as in my previous comment) always stopped after the first leftover interface was detected (no matter if it was up or down). However these interfaces were always up and after a few minutes they got deleted, therefore it's not likely they could contribute to a long term memory leak. The current approach does not stop at the first leftover interface, but continues scaling the nsc deployment up and down. Usually after many more iterations (up to a 100, sometimes 150) this approach produces leftover interfaces both up and down (many up, a few down), out of which a few remains indefinitely (the down interfaces always remain).

With this complex reproduction I'm far from having a proper understanding of what's happening, however the following factors seem to be relevant:

the first bug of leaving many up interfaces around
the later cleanup process of these interfaces overlapping with the load caused by the still continuously ongoing scaling
errors in the forwarder logs like (their frequency: in a handful of iterations out of 100):
vppapi AddNodeNext returned error: context canceled
vppapi TapCreateV3 returned error: context canceled
vppapi SwInterfaceTagAddDel returned error: context canceled
When these are present the corresponding vpp api trace does not show these calls and there's nothing relevant in vppctl show logging.

Regarding the scripts. Please read them before running.

First run:

./setup-1.sh ; ./setup-2.sh ; ./scale.sh 3

setup-1.sh creates a kind cluster. I used it without having other kind clusters around.
scale.sh's parameter is the iteration count. When no parameter is passed it continues until there's a leftover interface in down.
setup-2.sh depends on rubasov/deployments-k8s@54f2d2d

Later runs:

./teardown-2.sh ; ./setup-2.sh ; ./scale.sh | tee scale.sh.log

nsm-scale-iface-leftover.tar.gz

NikitaSkrynnik · 2024-07-23T10:57:54Z

Hi @rubasov, @elajkat. I collected the traces for the issue with state:DOWN interfaces. Also added a small description: traces.zip. It took five up/down scalings to get two tap interfaces with state: DOWN.

Here are the scripts I use: iface-leaks.zip. Just run ./full-setup.sh and then ./scale.sh

rubasov · 2024-08-13T07:38:48Z

Hi @denis-tingaikin, @NikitaSkrynnik,

While @ljkiraly is out of office we're trying to track the status of this interface leak. We have seen networkservicemesh/sdk#1650 merged. The pull request description says it should fix all interface leaks. However Nikita's comment in networkservicemesh/sdk#1649 reads to me as if the merged patch was only part of a planned fix. We also built a new forwarder with the sdk patch merged and tested it with the reproduction method we discussed earlier. Unfortunately we still see leftover interfaces.

Can you please help us understand what exactly the merged patch fixes? Did we maybe miss another patch we should have used for a full fix?

NikitaSkrynnik · 2024-08-13T08:57:08Z

Hi, @rubasov. We found out that we need two more patches to fix all interface leaks:

denis-tingaikin · 2024-08-13T09:00:40Z

@rubasov Many thanks for testing it. Currently, we also looking for other solutions, it would be perfect if you could add these two patches #1120 (comment) (note: requires SDK with commit 301631365421a9d916a5ab7224fa9fbd847320b2) and test it.

rubasov · 2024-08-15T14:37:12Z

Hi @NikitaSkrynnik, @denis-tingaikin,

I did some more testing and wanted to get back to you with the results:

I started with sdk commit 78bedfb4 (which already includes commit 30163136 "Add a timeout for Closes in begin.Server (#1650)").
Then cherry-picked all commits of networkservicemesh/sdk#1656 and networkservicemesh/sdk#1655 on top.

I had to resolve one merge conflict between https://github.com/networkservicemesh/sdk/pull/1656/files#diff-963d51540e41ec0a05680ad81a593e4195fb8092eceb003e3f53778ec502fb74R45 and https://github.com/networkservicemesh/sdk/pull/1655/files#diff-963d51540e41ec0a05680ad81a593e4195fb8092eceb003e3f53778ec502fb74R46.

Then also noticed that the s/closeTimeout/contextTimeout/ rename had to be applied here too:
https://github.com/networkservicemesh/sdk/pull/1655/files#diff-d03d72993c11636268d2d3264a65f9888845d7e1f052b40a1aa1c2c503b10814R91

Using this sdk I built a new forwarder and ran the usual reproduction steps. Unfortunately I still saw leftover interfaces as before.

However seeing these conflicts between PR 1656 and 1655 I'm wondering if I'm really testing with the same code as you are. It feels like I'm still missing something.

Let me know what you think.

NikitaSkrynnik · 2024-08-19T00:05:37Z

@rubasov Could you try this forwarder image: nikitaxored/cmd-forwarder-vpp:no-leaks? It contains all the patches. Could you also send script you use to reproduce the problem?

rubasov · 2024-08-21T15:10:38Z

Hi @NikitaSkrynnik,

Thanks for the image! I started the repro script (which is still basically the same as I posted earlier) using your forwarder image. It seems to be working. Earlier, without the fixes, I believe the maximum iteration count I ever saw needed for an interface left in down was between 160-170. With the fixes included my repro is at iteration 670. And still no interface left in down. So I believe this at least significantly reduced the frequency or hopefully fully fixed it.

I'm interested in how this image was built differently from my last attempt. If you have the time, please share!

Thanks,
Bence

NikitaSkrynnik · 2024-08-22T10:36:54Z

@rubasov, we've merged all fixes into sdk main two days ago, you can check them there. I just used them to build the forwarder image.

rubasov · 2024-09-02T11:04:36Z

Hi @NikitaSkrynnik,

I wanted to get back after we have done some longer test runs taking the main branch after all your fixes merged. We still occasionally see some leftover interfaces. But it usually takes 500-1000 nsc scale-up-down iterations (with the usual repro scripts) until the error happens once. I believe this is 1-1.5 magnitudes better than without the fixes. At this moment we hope this is good enough even for long-lived deployments. Of course if we learn anything new, we'll get back to you.

Thank you for your help!
Bence

denis-tingaikin · 2024-09-10T15:58:24Z

Statistics from rc2.

Interfaces:
2024-09-05 14:13:03 - Node: control-plane-control-seliics06748-n1 | number of ^tap interface: 0 | number of ^vxlan_tunnel interface: 0 |
2024-09-05 14:13:04 - Node: control-plane-control-seliics06749-n2 | number of ^tap interface: 0 | number of ^vxlan_tunnel interface: 0 |
2024-09-05 14:13:05 - Node: control-plane-control-seliics06750-n3 | number of ^tap interface: 0 | number of ^vxlan_tunnel interface: 0 |
2024-09-05 14:13:07 - Node: pool1-ceph-seliics06744-n7 | number of ^tap interface: 0 | number of ^vxlan_tunnel interface: 0 |
2024-09-05 14:13:08 - Node: pool1-ceph-seliics06745-n4 | number of ^tap interface: 0 | number of ^vxlan_tunnel interface: 0 |
2024-09-05 14:13:09 - Node: pool1-ceph-seliics06746-n5 | number of ^tap interface: 0 | number of ^vxlan_tunnel interface: 0 |
2024-09-05 14:13:10 - Node: pool1-ceph-seliics06747-n6 | number of ^tap interface: 3 | number of ^vxlan_tunnel interface: 0 |

NikitaSkrynnik · 2024-09-17T13:24:23Z

We plan to revert the begin chain element networkservicemesh/sdk#1668 when Netlink is released: vishvananda/netlink#1019

rubasov · 2024-10-04T13:54:39Z

Several weeks back while working with the interface leak problem I also wanted to understand the link between the interface leak and the memory leak. Yesterday talking to Laszlo and Szilard we realized I never shared the results of this. This has practically nothing to do with nsm itself, however it still could be useful in understanding how the interface leak casued the memory leak, so I summarize it here.

The first step was to realize that the main problem were the leftover tap interfaces. Knowing this I no longer needed the complicated and long running reproduction scripts we used in other parts of the analysis. Instead I could just create tap interfaces in bulk and observe the memory usage. I created a test vm (under libvirt/kvm) and installed vpp into it.

# install vpp 24.06 from https://packagecloud.io/fdio/2406
$ sudo apt install vpp vpp-plugin-core

# disable the service that comes from the package
$ sudo systemctl stop vpp
$ sudo systemctl disable vpp
$ sudo mkdir /var/log/vpp/

# do everything as root
$ sudo -s

# start vpp manually instead
$ i=0 ; pkill -f vpp$i ; vpp "unix { cli-listen /run/vpp/cli-vpp$i.sock cli-prompt vpp$i# log /var/log/vpp/vpp$i.log exec /home/vagrant/vpp$i.vpp } api-trace { on } api-segment { prefix vpp$i } statseg { socket-name /run/vpp/statseg-vpp$i.sock }"

# verify that vpp works
$ vppctl -s /run/vpp/cli-vpp$i.sock show version
vpp v24.06.0-2~gcd30ea1dd~b19 built by root on bdc0da124535 at 2024-07-31T17:57:13

# collect memory usage statistics while creating 100 tap interfaces
vpp_pid=$( pgrep vpp )
kill $vpp_pid
while pgrep vpp >/dev/null ; do sleep 1 ; done

i=0 ; vpp "unix { cli-listen /run/vpp/cli-vpp$i.sock cli-prompt vpp$i# log /var/log/vpp/vpp$i.log exec /home/vagrant/vpp$i.vpp } api-trace { on } api-segment { prefix vpp$i } statseg { socket-name /run/vpp/statseg-vpp$i.sock }"
sleep 1
vpp_pid=$( pgrep vpp )

(
echo 0 $( vppctl -s /run/vpp/cli-vpp$i.sock show memory main-heap | egrep -o "used: [0-9]+.[0-9]+." | cut -d ' ' -f2 ) $( ps -o rss=,vsz= $vpp_pid ) $( free -k | awk '/^Mem:/ { print $3 }' )
for j in $( seq 100 )
do
    echo >&2 $j
    vppctl -s /run/vpp/cli-vpp$i.sock create tap >/dev/null
    echo $j $( vppctl -s /run/vpp/cli-vpp$i.sock show memory main-heap | egrep -o "used: [0-9]+.[0-9]+." | cut -d ' ' -f2 ) $( ps -o rss=,vsz= $vpp_pid ) $( free -k | awk '/^Mem:/ { print $3 }' )
done
) > vpp-taps-mem-usage.log # fields in this file: number-of-taps vppctl-show-memory-main-heap-used ps-rss ps-vsz free-k-used

$ head -1 vpp-taps-mem-usage.log ; tail -1 vpp-taps-mem-usage.log
0 56.88M 95180 18012616 2842848
100 59.65M 98480 18012616 3512548

We can see that both the vpp self-reported memory usage and the kernel's report about the vpp process itself shows cc. 30 KiB / tap interface memory usage - which is negligibly low. At this rate we are never running out of today's typical container memory limits (usually hundreds of MiBs of free memory). However the overall used memory in this vm grows much faster. To show the typical values and the variance here are the results of 5 runs:

6697 KiB/tap
6763 KiB/tap
6840 KiB/tap
6897 KiB/tap
6880 KiB/tap

This is more than 200x times what the vpp process itself uses per tap interface. Where does this memory go? I believe this discrepancy was also observed in a k8s environment between the cgroups reported overall memory usage and the vpp process's memory usage. In this simpler environment it's already clear that the extra memory must be used by the linux kernel. Let's find some positive proof of that!

We'll look into /proc/slabinfo where the kernel tracks several kinds of internal allocation statistics (https://man7.org/linux/man-pages/man5/slabinfo.5.html).

# The same statistics collection as before with an additional column where we track the overall memory usage of the kmalloc-8k slab in KiBs.
vpp_pid=$( pgrep vpp )
kill $vpp_pid
while pgrep vpp >/dev/null ; do sleep 1 ; done

i=0 ; vpp "unix { cli-listen /run/vpp/cli-vpp$i.sock cli-prompt vpp$i# log /var/log/vpp/vpp$i.log exec /home/vagrant/vpp$i.vpp } api-trace { on } api-segment { prefix vpp$i } statseg { socket-name /run/vpp/statseg-vpp$i.sock }"
sleep 1
vpp_pid=$( pgrep vpp )

(
echo 0 $( vppctl -s /run/vpp/cli-vpp$i.sock show memory main-heap | egrep -o "used: [0-9]+.[0-9]+." | cut -d ' ' -f2 ) $( ps -o rss=,vsz= $vpp_pid ) $( free -k | awk '/^Mem:/ { print $3 }' ) $( cat /proc/slabinfo | egrep ^kmalloc-8k | awk '{ print $2 * $4 / 1024 }' )
for j in $( seq 100 )
do
    echo >&2 $j
    vppctl -s /run/vpp/cli-vpp$i.sock create tap >/dev/null
    echo $j $( vppctl -s /run/vpp/cli-vpp$i.sock show memory main-heap | egrep -o "used: [0-9]+.[0-9]+." | cut -d ' ' -f2 ) $( ps -o rss=,vsz= $vpp_pid ) $( free -k | awk '/^Mem:/ { print $3 }' ) $( cat /proc/slabinfo | egrep ^kmalloc-8k | awk '{ print $2 * $4 / 1024 }' )
done
) > vpp-taps-mem-usage.log # fields in this file: number-of-taps vppctl-show-memory-main-heap-used ps-rss ps-vsz free-k-used slabinfo-kmalloc-8k-total-k

$ head -1 vpp-taps-mem-usage.log ; tail -1 vpp-taps-mem-usage.log
0 56.88M 95324 18012616 2804360 1360
100 59.65M 98624 18012616 3482088 614208

$ bc -ql
(3482088-2804360)/100
6777.28000000000000000000
(614208-1360)/100
6128.48000000000000000000
(614208-1360)/(3482088-2804360)
.90426837905472401907

That means we already found 90% of the to-be-explained memory usage in the allocation statistics of the kmalloc-8k slabs. At this point I was convinced I had positive proof that the extra memory is used by the kernel and did not look into the changes of other slab types (like kmallocs of different units).

But why does a tap interface use this much memory? Probably because vpp creates multiqueue tap interfaces (I believe for performance reasons on multicore cpus) and all the buffers allocated are multiplied by the number of queues. The relevant code is here:

https://github.com/FDio/vpp/blob/6e8b350a0154caee6e9f037ae4938b83fb5dcf20/src/vnet/devices/tap/tap.c#L129-L737

I shortly looked into creating the same tap interface setup by the ip and tc tools. I started off like this:

modprobe sch_multiq
ip tuntap add dev tap$i mode tap multi_queue vnet_hdr
tc qdisc add dev tap$i root multiq

However now I believe that not all tunables can be set via these tools, so I could not create the exact same thing this way. If somebody is interested, the tap creation code could be extracted from vpp.

Please also note that this memory could appear as if used by vpp - because the tap interfaces are nonpersistent tap interfaces. Which means that if the vpp process dies then the linux kernel automatically destroys all tap interfaces requested by vpp. That's why I did not need to delete the taps in the above examples, I could just kill vpp instead.

NikitaSkrynnik · 2024-11-19T10:05:52Z

Seems like vpp memory consumption constantly grows. Here is the graph of main-heap after 160h of testing:

We decided to disable binary API tracing. A new test has been running for 10h already and we don't see any memory consumption growth in vpp

NikitaSkrynnik · 2024-11-26T10:28:15Z

We can use nitems parameter in vpp config to limit the number of entries of binary API traces. After we reach this limit vpp memory usage stops to increase. Here is the table of forwarder's memory consumption for 5 days:

test duration	forwarder-vpp-9x6f6 (vpp memory usage)	forwarder-vpp-zdfgr (vpp memory usage)
10m	385Mi (158.96M)	357Mi (157.83M)
160m	384Mi (159.39M)	359Mi (158.21M)
20h	386Mi (159.39M)	367Mi (158.45M)
26h	391Mi (159.39M)	367Mi (158.45M)
3d19h	389Mi (159.39M)	368Mi (158.45M)
4d2h	394Mi (159.39M)	367Mi (158.45M)
5d2h	388Mi (159.39M)	367Mi (158.45M)
7d2h	392Mi (159.39M)	367Mi (158.45M)
11d	390Mi (159.28M)	367Mi (158.45M)

elajkat · 2024-11-26T13:33:09Z

Just wanted to comment it here that we started the same measurements with limited nitems, so big +1 for it.
The default value for nitems is 256k, which value you used for your tests?

elajkat · 2024-11-26T15:23:30Z

In my local env I started a test with vpp 24.10, nitems=1000 (to be small enough to reach the saturation quick enough), my env is in kind, 2 forwarders.

2024-11-26 14:28:04.439240278+01:00
RX trace: used 197 of 1000 items, is enabled, has not wrapped
Thread 0 vpp_main
base 0x759c6ea01000, size 1g, locked, unmap-on-destroy, name 'main heap'
page stats: page-size 4K, total 262144, mapped 41107, not-mapped 221037
numa 0: 41107 pages, 160.57m bytes
total: 1023.99M, used: 156.67M, free: 867.33M, trimmable: 866.43M
RX trace: used 233 of 1000 items, is enabled, has not wrapped
Thread 0 vpp_main
base 0x73b549e01000, size 1g, locked, unmap-on-destroy, name 'main heap'
page stats: page-size 4K, total 262144, mapped 41107, not-mapped 221037
numa 0: 41107 pages, 160.57m bytes
total: 1023.99M, used: 156.72M, free: 867.28M, trimmable: 866.43M

.....

2024-11-26 16:21:11.986159040+01:00
RX trace: used 1000 of 1000 items, is enabled, has wrapped
Thread 0 vpp_main
base 0x759c6ea01000, size 1g, locked, unmap-on-destroy, name 'main heap'
page stats: page-size 4K, total 262144, mapped 41485, not-mapped 220659
numa 0: 41485 pages, 162.05m bytes
total: 1023.99M, used: 158.15M, free: 865.85M, trimmable: 864.96M
RX trace: used 1000 of 1000 items, is enabled, has wrapped
Thread 0 vpp_main
base 0x72245e201000, size 1g, locked, unmap-on-destroy, name 'main heap'
page stats: page-size 4K, total 262144, mapped 42016, not-mapped 220128
numa 0: 42016 pages, 164.13m bytes
total: 1023.99M, used: 159.64M, free: 864.36M, trimmable: 862.88M

Sorry I have no time to parse the raw outputs to something easier to consume, or to a chart.
I leave this test running for few more hours, but it is visible that the memory consumption is reached a level and stopped around that level.

NikitaSkrynnik · 2024-11-27T00:10:02Z

I used value 4096

elajkat · 2024-11-27T08:31:10Z

I believe that setting the default value of nitems to some fair, low number is perfect, the admins/operators can set anytime the value in vpp cli:
vpp-1# api trace on nitems 10000
vpp-1# api trace status
RX trace: used 0 of 10000 items, is enabled, has not wrapped

The value must be documented with some help for the admins how to change it.

NikitaSkrynnik · 2024-11-29T05:50:11Z

Hi, @szvincze! We don't see any memory leaks on our side. Could you tell about your test results? Is everything stable on your side? Can we close this issue?

elajkat · 2024-11-29T08:00:06Z

Hi @NikitaSkrynnik, we will have a weekend test that covers this issue also (with decreased nitems value for VPP), after that we will have results from our labs. Thanks

denis-tingaikin assigned NikitaSkrynnik May 28, 2024

denis-tingaikin added this to Release v1.13.1 May 28, 2024

denis-tingaikin moved this to In Progress in Release v1.13.1 May 28, 2024

denis-tingaikin moved this from In Progress to Todo in Release v1.13.1 May 28, 2024

NikitaSkrynnik moved this from Todo to In Progress in Release v1.13.1 May 30, 2024

denis-tingaikin moved this from In Progress to Moved to next release in Release v1.13.1 Jun 9, 2024

denis-tingaikin added this to Release v1.13.2 Jun 9, 2024

denis-tingaikin moved this to In Progress in Release v1.13.2 Jun 9, 2024

NikitaSkrynnik added this to Release v1.14.0 Jun 18, 2024

NikitaSkrynnik moved this to In Progress in Release v1.14.0 Jun 18, 2024

denis-tingaikin moved this from In Progress to Done in Release v1.14.0 Jun 26, 2024

NikitaSkrynnik changed the title ~~Investigate forwarder memory leaks~~ Dataplane part of forwarder-vpp leaks Jul 23, 2024

NikitaSkrynnik moved this from Done to In Progress in Release v1.14.0 Jul 23, 2024

Ex4amp1e moved this from In Progress to Done in Release v1.14.0 Jul 26, 2024

Ex4amp1e moved this from Done to Under review in Release v1.14.0 Jul 26, 2024

Ex4amp1e moved this from Under review to Blocked in Release v1.14.0 Jul 26, 2024

NikitaSkrynnik mentioned this issue Jul 28, 2024

Add a timeout for Closes in begin.Server networkservicemesh/sdk#1650

Merged

9 tasks

Ex4amp1e moved this from Blocked to In Progress in Release v1.14.0 Sep 2, 2024

Ex4amp1e moved this from In Progress to Todo in Release v1.14.0 Sep 3, 2024

Ex4amp1e moved this from Todo to In Progress in Release v1.14.0 Sep 4, 2024

NikitaSkrynnik moved this from In Progress to Blocked in Release v1.14.0 Sep 11, 2024

denis-tingaikin moved this from Blocked to Moved to next release in Release v1.14.0 Sep 24, 2024

denis-tingaikin added this to Release v1.14.1 and Release v1.15.0 Nov 18, 2024

denis-tingaikin moved this to In Progress in Release v1.15.0 Nov 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dataplane part of `forwarder-vpp` leaks #1120

Dataplane part of `forwarder-vpp` leaks #1120

denis-tingaikin commented May 28, 2024

denis-tingaikin commented May 30, 2024 •

edited

Loading

NikitaSkrynnik commented Jun 3, 2024 •

edited

Loading

NikitaSkrynnik commented Jun 4, 2024 •

edited

Loading

denis-tingaikin commented Jun 4, 2024

NikitaSkrynnik commented Jun 25, 2024

elajkat commented Jul 10, 2024

NikitaSkrynnik commented Jul 11, 2024

elajkat commented Jul 12, 2024 •

edited

Loading

edwarnicke commented Jul 16, 2024

rubasov commented Jul 17, 2024

elajkat commented Jul 17, 2024 •

edited

Loading

denis-tingaikin commented Jul 19, 2024 •

edited

Loading

rubasov commented Jul 23, 2024

NikitaSkrynnik commented Jul 23, 2024

rubasov commented Aug 13, 2024 •

edited

Loading

NikitaSkrynnik commented Aug 13, 2024

denis-tingaikin commented Aug 13, 2024

rubasov commented Aug 15, 2024

NikitaSkrynnik commented Aug 19, 2024

rubasov commented Aug 21, 2024

NikitaSkrynnik commented Aug 22, 2024

rubasov commented Sep 2, 2024

denis-tingaikin commented Sep 10, 2024

NikitaSkrynnik commented Sep 17, 2024 •

edited

Loading

rubasov commented Oct 4, 2024

NikitaSkrynnik commented Nov 19, 2024

NikitaSkrynnik commented Nov 26, 2024 •

edited

Loading

elajkat commented Nov 26, 2024

elajkat commented Nov 26, 2024 •

edited

Loading

NikitaSkrynnik commented Nov 27, 2024

elajkat commented Nov 27, 2024

NikitaSkrynnik commented Nov 29, 2024

elajkat commented Nov 29, 2024

Dataplane part of forwarder-vpp leaks #1120

Dataplane part of forwarder-vpp leaks #1120

Comments

denis-tingaikin commented May 28, 2024

TODO

denis-tingaikin commented May 30, 2024 • edited Loading

NikitaSkrynnik commented Jun 3, 2024 • edited Loading

NikitaSkrynnik commented Jun 4, 2024 • edited Loading

Current state

Next steps

denis-tingaikin commented Jun 4, 2024

NikitaSkrynnik commented Jun 25, 2024

Current status

elajkat commented Jul 10, 2024

NikitaSkrynnik commented Jul 11, 2024

elajkat commented Jul 12, 2024 • edited Loading

edwarnicke commented Jul 16, 2024

rubasov commented Jul 17, 2024

elajkat commented Jul 17, 2024 • edited Loading

denis-tingaikin commented Jul 19, 2024 • edited Loading

rubasov commented Jul 23, 2024

NikitaSkrynnik commented Jul 23, 2024

rubasov commented Aug 13, 2024 • edited Loading

NikitaSkrynnik commented Aug 13, 2024

denis-tingaikin commented Aug 13, 2024

rubasov commented Aug 15, 2024

NikitaSkrynnik commented Aug 19, 2024

rubasov commented Aug 21, 2024

NikitaSkrynnik commented Aug 22, 2024

rubasov commented Sep 2, 2024

denis-tingaikin commented Sep 10, 2024

NikitaSkrynnik commented Sep 17, 2024 • edited Loading

rubasov commented Oct 4, 2024

NikitaSkrynnik commented Nov 19, 2024

NikitaSkrynnik commented Nov 26, 2024 • edited Loading

elajkat commented Nov 26, 2024

elajkat commented Nov 26, 2024 • edited Loading

NikitaSkrynnik commented Nov 27, 2024

elajkat commented Nov 27, 2024

NikitaSkrynnik commented Nov 29, 2024

elajkat commented Nov 29, 2024

Dataplane part of `forwarder-vpp` leaks #1120

Dataplane part of `forwarder-vpp` leaks #1120

denis-tingaikin commented May 30, 2024 •

edited

Loading

NikitaSkrynnik commented Jun 3, 2024 •

edited

Loading

NikitaSkrynnik commented Jun 4, 2024 •

edited

Loading

elajkat commented Jul 12, 2024 •

edited

Loading

elajkat commented Jul 17, 2024 •

edited

Loading

denis-tingaikin commented Jul 19, 2024 •

edited

Loading

rubasov commented Aug 13, 2024 •

edited

Loading

NikitaSkrynnik commented Sep 17, 2024 •

edited

Loading

NikitaSkrynnik commented Nov 26, 2024 •

edited

Loading

elajkat commented Nov 26, 2024 •

edited

Loading