The example application DpdkTrafficFilter blocks forever #1275

Open
siphonelee opened this issue Dec 31, 2023 · 14 comments

siphonelee commented Dec 31, 2023

First of all, thanks for the great work on DpdkTrafficFilter! It saved me a lot of effort in developing a high-performance application using DPDK.

I'm running the example DpdkTrafficFilter application to filter DNS requests like this:
./DpdkTrafficFilter -d 0 -s 1 -P 53 -r 1 -t 1 -c 3 -m 65535 -i (a certain IP address)

It outputs info like this:

EAL: Detected CPU lcores: 32
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: net_e1000_igb (8086:1521) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_e1000_igb (8086:1521) device: 0000:01:00.3 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
Opened device #0 with 1 RX queues and 1 TX queues. RSS hash functions:
RSS_IPV4
RSS_IPV6
Using core 1
Core configuration:
DPDK device#0: RX-Queue#0;

and a lot of error messages which seem to be related to invalid DNS packet format (yet harmless, I assume):
[ERROR: ...Packet++/src/DnsLayer.cpp: parseResources:156] DNS layer contains more than 300 resources, probably a bad packet. Skipping parsing DNS resources

I did see some packets being sent out from port 1 via packet capture. But after a short while, the application seems to block: no more packets are sent out from port 1, and no more error messages like the above are printed.

If I comment out the packet-sending code in DpdkTrafficFilter, rebuild, and run the application again, everything seems to work fine:

if (packetMatched)
{
        // send packet to TX port if needed
        if (sendPacketsTo != NULL)
        {
                // comment it out !!!
                // sendPacketsTo->sendPacket(*packetArr[i], 0);
        }

        // save packet to file if needed
        if (pcapWriter != NULL)
        {
                pcapWriter->writePacket(*packetArr[i]);
        }

        m_Stats.MatchedPackets++;
}

The issue happens on both PcapPlusPlus v22.11 and v23.09.

Could anyone please provide some insights? Thanks!

Environment:
Ubuntu 22.04
DPDK stable 21.11
igb_uio kernel module
NIC intel i350
16 x 1GB huge pages

BTW, I can run the DPDK example application l2fwd to send packets without any issue, so I assume DPDK itself is OK.

@siphonelee (Author)

To add: after the "blocking", dev->receivePackets() in the following code always returns 0:

// go over all DPDK devices configured for this worker/core
for (InputDataConfig::iterator iter = m_WorkerConfig.InDataCfg.begin(); iter != m_WorkerConfig.InDataCfg.end(); iter++)
{
        // for each DPDK device go over all RX queues configured for this worker/core
        for (std::vector<int>::iterator iter2 = iter->second.begin(); iter2 != iter->second.end(); iter2++)
        {
                pcpp::DpdkDevice* dev = iter->first;

                // receive packets from network on the specified DPDK device and RX queue
                uint16_t packetsReceived = dev->receivePackets(packetArr, MAX_RECEIVE_BURST, *iter2);
                ......
}

while it's a busy line and packets keep coming in.

I'm just curious why sendPacketsTo->sendPacket() would cause dev->receivePackets() to behave incorrectly, since they are different ports (-d 0 -s 1).


seladb commented Jan 2, 2024

@siphonelee it's hard to say why sending packets blocks all traffic (incoming and outgoing); you'll probably need to debug it further and provide more information... Maybe the type of packets being sent out causes the network to stop sending packets to this machine? You could try loading a packet from a pcap file that is unrelated to the incoming packets, then sending that one packet every time a packet is received, and see if traffic still gets blocked at some point.

If you don't send any packets, does incoming traffic still get blocked?

@siphonelee (Author)

@seladb Thank you for the reply.

If no packets are sent out, incoming traffic is not blocked. Even in the sending scenario, the first few packets are sent out successfully: sendPacket() returns 1 for each packet sent, and the sent packets can be captured on the wire. But after that, dev->receivePackets() starts returning 0.

@siphonelee (Author)

@seladb As you suggested, I tried sending the same packet each time an incoming packet matched a certain IP address. The result was the same: RX got blocked after several TX packets.

Another observation: I added code to call rte_eth_stats_get() and print the stats when the program ends. When the blocking happened, I saw a large rx_nombuf statistic, while if I quit the program before the blocking happened, rx_nombuf was always 0. That seems to imply a real "blocking".


seladb commented Jan 3, 2024

Thanks for debugging it @siphonelee! The rx_nombuf stat may indicate that for some reason mbufs aren't being freed...

Here are a few things you can check that might help us debug the issue:

  • In the example app, there is a call to sendPacketsTo->sendPacket() in AppWorkerThread.h. Can you check its return value? Does it ever return false?
  • Can you add the following line just before sendPacketsTo->sendPacket() and let me know what the output is?
    std::cout << (int)packetArr[i]->getObjectType() << std::endl;
    sendPacketsTo->sendPacket(...);
  • Do you know if there is a way to get the number of free mbufs from DPDK? If yes, you can add this check during every receive/send iteration and check whether the number of mbufs is constantly decreasing. I'd expect it to stay stable because mbufs should be allocated and freed all the time, but maybe there's a bug somewhere and mbufs aren't being freed.


siphonelee commented Jan 4, 2024

@seladb Here are the debugging results per your request:

Thanks for debugging it @siphonelee! The rx_nombuf stat may indicate that for some reason mbufs aren't being freed...

Here are a few things you can check that might help us debug the issue:

  • In the example app, there is a call to sendPacketsTo->sendPacket() in AppWorkerThread.h. Can you check its return value? Does it ever return false?

--- The return value is always true before the blocking.

  • Can you add the following line just before sendPacketsTo->sendPacket() and let me know what the output is?
    std::cout << (int)packetArr[i]->getObjectType() << std::endl;
    sendPacketsTo->sendPacket(...);

--- The object type is always 1 before the blocking.

  • Do you know if there is a way to get the number of free mbufs from DPDK? If yes, you can add this check during every receive/send iteration and check whether the number of mbufs is constantly decreasing. I'd expect it to stay stable because mbufs should be allocated and freed all the time, but maybe there's a bug somewhere and mbufs aren't being freed.

--- Just as you figured, there seems to be an mbuf leak hiding here. I used:

std::cout << "free: " << dev->getAmountOfFreeMbufs() << ", in use: " << dev->getAmountOfMbufsInUse() << std::endl;
// comment it in/out
bool b = sendPacketsTo->sendPacket(*packetArr[i], 0);

for the two scenarios:

  1. if I did call sendPacket(), free mbuf count would decrease quickly:
    free: 1047424, in use: 1151
    free: 962270, in use: 86305
    free: 945845, in use: 102730
    free: 847836, in use: 200739
    free: 543264, in use: 505311
    free: 401116, in use: 647459
    ..... Logging stopped and the application seemed blocked. A huge rx_nombuf was observed if the application quit here .....

  2. if I commented out sendPacket() and only printed the mbuf statistics (with RX continuing), the free mbuf count stayed quite steady:
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    free: 1047424, in use: 1151
    .... lasting forever ....


seladb commented Jan 4, 2024

I think you uncovered a bug that has been there for a long time 😄

I think it was introduced in this commit (almost 6 years ago): 29a4db4

For some reason I assumed that calling rte_eth_tx_burst() frees the mbufs it uses, hence we can mark the mbuf as "no need to free". However, I don't think this is the case (also according to the DPDK documentation), and that might be the reason for the mbuf leak.

Here is the line where it happens:

bool needToFreeMbuf = (!useTxBuffer && !packetSent);

I didn't test this theory, and I also don't remember why I added this logic to avoid freeing the mbuf in this case; maybe there was a reason I don't remember 🤔

A simple fix could be to remove this logic and see if the mbuf leak is gone. However, I need to run more tests to try and figure out why I added it in the first place...

seladb added the bug label on Jan 4, 2024
@siphonelee (Author)

@seladb Thanks for your quick reply, that's exactly the change I made and have been testing for a while. It works fine now.

I'm sure you had a valid reason for the code, since I found similar logic in several other places:

for (int index = 0; index < applyForMBufs; index++)
    rawPacketsArr[index]->setFreeMbuf(needToFreeMbuf);
for (int index = applyForMBufs; index < arrLength; index++)
    rawPacketsArr[index]->setFreeMbuf(!needToFreeMbuf);

bool needToFreeMbuf = (!useTxBuffer && (packetsSent != arrLength));
for (int index = 0; index < arrLength; index++)
    mBufRawPacketArr[index]->setFreeMbuf(needToFreeMbuf);

bool needToFreeMbuf = (!useTxBuffer && (packetsSent != vecSize));
for (size_t index = 0; index < rawPacketsVec.size(); index++)
    mBufRawPacketArr[index]->setFreeMbuf(needToFreeMbuf);

bool needToFreeMbuf = (!useTxBuffer && (packetsSent != vecSize));
for (size_t index = 0; index < vecSize; index++)
    rawPacketsVec.at(index)->setFreeMbuf(needToFreeMbuf);

bool packetSent = (sendPacketsInner(txQueueId, &mbufRawPacket, getNextPacketFromMBufRawPacket, 1, useTxBuffer) == 1);
bool needToFreeMbuf = (!useTxBuffer && !packetSent);
mbufRawPacket.setFreeMbuf(needToFreeMbuf);

bool needToFreeMbuf = (!useTxBuffer && !packetSent);
rawPacket.setFreeMbuf(needToFreeMbuf);


According to the DPDK document you referenced:
"It is the responsibility of the rte_eth_tx_burst() function to transparently free the memory buffers of packets previously sent. This feature is driven by the tx_free_thresh value supplied to the rte_eth_dev_configure() function at device configuration time. When the number of free Tx descriptors drops below this threshold, the rte_eth_tx_burst() function must [attempt to] free the rte_mbuf buffers of those packets whose transmission was effectively completed." Looks like the frequency of calling rte_eth_tx_burst() matters. So I tried receive-then-send for each packet without changing the logic (needToFreeMbuf still false), and it worked fine.

In my previous use case I sent packets only when certain conditions were met; in the meantime a lot of packets had been received, since it's a busy line. From the observations above, my guess is that the (few) non-freed TX mbufs somehow block allocation of RX mbufs, but I can't figure out why. Please share if you have an idea, thanks.


seladb commented Jan 6, 2024

Thanks for looking into it @siphonelee! If I understand the DPDK documentation correctly, rte_eth_tx_burst() doesn't automatically free mbufs; however, it does try to when the number of free TX descriptors drops below a threshold. I'm not sure if mbuf == TX descriptor, do you know?

I wonder: if we always free the mbuf manually after a successful TX (unlike what the logic currently does), will everything work as expected? Meaning, in both cases where (1) packets are sent immediately after being received, or (2) packets are sent only when certain conditions are met, can DPDK handle freeing an already-freed mbuf, or will it fail? Can you please check this?

If it doesn't cause issues, maybe this is the change we need to make. Please let me know what you think

@siphonelee (Author)

Thanks for looking into it @siphonelee! If I understand the DPDK documentation correctly, rte_eth_tx_burst() doesn't automatically free mbufs; however, it does try to when the number of free TX descriptors drops below a threshold. I'm not sure if mbuf == TX descriptor, do you know?

--- I'm not an expert on DPDK, but here are my two cents: TX descriptors are used by the NIC in a ring-buffer fashion, and they refer to mbufs, which contain the actual packet data and metadata. Taking i40e as an example, the related driver code I found is:

https://github.com/DPDK/dpdk/blob/6ef07151aac4b4d9601d547f94a996d1c71b3871/drivers/net/i40e/i40e_rxtx_vec_neon.c#L688-L689

https://github.com/DPDK/dpdk/blob/6ef07151aac4b4d9601d547f94a996d1c71b3871/drivers/net/i40e/i40e_rxtx.c#L1349-L1350

I wonder: if we always free the mbuf manually after a successful TX (unlike what the logic currently does), will everything work as expected? Meaning, in both cases where (1) packets are sent immediately after being received, or (2) packets are sent only when certain conditions are met, can DPDK handle freeing an already-freed mbuf, or will it fail? Can you please check this?

--- Always freeing mbufs manually seems to work as expected, and I'll test further. I guess the auto-free mechanism of DPDK would be more performant since it frees in bulk, but it may cause the blocking issue I met.

If it doesn't cause issues, maybe this is the change we need to make. Please let me know what you think

--- I agree.


seladb commented Jan 7, 2024

That sounds good @siphonelee, would you consider opening a PR with the fix? You already have a setup where you can test it, and I currently don't...

@siphonelee (Author)

@seladb Sure. I just need more time for more testing.


seladb commented Jan 30, 2024

@siphonelee do you think you'll have some time to work on it soon?

It'd be great if we could fix it, and you have an environment to test it (which I currently don't...)

@siphonelee (Author)

@seladb Apologies for getting back to you late. I've been testing my fix in a production environment recently, and there is still a segmentation fault issue that occurs occasionally and is hard to reproduce:

if (m_MBuf != NULL && m_FreeMbuf)
    rte_pktmbuf_free(m_MBuf);

What GDB reports is:

Thread 6 "lcore-worker-2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffedd82640 (LWP 43952)]
0x00005555555e9a65 in rte_pktmbuf_free (m=0x395543a90000) at /usr/local/include/rte_mbuf.h:1383
1383 m_next = m->next;
(gdb) bt
#0 0x00005555555e9a65 in rte_pktmbuf_free (m=0x395543a90000) at /usr/local/include/rte_mbuf.h:1383
#1 0x00005555555ebded in pcpp::MBufRawPacket::setMBuf (this=this@entry=0x7ffed5f589a0, mBuf=mBuf@entry=0x1ac032000, timestamp=...) at src/MBufRawPacket.cpp:319
#2 0x00005555555e0c27 in pcpp::DpdkDevice::receivePackets (this=, rawPacketsArr=rawPacketsArr@entry=0x7fffedd80600, rawPacketArrLength=rawPacketArrLength@entry=128, rxQueueId=) at src/DpdkDevice.cpp:857
#3 0x00005555555cc134 in AppWorkerThread::run (this=0x555556052460, coreId=) at /var/local/PcapPlusPlus-22.11/Examples/DpdkExample-FilterTraffic/AppWorkerThread.h:121
#4 0x00005555555e56d8 in pcpp::DpdkDeviceList::dpdkWorkerThreadStart (ptr=0x555556052460) at src/DpdkDeviceList.cpp:319
#5 0x00007ffff79425d2 in eal_thread_loop[cold] () from /usr/local/lib/x86_64-linux-gnu/librte_eal.so.22
#6 0x00007ffff7465ac3 in start_thread (arg=) at ./nptl/pthread_create.c:442
#7 0x00007ffff74f7850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) print m
$1 = (rte_mbuf *) 0x395543a90000
(gdb) print *m
Cannot access memory at address 0x395543a90000

My fix is simply changing:
bool needToFreeMbuf = (!useTxBuffer && !packetSent);
to:
bool needToFreeMbuf = !useTxBuffer;

Your advice is appreciated.
