Packet drop in presence of load on unrelated port #89

HKalbasi · 2024-12-14T18:24:20Z

I have an Intel E810 card with two 100 Gbit ports. When one of the ports is under ~80 Gbit/s load, everything is fine, but when I also put the other, unrelated port under load, without changing the Retina config to include it, I see a huge drop of 100 kpkt/s. I thought the problem was in DPDK and/or the hardware, but it is not that simple, since another DPDK-based app (VPP) is able to handle that traffic without loss.

How can I troubleshoot this problem?

tbarbette · 2024-12-15T08:36:00Z

Are you sure about the VPP measurement? Intel E810 cards are notorious for not being able to sustain traffic on both ports. I have even heard that some vendors classify the second port as "active backup". Retina might be heavier on the hardware because it uses huge rings to avoid packet loss even while a few callbacks execute. Apart from that, I don't see why it would be more sensitive than other software.

HKalbasi · 2024-12-15T21:56:22Z

> Are you sure about the VPP measurement?

VPP reports its drops in the rx-miss field of the interface stats, and it is almost constant. I have previously seen it go up (when I filled the port with ~95 Gbit/s), so I believe there are no drops this time. But I will test with dpdk-testpmd as well to make sure.
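Whether a cumulative counter such as rx-miss is really "almost constant" can be checked by reading it twice and dividing the difference by the elapsed time. The following is a minimal sketch, assuming the counter values are copied by hand from the interface stats; `Sample` and `miss_rate` are hypothetical names for illustration, not part of VPP, DPDK, or Retina, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t_seconds: float  # time at which the counter was read
    rx_miss: int      # cumulative missed-packet counter at that time


def miss_rate(prev: Sample, curr: Sample) -> float:
    """Missed packets per second between two readings of a cumulative counter."""
    dt = curr.t_seconds - prev.t_seconds
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr.rx_miss - prev.rx_miss) / dt


# Illustrative values only: 1,000,000 additional misses over 10 seconds.
before = Sample(t_seconds=0.0, rx_miss=5_000)
after = Sample(t_seconds=10.0, rx_miss=1_005_000)
print(f"{miss_rate(before, after):.0f} missed pkt/s")
```

A rate near zero over an interval of sustained load supports the "no drops" reading; comparing the same two-sample rate across applications keeps the measurement method identical.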

> Intel 810 are notoriously not able to sustain the traffic on both ports. I even heard some vendors classify the second port as "active backup"

Do you have any links or references supporting this? And in that case, what NIC would you suggest? I believe my server's resources can handle Retina with 200 Gbit/s of data. Would adding two Intel E810 cards to the server make sense?

> Retina might be heavier on the hardware because it uses huge rings to avoid packet losses even when a few callback execute.

Is the ring size configurable in Retina? Alternatively, I can configure the VPP ring size using the num-rx-desc config and set it equal to Retina's to see what happens.

tbarbette · 2024-12-16T15:48:58Z

Yes, two NICs would behave differently from one NIC with two ports.

The number of RX descriptors is set with nb_rxd in config.toml. But I wouldn't expect magic on that side.

I think @thegwan had experience with the E810 and resorted to CX5 instead. And I think in the end it was two different CX5 NICs for similar reasons.
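To get a feel for what a descriptor count buys, one can estimate how long a full ring lasts if the receiving core stops polling. This is a rough sketch under stated assumptions: a single RX queue, a fixed frame size (the thread does not give one), 20 bytes of per-frame wire overhead for preamble and inter-frame gap, and an illustrative `nb_rxd` value that is not claimed to be Retina's default. The function names are hypothetical.

```python
def frames_per_second(gbit_per_s: float, frame_bytes: int) -> float:
    """Packet rate on the wire for a given load and fixed frame size.

    Adds 20 bytes per frame for preamble, start delimiter, and inter-frame gap.
    """
    wire_bits = (frame_bytes + 20) * 8
    return gbit_per_s * 1e9 / wire_bits


def ring_absorb_seconds(nb_rxd: int, pps: float) -> float:
    """Time until a ring of nb_rxd descriptors fills if nothing is dequeued."""
    if pps <= 0:
        raise ValueError("packet rate must be positive")
    return nb_rxd / pps


# Assumed values for illustration: ~80 Gbit/s load, 980-byte frames,
# and a ring of 32768 descriptors on one queue.
pps = frames_per_second(80, 980)
stall_budget = ring_absorb_seconds(32768, pps)
print(f"{pps / 1e6:.1f} Mpkt/s, ring absorbs about {stall_budget * 1e3:.2f} ms")
```

With several RX queues the load is split across rings, so each ring lasts proportionally longer. A larger ring only lengthens the stall it can absorb; it does not raise the sustained rate the NIC or the cores can handle, which is consistent with not expecting magic from this setting.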

thegwan · 2024-12-16T23:47:39Z

Yes, I don't think you should expect to see 200 Gbps using both ports on a single NIC, though I can't speak to VPP performance. We chose to use two separate NICs and only one port from each to test beyond 100 Gbps for that reason.

We had issues with E810 RSS support in the past, so we ended up sticking with CX-5. I think those issues are now resolved and are unrelated to this, though.
