Packet drop in presence of load on unrelated port #89

Open
HKalbasi opened this issue Dec 14, 2024 · 4 comments

@HKalbasi
Contributor

I have an Intel E810 card with two 100 Gbit ports. When one of the ports is under ~80 Gbit/s load, everything is fine. But when I also put load on the other, unrelated port, without changing the Retina config to include it, I see a huge drop of 100 kpkt/s. I first thought there was a problem with DPDK and/or the hardware, but it is not that simple, since another DPDK-based app (VPP) is able to handle that traffic without loss.

How can I troubleshoot this problem?

@tbarbette
Collaborator

Are you sure about the VPP measurement? The Intel E810 is notoriously unable to sustain traffic on both ports. I have even heard that some vendors classify the second port as "active backup". Retina might be heavier on the hardware because it uses huge rings to avoid packet loss even while a few callbacks execute. Apart from that, I don't see why it would be more sensitive than other software.

@HKalbasi
Contributor Author

> Are you sure about the VPP measurement?

VPP reports its drops in the rx-miss field of the interface stats, and that counter is almost constant. I have previously seen it go up (when I filled the port with ~95 Gbit/s), so I believe there are no drops this time. But I will test with dpdk-testpmd as well to make sure.
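To turn "almost constant" into a number, two readings of the counters taken a known time apart are enough. The following is a minimal sketch, not VPP or DPDK code: the `CounterSample` type and its field names are hypothetical, and it assumes the received-packet counter excludes missed packets, so that offered traffic is the sum of the two.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of an interface's counters (hypothetical field names)."""
    timestamp_s: float  # when the sample was taken, in seconds
    rx_packets: int     # packets delivered to software so far
    rx_missed: int      # packets dropped by the NIC for lack of a free RX descriptor


def drop_stats(first: CounterSample, second: CounterSample) -> tuple[float, float]:
    """Return (missed packets per second, fraction of offered packets missed)."""
    elapsed = second.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    missed = second.rx_missed - first.rx_missed
    received = second.rx_packets - first.rx_packets
    # Assumption: rx_packets does not include missed packets.
    offered = missed + received
    loss_fraction = missed / offered if offered else 0.0
    return missed / elapsed, loss_fraction


# Made-up numbers: 1,000,000 misses and 99,000,000 receives over 10 seconds.
before = CounterSample(timestamp_s=0.0, rx_packets=0, rx_missed=0)
after = CounterSample(timestamp_s=10.0, rx_packets=99_000_000, rx_missed=1_000_000)
rate, fraction = drop_stats(before, after)
print(f"{rate:.0f} missed pkt/s, {fraction:.2%} of offered packets")
```

Running the same calculation on counters from Retina, VPP, and dpdk-testpmd under identical load would make the three measurements directly comparable.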

> Intel 810 are notoriously not able to sustain the traffic on both ports. I even heard some vendors classify the second port as "active backup"

Do you have any links or references supporting this? And in that case, what NIC would you suggest? I believe my server's resources can handle Retina at 200 Gbit/s. Would adding two Intel E810 cards to the server make sense?

> Retina might be heavier on the hardware because it uses huge rings to avoid packet losses even when a few callback execute.

Is the ring size configurable in Retina? Alternatively, I can configure the VPP ring size using the num-rx-desc option and set it equal to Retina's to see what happens.

@tbarbette
Collaborator

Yes, two NICs would behave differently from one NIC with two ports.

The number of RX descriptors is set with nb_rxd in the config.toml. But I wouldn't expect magic on that side.
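For a rough feel of what a larger ring buys, the sketch below estimates how long a ring of `nb_rxd` descriptors can absorb packets while the polling core is busy in a callback. It is a back-of-the-envelope model only: the per-queue packet rate and the two ring sizes are made-up illustration values, not Retina or VPP defaults, and it assumes one descriptor per packet.

```python
def ring_absorb_time_us(nb_rxd: int, packets_per_second: float) -> float:
    """Microseconds until an empty ring of nb_rxd descriptors fills
    if the core stops polling, assuming one descriptor per packet."""
    if nb_rxd <= 0 or packets_per_second <= 0:
        raise ValueError("nb_rxd and packets_per_second must be positive")
    return nb_rxd * 1e6 / packets_per_second


# Hypothetical per-queue arrival rate of 1 Mpkt/s and two example ring sizes.
for size in (4096, 32768):
    stall_us = ring_absorb_time_us(size, 1_000_000)
    print(f"nb_rxd={size}: absorbs a stall of about {stall_us:.0f} us")
```

A bigger ring only lengthens the stall that can be tolerated. It does not raise the sustained rate the NIC or the cores can handle, which fits the expectation that changing it will not work magic here.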

I think @thegwan had experience with the E810 and resorted to CX5 instead. And I think in the end it was two different CX5 NICs, for similar reasons.

@thegwan
Contributor

thegwan commented Dec 16, 2024

Yes, I don't think you should expect to see 200 Gbps using both ports on a single NIC, though I can't speak to VPP performance. We chose to use two separate NICs, and only one port from each, to test beyond 100 Gbps for that reason.

We had issues with E810 RSS support in the past, so we ended up sticking with CX-5. I think those issues are now resolved, though, and unrelated to this.

3 participants