
Update stats for outgoing packets #227

Merged · 2 commits · Jun 25, 2020

Conversation
Conversation

bdevierno1
Contributor

Fix stats updating during the ONVM_NF_ACTION_OUT action.

Summary:

ONVM stats was updating stats.act_out twice.

This occurs when sending packets out of a port.
To replicate, run the bridge NF (./go.sh 2) and send packets to it with the speed tester NF (./go.sh 1 -d 2).
Before: (screenshot of ONVM stats output)
After: (screenshot of ONVM stats output)
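A minimal sketch of the double count described above (all names are hypothetical and simplified, not the actual onvm_mgr/NF-library code): before the fix, act_out was incremented once when the NF chose the action and again in the manager's TX path, so each outgoing packet was counted twice.

```c
#include <stdint.h>

/* Hypothetical sketch of the bug, not the actual ONVM code. */
struct nf_stats { uint64_t act_out; };

/* Before the fix: both the NF library and the manager's TX path
 * incremented act_out for the same ONVM_NF_ACTION_OUT packet. */
void send_out_before(struct nf_stats *s) {
    s->act_out++;   /* counted when the NF chose ACTION_OUT */
    s->act_out++;   /* counted again in the manager's TX path */
}

/* After the fix: the counter is incremented exactly once. */
void send_out_after(struct nf_stats *s) {
    s->act_out++;
}
```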

Usage:

This PR includes:

  • Resolves issues
  • Breaking API changes
  • Internal API changes
  • Usability improvements
  • Bug fixes 👍
  • New functionality
  • New NF/onvm_mgr args
  • Changes to starting NFs
  • Dependency updates
  • Web stats updates

Merging notes:

  • Dependencies: None

TODO before merging:

  • [👍 ] PR is ready for review

Test Plan:

Check output of ONVM stats for correct values.

Review:

@sreya519

@onvm

onvm commented Jun 11, 2020

In response to PR creation

CI Message

Your results will arrive shortly


@onvm onvm left a comment


In response to PR creation

CI Message

Run successful, see results:
Test took: 5 minutes, 59 seconds
✔️ Pktgen performance check passed
✔️ Speed Test performance check passed

[Results from nimbnode23]

  • Median TX pps for Pktgen: 7736722
    Performance rating - 100.48% (compared to 7700000 average)

  • Median TX pps for Speed Tester: 40994204
    Performance rating - 97.61% (compared to 42000000 average)

twood02
twood02 previously approved these changes Jun 11, 2020
Member

@twood02 twood02 left a comment


Good one line fix.

We could increment the stats either when the NF tries to send out (your solution) or when the manager actually sends them out (there is a small chance they could get dropped because of full rings before that happens).

I think your approach is better since those action stats are referring to what an NF wants to do, not necessarily what happens.
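The intent-vs-outcome distinction above can be sketched with hypothetical, simplified names (not the actual ONVM structures): counting act_out at the moment the NF requests the send records what the NF wants to do, even if the manager's TX ring is full and the packet is later dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not the actual ONVM structures. */
struct tx_ring { unsigned count, capacity; };
struct nf_tx_stats { uint64_t act_out, tx_drop; };

/* Count act_out when the NF *requests* the send (intent), before the
 * enqueue that may still fail if the manager's TX ring is full. */
bool nf_send_out(struct tx_ring *r, struct nf_tx_stats *s) {
    s->act_out++;                 /* what the NF wants to do */
    if (r->count >= r->capacity) {
        s->tx_drop++;             /* what actually happened: dropped */
        return false;
    }
    r->count++;
    return true;
}
```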

@kevindweb
Copy link
Contributor

Seems odd, but this actually reduced performance by about 3%. Did this happen on your machine @bdevierno1 ? CI lets this "pass" because it's not huge, but since we have so many PRs that stay above 100%, we probably want to look into this

dennisafa
dennisafa previously approved these changes Jun 11, 2020
Member

@dennisafa dennisafa left a comment


Good fix, thanks Ben. I think this draws attention to the efficiency of sending packets out of a port. The double pass we make in this code makes me think NFs should just be responsible for sending packets out of the port... unless that's too intensive for the NF, or there's some reason I'm not catching why we wouldn't want to do that.

@twood02
Member

twood02 commented Jun 11, 2020

We don’t want to trust NFs to directly work with ports, plus there are some potential scalability issues (you would need an output queue on the NIC for each NF, which becomes infeasible for large numbers of NFs). Better to just go through the manager and have one output queue per TX thread.

It’s hard to believe the 3% change is real. Have CI test again and see if it goes away.

@kkrama

kkrama commented Jun 11, 2020 via email

@kevindweb
Contributor

@onvm 3%?

@onvm

onvm commented Jun 11, 2020

@onvm 3%?

CI Message

Your results will arrive shortly


@onvm onvm left a comment


@onvm 3%?

CI Message

Run successful, see results:
Test took: 5 minutes, 59 seconds
✔️ Pktgen performance check passed
❌ PR drops Speed Test performance below minimum requirement

[Results from nimbnode23]

  • Median TX pps for Pktgen: 7739561
    Performance rating - 100.51% (compared to 7700000 average)

  • Median TX pps for Speed Tester: 40694672
    Performance rating - 96.89% (compared to 42000000 average)

@kevindweb
Contributor

@tim could this be compiler optimizations? I dealt with an unexplainable error last year and we chalked it up to the compiler...

@dennisafa
Member

dennisafa commented Jun 11, 2020

Seems odd, but this actually reduced performance by about 3%. Did this happen on your machine @bdevierno1 ? CI lets this "pass" because it's not huge, but since we have so many PRs that stay above 100%, we probably want to look into this

I think that in general we have fluctuations on different nodes. It could be a variety of reasons including what processes are active on those cores, etc. I don't think it means much. Edit: Though in this case we are modifying the critical path and that variable is now being referenced from a different proc. I also don't really believe it but I'll double check on my own nodes.
Also thank you for the clarifications @kkrama @twood02 that makes sense

@kevindweb
Contributor

@dennisafa you're right, fluctuations do happen, but it's abnormal twice in a row when another PR I tested this afternoon was at 100+%. I think we should test this on another machine to verify, because CI hasn't reported lower than 97% (except for Ben's memory PR) in a long time.

@bdevierno1
Contributor Author

@onvm Just out of curiosity.

@onvm

onvm commented Jun 11, 2020

@onvm Just out of curiosity.

CI Message

Your results will arrive shortly

onvm
onvm previously approved these changes Jun 11, 2020

@onvm onvm left a comment


@onvm Just out of curiosity.

CI Message

Run successful, see results:
Test took: 5 minutes, 56 seconds
✔️ Pktgen performance check passed
✔️ Speed Test performance check passed

[Results from nimbnode23]

  • Median TX pps for Pktgen: 7734628
    Performance rating - 100.45% (compared to 7700000 average)

  • Median TX pps for Speed Tester: 41037748
    Performance rating - 97.71% (compared to 42000000 average)

@bdevierno1 bdevierno1 dismissed stale reviews from onvm, dennisafa, and twood02 via 1425ee4 June 11, 2020 23:48
@bdevierno1
Contributor Author

@onvm WOPR Did you ever play tic-tac-toe?

@onvm

onvm commented Jun 11, 2020

@onvm WOPR Did you ever play tic-tac-toe?

CI Message

Your results will arrive shortly


@onvm onvm left a comment


@onvm WOPR Did you ever play tic-tac-toe?

CI Message

Run successful, see results:
Test took: 6 minutes, 8 seconds
✔️ Pktgen performance check passed
✔️ Speed Test performance check passed

[Results from nimbnode23]

  • Median TX pps for Pktgen: 7743957
    Performance rating - 100.57% (compared to 7700000 average)

  • Median TX pps for Speed Tester: 42303680
    Performance rating - 100.72% (compared to 42000000 average)

@bdevierno1
Contributor Author

I'm still not sure why there was a perf drop for Speed Tester. In the last change I made, I switched the order of the comparisons, since when sending packets out there is a higher probability of if (tx_mgr->mgr_type_t != MGR) being true.

@kevindweb I did notice a perf drop on my node as well in my earlier change.
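The kind of reordering described here can be sketched as follows (hypothetical, simplified names, not the actual ONVM TX path). On a hot path, testing the common case first helps branch prediction, and GCC/Clang's __builtin_expect can make the hint explicit:

```c
/* Hypothetical sketch of common-case-first branch ordering. */
enum mgr_type { MGR, NF_MGR };

/* GCC/Clang hint that the condition is usually true. */
#define likely(x) __builtin_expect(!!(x), 1)

int handle_tx(enum mgr_type t, int *nf_hits, int *mgr_hits) {
    if (likely(t != MGR)) {  /* common case tested first */
        (*nf_hits)++;
        return 1;
    }
    (*mgr_hits)++;           /* rare manager-side path */
    return 0;
}
```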

@kevindweb
Contributor

@bdevierno1 that's an interesting change; weird that it made CI happy again. That leads me to believe it's definitely a compiler-based problem.

@twood02 twood02 added this to the ONVM 20.07 milestone Jun 18, 2020
@twood02
Member

twood02 commented Jun 25, 2020

I think this is ready for a final review and merge. The performance difference seen is an artifact of the testing, I don't think it is real.

Member

@dennisafa dennisafa left a comment


Looks good to me

@dennisafa dennisafa merged commit 9f677c4 into sdnfv:develop Jun 25, 2020
@bdevierno1 bdevierno1 deleted the out_stats branch June 26, 2020 12:23