Update stats for outgoing packets #227
Conversation
CI Message
Your results will arrive shortly
In response to PR creation
CI Message
Run successful, see results:
Test took: 5 minutes, 59 seconds
✔️ Pktgen performance check passed
✔️ Speed Test performance check passed
[Results from nimbnode23]
Median TX pps for Pktgen: 7736722
Performance rating - 100.48% (compared to 7700000 average)
Median TX pps for Speed Tester: 40994204
Performance rating - 97.61% (compared to 42000000 average)
Good one-line fix.
We could increment the stats either when the NF tries to send a packet out (your solution) or when the manager actually sends it out (there is a small chance packets get dropped because of full rings before that happens).
I think your approach is better, since those action stats refer to what an NF wants to do, not necessarily what actually happens.
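To make the two options concrete, here is a minimal C sketch of counting at the NF's send request versus counting at the manager's actual transmit. The struct and function names are hypothetical, not the actual ONVM source:

```c
#include <stdbool.h>
#include <stdint.h>

struct nf_stats {
        uint64_t act_out; /* packets an NF asked to send out a port */
};

struct packet; /* opaque stand-in for the real packet type */

/* Option 1 (this PR's approach): count when the NF requests the action.
 * The counter records intent, even if the packet is later dropped. */
static void
nf_send_out(struct nf_stats *stats, struct packet *pkt)
{
        stats->act_out++;
        /* ...enqueue pkt on the ring toward the manager's TX thread... */
        (void)pkt;
}

/* Option 2: count only when the manager actually transmits, so the
 * counter reflects what happened rather than what was requested. */
static void
mgr_tx(struct nf_stats *stats, struct packet *pkt, bool nic_accepted)
{
        if (nic_accepted)
                stats->act_out++;
        (void)pkt;
}
```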
Seems odd, but this actually reduced performance by about 3%. Did this happen on your machine @bdevierno1? CI lets this "pass" because the drop isn't huge, but since so many of our PRs stay above 100%, we probably want to look into this.
Good fix, thanks Ben. I think this draws attention to the efficiency of sending packets out of a port. The double pass that we make in this code makes me think NFs should just be responsible for sending packets out of the port themselves... unless that's too intensive for the NF, or there's some reason I'm not catching on to why we wouldn't want to do that.
We don't want to trust NFs to directly work with ports, plus there are some potential scalability issues (you would need an output queue on the NIC for each NF, which becomes infeasible for large numbers of NFs). It's better to just go through the manager and have one output queue per TX thread.
It's hard to believe the 3% change is real. Have CI test again and see if it goes away.
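As a rough illustration of that design, assuming DPDK-style APIs (ONVM's actual TX-thread code differs in detail), a single TX thread can multiplex many NF rings onto one NIC queue:

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST_SIZE 32

/* Hypothetical loop body: one TX thread, one hardware queue, many NFs. */
static void
tx_thread_drain(struct rte_ring **nf_tx_rings, unsigned int nb_nfs,
                uint16_t port_id, uint16_t queue_id)
{
        struct rte_mbuf *pkts[BURST_SIZE];
        unsigned int i;

        for (i = 0; i < nb_nfs; i++) {
                /* Pull a burst from this NF's software ring... */
                unsigned int n = rte_ring_dequeue_burst(nf_tx_rings[i],
                                (void **)pkts, BURST_SIZE, NULL);
                /* ...and push it out the shared hardware queue. */
                uint16_t sent = rte_eth_tx_burst(port_id, queue_id,
                                pkts, (uint16_t)n);
                /* Free anything the NIC did not accept (count drops here). */
                while (sent < n)
                        rte_pktmbuf_free(pkts[sent++]);
        }
}
```

With this shape, the hardware queue count scales with the number of TX threads rather than the number of NFs.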
I agree that having the NF Manager mediate transmission is important, especially with a number of competing NFs wanting to transmit and competing for the link.
--
K. K. Ramakrishnan
@onvm 3%?
CI Message
Your results will arrive shortly
@onvm 3%?
CI Message
Run successful, see results:
Test took: 5 minutes, 59 seconds
✔️ Pktgen performance check passed
❌ PR drops Speed Test performance below minimum requirement
[Results from nimbnode23]
Median TX pps for Pktgen: 7739561
Performance rating - 100.51% (compared to 7700000 average)
Median TX pps for Speed Tester: 40694672
Performance rating - 96.89% (compared to 42000000 average)
@tim Could it be compiler optimizations? I dealt with an unexplainable error last year and we chalked it up to the compiler...
I think that in general we see fluctuations on different nodes. It could be for a variety of reasons, including what processes are active on those cores, etc. I don't think it means much. Edit: Though in this case we are modifying the critical path, and that variable is now being referenced from a different proc. I also don't really believe it, but I'll double check on my own nodes.
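One plausible (unconfirmed) mechanism for a cross-core cost like this is false sharing: if the stats counter shares a cache line with hot fields touched by another core, every write bounces the line between cores. A speculative sketch of the usual mitigation, with hypothetical names:

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 64

/* Pad the counter to its own cache line so writes from the core that
 * updates it do not invalidate neighbouring hot fields read elsewhere. */
struct padded_counter {
        uint64_t act_out;
        uint8_t pad[CACHE_LINE_SIZE - sizeof(uint64_t)];
} __attribute__((aligned(CACHE_LINE_SIZE)));
```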
@dennisafa You're right, fluctuations do happen, but it's abnormal to see this twice in a row when another PR I tested this afternoon had 100+%. I think we should test this on another machine to verify, because CI hasn't reported lower than 97% (except for Ben's memory PR) in a long time.
@onvm Just out of curiosity.
CI Message
Your results will arrive shortly
@onvm Just out of curiosity.
CI Message
Run successful, see results:
Test took: 5 minutes, 56 seconds
✔️ Pktgen performance check passed
✔️ Speed Test performance check passed
[Results from nimbnode23]
Median TX pps for Pktgen: 7734628
Performance rating - 100.45% (compared to 7700000 average)
Median TX pps for Speed Tester: 41037748
Performance rating - 97.71% (compared to 42000000 average)
1425ee4
@onvm WOPR Did you ever play tic-tac-toe?
CI Message
Your results will arrive shortly
@onvm WOPR Did you ever play tic-tac-toe?
CI Message
Run successful, see results:
Test took: 6 minutes, 8 seconds
✔️ Pktgen performance check passed
✔️ Speed Test performance check passed
[Results from nimbnode23]
Median TX pps for Pktgen: 7743957
Performance rating - 100.57% (compared to 7700000 average)
Median TX pps for Speed Tester: 42303680
Performance rating - 100.72% (compared to 42000000 average)
I'm still not sure why there was a perf drop for Speed Tester. In the last change I made, I switched the order of the comparisons, since when sending packets out there is a higher probability of `if (tx_mgr->mgr_type_t != MGR)` being true. @kevindweb I did notice a perf drop on my node as well with my earlier change.
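For anyone reading later, the idea is to test the high-probability condition first on the hot path. A sketch follows; the enum and struct here are guesses based on the quoted snippet, not ONVM's real definitions:

```c
#define likely(x) __builtin_expect(!!(x), 1) /* GCC/Clang branch hint */

enum mgr_type { MGR, NF_CORE }; /* hypothetical; mirrors the quoted MGR */

struct tx_mgr_s {
        enum mgr_type mgr_type_t;
};

static void
handle_tx(struct tx_mgr_s *tx_mgr)
{
        if (likely(tx_mgr->mgr_type_t != MGR)) {
                /* Common case when sending packets out: falls through
                 * without a mispredicted jump. */
                /* ...NF-side send path... */
        } else {
                /* Comparatively rare manager path. */
                /* ...manager path... */
        }
}
```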
@bdevierno1 That's an interesting change; weird that it made CI happy again. That leads me to believe it's definitely a compiler-based problem.
I think this is ready for a final review and merge. The performance difference seen is an artifact of the testing; I don't think it is real.
Looks good to me
Fix stats updating during the ONVM_NF_ACTION_OUT action.
Summary:
ONVM stats was updating stats.act_out twice. This occurred when sending packets out a port.
To replicate this, run the bridge NF with ./go.sh 2 and send packets to it with Speed Tester: ./go.sh 1 -d 2. A sketch of the bug follows below.
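A minimal sketch of the double count described above, with hypothetical function names (the actual fix is a one-line change):

```c
#include <stdint.h>

struct nf_stats {
        uint64_t act_out;
};

static void
send_out_port(struct nf_stats *stats)
{
        /* Bug: this path incremented the counter a second time.
         * The fix deletes this increment so each ONVM_NF_ACTION_OUT
         * packet is counted exactly once. */
        /* stats->act_out++; */
        (void)stats;
}

static void
handle_action_out(struct nf_stats *stats)
{
        stats->act_out++; /* counted once, when the action is handled */
        send_out_port(stats);
}
```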
Before:
After:
Usage:
Merging notes:
TODO before merging:
Test Plan:
Check output of ONVM stats for correct values.
Review:
@sreya519