performance issues #55

Closed
bmah888 opened this issue Feb 28, 2014 · 12 comments

@bmah888
Contributor

bmah888 commented Feb 28, 2014

From bltierney@es.net on December 13, 2012 09:58:43

The reported single-flow throughput of iperf3 is considerably lower than that of nuttcp and netperf on 10G and 40G hosts.
This is particularly true for UDP.

Original issue: http://code.google.com/p/iperf/issues/detail?id=55

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From jef.poskanzer on March 12, 2013 13:48:36

The new --zerocopy option improves performance a lot. We still want improvements in the non-zerocopy case.
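
(For reference, the option is enabled on the client side; with --zerocopy/-Z, iperf3 uses a zero-copy send path such as sendfile(2) instead of the usual write(2). The server address below is just a placeholder.)

# enable zero-copy sends on the client; long and short forms are equivalent
iperf3 -c <server> --zerocopy
iperf3 -c <server> -Z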

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From bltierney@es.net on July 23, 2013 10:34:55

Performance is better now, but pushing this to the next release to see if we can make it better still in the future.

Labels: -Milestone-3.0-Release Milestone-3.1a1

@bmah888
Contributor Author

bmah888 commented Feb 28, 2014

From jef.poskanzer on December 09, 2013 16:35:21

Performance is now better still, but there's always room for more. How about we keep this issue open indefinitely and use it to record ideas for further improvement?

For example, in the most recent round of speedups, I think I might have brought back some gettimeofday() syscalls that I had previously eliminated. Re-removing those might help.
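
(If anyone wants to check, one rough way to see whether per-send timestamp calls came back is to count time-related syscalls during a short run. The command below is only a sketch with a placeholder server address, and on many systems gettimeofday()/clock_gettime() are serviced by the vDSO and so may not appear in strace output at all.)

# count time-related syscalls made by a short client run
strace -c -f -e trace=gettimeofday,clock_gettime iperf3 -c <server> -t 5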

Anyway, suggest we change the milestone from 3.1a1 to future.

bmah888 added this to the 3.1 milestone Feb 28, 2014
@bmah888
Contributor Author

bmah888 commented Apr 9, 2014

I just did a few tests between a couple of the 10G hosts on ESnet's 100G testbed, using (roughly) the tip of the iperf3 master codeline. These are "typical" results (in that I did several runs on each of these, with results agreeing to within a few percent, but did not attempt to compute confidence intervals, etc.):

TCP:

[ 4] 0.00-10.00 sec 11.5 GBytes 9.91 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 11.5 GBytes 9.91 Gbits/sec receiver

UDP (-b 10G):

[ 4] 0.00-10.00 sec 11.4 GBytes 9.81 Gbits/sec 0.004 ms 0/1497394 (0%)
[ 4] Sent 1497394 datagrams

SCTP (just for completeness):

[ 4] 0.00-10.00 sec 1.54 GBytes 1.32 Gbits/sec sender
[ 4] 0.00-10.00 sec 1.54 GBytes 1.32 Gbits/sec receiver

Using the default parameters (including buffer size), except for the bandwidth setting for --udp.
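
(For reference, a rough reconstruction of what invocations like these would look like; the server address is a placeholder, and the SCTP case assumes iperf3 was built with SCTP support.)

iperf3 -s                       # server
iperf3 -c <server>              # TCP, default parameters
iperf3 -c <server> -u -b 10G    # UDP at a 10 Gbits/sec target rate
iperf3 -c <server> --sctp       # SCTP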

Unless I'm reading things completely wrong, it looks like iperf3 can saturate a 10G link with either UDP or TCP. The "known issues" section of the README implies that UDP was (at least at one time) unable to get above 5 Gbps.

The original bug report didn't have any details as to what the observed performance was, so I am unable to tell whether the performance is just as "bad" as it was originally or if it's gotten better somehow. @bltierney, any thoughts on this?

@bltierney
Contributor

This is great news. What hosts/NICs were you using for this?

I'd like to do some more 40G testing before closing this issue.

And BTW, I was seeing some very strange results last night with and without the reverse flag. But this might be a Mellanox issue, not an iperf3 issue.


@bltierney
Contributor

If memory serves me right, Brian Tierney wrote:

> This is great news. What hosts/NICs were you using for this?

nersc-diskpt-1 (client) to nersc-diskpt-2 (server). I am not sure what the NICs were, but I'm guessing they were Myricom? Interface addresses were 10.120.11.2 on nersc-diskpt-1 and 10.120.11.4 on nersc-diskpt-2.

(I realize the above will not make any sense to anyone not within ESnet.)

> I'd like to do some more 40G testing before closing this issue.
>
> And BTW, I was seeing some very strange results last night with and without the reverse flag. But this might be a Mellanox issue, not an iperf3 issue.

I guess we can dig into all these at the same time.

@bmah888
Contributor Author

bmah888 commented Apr 9, 2014

Sigh. The above comment was written by @bmah888, not @bltierney. I have no idea why GitHub got confused on this, other than that I replied via email to the GitHub notification I received on the comment above that.

@bmah888
Contributor Author

bmah888 commented Apr 30, 2014

Results of more measurements done this week: We have multiple recorded runs of iperf3 doing 10 Gbps on cxgb4 (Chelsio) cards with zero loss. However, there is another mode where we see consistent packet loss of about 20%. We believe this is related to an issue with interrupts and the CPU core being used for iperf3. @bltierney was able to get consistent results on the diskpt units on the ESnet 100G testbed by tuning CPU affinity with -A 9,9.

While real and significant, there's not a whole lot we can do about this issue at the level of iperf3, since it does not have any visibility into what core it should be running on for best performance. The solution for now will be to document this behavior.
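
(For anyone hitting the same interrupt/affinity problem, pinning both ends looks roughly like this; the core number and server address are illustrative only.)

# client: pin the local iperf3 process to core 9 and ask the server to use core 9 too
iperf3 -c <server> -A 9,9

# alternatively, pin the server process externally, e.g. with taskset
taskset -c 9 iperf3 -s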

bmah888 added a commit that referenced this issue May 1, 2014
experiments.

Fixes Issue #55 (at least to the extent that it's not really an
iperf3 issue).
bmah888 added a commit that referenced this issue May 1, 2014
experiments.

Fixes Issue #55 (at least to the extent that it's not really an
iperf3 issue).

(cherry picked from commit e4e22a5)
Signed-off-by: Bruce A. Mah <bmah@es.net>

Conflicts:
	README.md
@bmah888
Contributor Author

bmah888 commented May 1, 2014

Documented what we know about this issue, committed documentation changes to the master and 3.0-STABLE branches.

Closing as fixed, at least for now.

@XtremeOwnageDotCom

@bmah888, since you do have this handy ticket here, I'm gonna slap these details here.

Iperf3

Client:

root@kube01:~# iperf3 -c 10.100.4.105 --zerocopy
Connecting to host 10.100.4.105, port 5201
[  5] local 10.100.4.100 port 46158 connected to 10.100.4.105 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.19 GBytes  36.0 Gbits/sec  328   1.36 MBytes
[  5]   1.00-2.00   sec  4.05 GBytes  34.8 Gbits/sec  239   1.15 MBytes
[  5]   2.00-3.00   sec  4.60 GBytes  39.5 Gbits/sec  163   1.20 MBytes
[  5]   3.00-4.00   sec  4.82 GBytes  41.4 Gbits/sec  448   1.26 MBytes
[  5]   4.00-5.00   sec  3.82 GBytes  32.9 Gbits/sec  187   1.12 MBytes
[  5]   5.00-6.00   sec  3.44 GBytes  29.6 Gbits/sec  113   1.26 MBytes
[  5]   6.00-7.00   sec  4.60 GBytes  39.5 Gbits/sec  466   1.01 MBytes
[  5]   7.00-8.00   sec  4.47 GBytes  38.4 Gbits/sec  410   1.25 MBytes
[  5]   8.00-9.00   sec  4.66 GBytes  40.1 Gbits/sec  446   1.20 MBytes
[  5]   9.00-10.00  sec  4.09 GBytes  35.1 Gbits/sec  348   1.24 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  42.8 GBytes  36.7 Gbits/sec  3148             sender
[  5]   0.00-10.00  sec  42.8 GBytes  36.7 Gbits/sec                  receiver

Server: root@kube05:~# iperf3 -s

Using -P 2 yields negligible differences.

root@kube01:~# iperf3 -c 10.100.4.105 --zerocopy -P 2
Connecting to host 10.100.4.105, port 5201
[  5] local 10.100.4.100 port 36242 connected to 10.100.4.105 port 5201
[  7] local 10.100.4.100 port 36244 connected to 10.100.4.105 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.38 GBytes  20.4 Gbits/sec    0   1.40 MBytes
[  7]   0.00-1.00   sec  2.39 GBytes  20.5 Gbits/sec    0   1021 KBytes
[SUM]   0.00-1.00   sec  4.77 GBytes  40.9 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  2.44 GBytes  20.9 Gbits/sec    0   1.40 MBytes
[  7]   1.00-2.00   sec  2.43 GBytes  20.9 Gbits/sec    0   1021 KBytes
[SUM]   1.00-2.00   sec  4.87 GBytes  41.8 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  2.39 GBytes  20.5 Gbits/sec    0   1.40 MBytes
[  7]   2.00-3.00   sec  2.39 GBytes  20.5 Gbits/sec    0   1021 KBytes
[SUM]   2.00-3.00   sec  4.78 GBytes  41.0 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  2.42 GBytes  20.8 Gbits/sec    0   1.40 MBytes
[  7]   3.00-4.00   sec  2.42 GBytes  20.8 Gbits/sec    0   1021 KBytes
[SUM]   3.00-4.00   sec  4.84 GBytes  41.5 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  2.41 GBytes  20.7 Gbits/sec    0   1.40 MBytes
[  7]   4.00-5.00   sec  2.41 GBytes  20.7 Gbits/sec    0   1.11 MBytes
[SUM]   4.00-5.00   sec  4.82 GBytes  41.4 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  2.43 GBytes  20.8 Gbits/sec    0   1.40 MBytes
[  7]   5.00-6.00   sec  2.43 GBytes  20.9 Gbits/sec    0   1.11 MBytes
[SUM]   5.00-6.00   sec  4.85 GBytes  41.7 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  2.45 GBytes  21.1 Gbits/sec    0   1.40 MBytes
[  7]   6.00-7.00   sec  2.45 GBytes  21.0 Gbits/sec   12   1.07 MBytes
[SUM]   6.00-7.00   sec  4.90 GBytes  42.1 Gbits/sec   12
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  2.44 GBytes  21.0 Gbits/sec    0   1.40 MBytes
[  7]   7.00-8.00   sec  2.44 GBytes  21.0 Gbits/sec    0   1.07 MBytes
[SUM]   7.00-8.00   sec  4.88 GBytes  41.9 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  2.43 GBytes  20.9 Gbits/sec    0   1.40 MBytes
[  7]   8.00-9.00   sec  2.43 GBytes  20.9 Gbits/sec    0   1.07 MBytes
[SUM]   8.00-9.00   sec  4.86 GBytes  41.8 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  2.42 GBytes  20.8 Gbits/sec    0   1.40 MBytes
[  7]   9.00-10.00  sec  2.42 GBytes  20.8 Gbits/sec    0   1.24 MBytes
[SUM]   9.00-10.00  sec  4.84 GBytes  41.5 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  24.2 GBytes  20.8 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  24.2 GBytes  20.8 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  24.2 GBytes  20.8 Gbits/sec   12             sender
[  7]   0.00-10.00  sec  24.2 GBytes  20.8 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  48.4 GBytes  41.6 Gbits/sec   12             sender
[SUM]   0.00-10.00  sec  48.4 GBytes  41.6 Gbits/sec                  receiver

IPerf

Using iperf without parallel threads yields roughly the same result as iperf3:

root@kube01:~# iperf -c 10.100.4.105
------------------------------------------------------------
Client connecting to 10.100.4.105, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 10.100.4.100 port 36088 connected with 10.100.4.105 port 5001 (icwnd/mss/irtt=14/1448/116)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0110 sec  32.5 GBytes  27.8 Gbits/sec

BUT giving it a few extra threads makes a WORLD of difference.

root@kube01:~# iperf -c 10.100.4.105 -P 6
------------------------------------------------------------
Client connecting to 10.100.4.105, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.100.4.100 port 45476 connected with 10.100.4.105 port 5001 (icwnd/mss/irtt=14/1448/98)
[  6] local 10.100.4.100 port 45544 connected with 10.100.4.105 port 5001 (icwnd/mss/irtt=14/1448/101)
[  4] local 10.100.4.100 port 45514 connected with 10.100.4.105 port 5001 (icwnd/mss/irtt=14/1448/214)
[  2] local 10.100.4.100 port 45506 connected with 10.100.4.105 port 5001 (icwnd/mss/irtt=14/1448/141)
[  5] local 10.100.4.100 port 45528 connected with 10.100.4.105 port 5001 (icwnd/mss/irtt=14/1448/162)
[  3] local 10.100.4.100 port 45492 connected with 10.100.4.105 port 5001 (icwnd/mss/irtt=14/1448/119)
[ ID] Interval       Transfer     Bandwidth
[  3] 0.0000-10.0030 sec  14.0 GBytes  12.0 Gbits/sec
[  1] 0.0000-10.0029 sec  7.19 GBytes  6.18 Gbits/sec
[  5] 0.0000-10.0027 sec  27.1 GBytes  23.3 Gbits/sec
[  6] 0.0000-10.0030 sec  13.6 GBytes  11.7 Gbits/sec
[  2] 0.0000-10.0029 sec  13.3 GBytes  11.5 Gbits/sec
[  4] 0.0000-10.0028 sec  13.5 GBytes  11.6 Gbits/sec
[SUM] 0.0000-10.0007 sec  88.7 GBytes  76.2 Gbits/sec

Any ideas on how to properly benchmark 40/100/200/400GbE with iperf3?

@davidBar-On
Contributor

@XtremeOwnageDotCom, what is the iperf3 version you are using (iperf3 -v)? iperf3 supports multi-threading only from version 3.16 onward. Also, I suggest that you run iperf3 with -P 6 as you did for iperf.

@XtremeOwnageDotCom

XtremeOwnageDotCom commented Jul 6, 2024

> iperf3 supports multi-threading only from version 3.16 onward.

That would explain it. The version included in my distro's package manager is a hair old... 3.12.

Appears Debian is a bit behind: https://packages.debian.org/stable/net/iperf3

apt-get install autoconf libtool gcc make
mkdir iperf
cd iperf
wget https://github.com/esnet/iperf/releases/download/3.17.1/iperf-3.17.1.tar.gz
tar -xf iperf-3.17.1.tar.gz
cd iperf-3.17.1
./bootstrap.sh;./configure; make; make install
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
/usr/local/bin/iperf3 -v
iperf 3.17.1 (cJSON 1.7.15)

After building on two hosts and running...

Client:

/usr/local/bin/iperf3 -c 10.100.4.102 --zerocopy -P 6
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  9.80 GBytes  8.41 Gbits/sec  852             sender
[  5]   0.00-10.00  sec  9.79 GBytes  8.41 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  8.36 GBytes  7.18 Gbits/sec  652             sender
[  7]   0.00-10.00  sec  8.35 GBytes  7.17 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  7.12 GBytes  6.12 Gbits/sec  621             sender
[  9]   0.00-10.00  sec  7.12 GBytes  6.11 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  8.51 GBytes  7.31 Gbits/sec  1360             sender
[ 11]   0.00-10.00  sec  8.50 GBytes  7.30 Gbits/sec                  receiver
[ 13]   0.00-10.00  sec  6.89 GBytes  5.92 Gbits/sec  285             sender
[ 13]   0.00-10.00  sec  6.88 GBytes  5.91 Gbits/sec                  receiver
[ 15]   0.00-10.00  sec  8.95 GBytes  7.69 Gbits/sec  219             sender
[ 15]   0.00-10.00  sec  8.95 GBytes  7.68 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  49.6 GBytes  42.6 Gbits/sec  3989             sender
[SUM]   0.00-10.00  sec  49.6 GBytes  42.6 Gbits/sec                  receiver

Still coming up a bit short.

Running against a server with much faster single-threaded performance gives much better results, though.

root@kube01:~/iperf/iperf-3.17.1# /usr/local/bin/iperf3 -c 10.100.4.105 --zerocopy -P 6
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  16.5 GBytes  14.2 Gbits/sec  220             sender
[  5]   0.00-10.00  sec  16.5 GBytes  14.2 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  10.8 GBytes  9.24 Gbits/sec  682             sender
[  7]   0.00-10.00  sec  10.8 GBytes  9.23 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  9.63 GBytes  8.27 Gbits/sec  970             sender
[  9]   0.00-10.00  sec  9.62 GBytes  8.26 Gbits/sec                  receiver
[ 11]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec   32             sender
[ 11]   0.00-10.00  sec  16.3 GBytes  14.0 Gbits/sec                  receiver
[ 13]   0.00-10.00  sec  16.6 GBytes  14.2 Gbits/sec  230             sender
[ 13]   0.00-10.00  sec  16.6 GBytes  14.2 Gbits/sec                  receiver
[ 15]   0.00-10.00  sec  13.6 GBytes  11.7 Gbits/sec  238             sender
[ 15]   0.00-10.00  sec  13.6 GBytes  11.7 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  83.4 GBytes  71.6 Gbits/sec  2372             sender
[SUM]   0.00-10.00  sec  83.3 GBytes  71.6 Gbits/sec                  receiver

I'll ignore the slower host for now, as I am in the middle of network configuration changes, switch updates, etc., but overall I'm really happy the -P option was added back in.
