Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intervals don't work as expected #125

Closed
bmah888 opened this issue Feb 28, 2014 · 18 comments
Closed

intervals don't work as expected #125

bmah888 opened this issue Feb 28, 2014 · 18 comments
Assignees

Comments

@bmah888
Copy link
Contributor

bmah888 commented Feb 28, 2014

From AaronMatthewBrown on December 18, 2013 05:39:49

In testing intervals on lossy networks, we get some ... interesting results:

[ 4] 0.00-2.13 sec 256 KBytes 984 Kbits/sec
[ 4] 2.13-5.13 sec 256 KBytes 700 Kbits/sec
[ 4] 5.13-7.63 sec 128 KBytes 418 Kbits/sec
[ 4] 7.63-9.53 sec 128 KBytes 552 Kbits/sec
[ 4] 9.53-11.03 sec 128 KBytes 699 Kbits/sec
[ 4] 11.03-13.23 sec 128 KBytes 477 Kbits/sec
[ 4] 13.23-19.23 sec 128 KBytes 175 Kbits/sec
[ 4] 19.23-19.23 sec 0.00 Bytes 0.00 bits/sec
[ 4] 19.23-19.23 sec 0.00 Bytes 0.00 bits/sec
[ 4] 19.23-21.23 sec 128 KBytes 524 Kbits/sec

Email from Aaron:

i did some debugging on this, and I think the issue may be somewhat complex to
fix. It looks like the logic was to have the sending functions iperf_send ->
iperf_tcp_send -> Nwrite expect that when SIGALRM goes off, the syscall will
be interrupted. That doesn't seem to be the case for read/write/etc. Adding a
siginterrupt(SIGARLM, 1) will quasi-fix that. It will cause the syscall to
return -1/EINTR if nothing has been transferred. If something has been
transferred, however, it returns the amount transferred. Unfortunately, Nwrite
happily keeps writing thinking that it should. This situation also occurs if
the SIGALRM happens to hit between writes (which seems not uncommon).

There is a way to work around this, but it requires changing the read/write
functions. This model gets used for this same situation in places in bwctl.
Basically, you pass an "interrupt" pointer into the functions. This pointer
points to an integer that, if the alarm goes off, gets incremented. Then when
a function returns, you check if *interrupt is non-zero, and if so, treat it
the same way as if -1/EINTR had occurred. e.g.

iperf_client_api.c (note I'm passing the pointer to sigarlm_triggered down into iperf_send)
if (iperf_send(test, concurrency_model == cm_itimer ? NULL : &write_set, &sigalrm_triggered) < 0)

iperf_tcp.c:
iperf_tcp_send(struct iperf_stream *sp, int *interrupt)
….
r = Nwrite(sp->socket, sp->buffer, sp->settings->blksize, Ptcp, interrupt);

net.c
int Nwrite_interruptible(int fd, const char *buf, size_t count, int prot, int *interrupt)
….
r = write(fd, buf, nleft);
if (interrupt && *interrupt)
return count - nleft;

I have a very hacky patch that does the above for just this code path (i.e.
breaks literally every other code path), and though other things seem broken
for reasons I've not bothered to look into. The intervals look to be running
more reliably (ignore the all 0's as they're among the "other things seem
broken"):

[ 4] 0.00-2.03 sec 0.00 Bytes 0.00 bits/sec
[ 4] 2.03-4.03 sec 0.00 Bytes 0.00 bits/sec
[ 4] 4.03-6.03 sec 0.00 Bytes 0.00 bits/sec

Email from Jef:

It looks like the logic was to have the sending functions
iperf_send -> iperf_tcp_send -> Nwrite expect that when SIGALRM
goes off, the syscall will be interrupted.

My interpretation is rather that the expectation is that Nwrite will
always complete fast enough that timer accuracy will not be noticably
affected. Consider that the code was originally written with only
the select mode, no SIGALRM at all. In select mode there's nothing
that can interrupt a write so it always has to complete.

The default write size is 128k, and your example is transfering
only one or two of those blocks per interval. Try lowering the size
a lot and see if that evens out the interval timing. How about: -l 8k

So if lowering the block size does help, then how about
adding a little one-step auto-tuner for the block size?
Something like this, in the stats callback:

if test->sender
if test->settings->blksize == DEFAULT_xxx_BLKSIZE
n = [estimate of number of blocks per interval, at current rate]
if n < 20 # or so
printf "default block size %d is too large, auto-lowering it to %d", DEFAULT_xxx_BLKSIZE, LOWER_xxx_BLKSIZE
settings->blksize = LOWER_xxx_BLKSIZE

Email from Aaron:

If an interval timer goes off just after the client has started to write,
won't this still create a delayed reaction even if the duration of this delay
is decreased?

Email from Jef:

Sure. I don't think the timer mechanism is intended to be super precise.
The stats are still accurate because they go by clock time actually elapsed,
rather than time the interval was supposed to take.

Email from Aaron:

If there's a fair amount of loss, the send buffer could completely fill, and
that write could hang for an indeterminate amount of time. Since the issue
crops up when the network is poor, and that's the exact time when these
intervals are most important, I think trying to get it as close to the
interval is important. Beyond that, iperf and nuttcp both hit the interval on
the nose which would make it weird if iperf3 gets it oddly off.

Attachment: interval_timing_changes.patch

Original issue: http://code.google.com/p/iperf/issues/detail?id=125

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From AaronMatthewBrown on December 18, 2013 07:26:07

A different possible fix. This sets the non-blocking mode on the sockets, and disables the sigalarm mode. The numbers come out on the nose with this, though I'm not sure if it breaks anything else.

Attachment: socket_nonblocking.patch

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From ssahani@redhat.com on December 18, 2013 08:05:37

setnonblocking already in net.c

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From jef.poskanzer on December 18, 2013 08:08:45

Yeah - not currently called anywhere.

Going non-blocking and getting rid of sigalrm mode is a not-obviously-bad idea. In addition to looking for things it breaks, we'd have to look carefully at the performance.

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From ssahani@redhat.com on December 18, 2013 08:11:03

Talking about the interrupted system call , the signals are handled by signal sys call .

Looking at sigaction(2) man page can we think that setting the SA_RESTART flag is simpler that handling system call interruption. The documentation says that setting it will make certain system calls automatically restartable across signals.
more information in man 7 signal. which has list of syscall automatically restarted.

It would be a simpler implementation .

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From AaronMatthewBrown on December 18, 2013 08:13:57

The problem is SA_RESTART flag apparently gets auto-set if you use 'signal' because in doing an strace, it's auto-restarting anyway. The restarting is actually the problem, because when a network is lossy, the write can hang for an indeterminate period of time which is problematic when you're trying to make sure you're printing out results at specific intervals.

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From ssahani@redhat.com on December 18, 2013 08:45:23

the Nwrite function actually calculating how much is written and restarting the write again if it's interrupted if I am not wrong.

         switch (errno) {
            case EINTR:
            return count - nleft;

Currently sockets are not non-blocked yes write wound hang.

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From AaronMatthewBrown on December 18, 2013 08:52:02

The EINTR doesn't actually come out though because 'write' gets auto-restarted. Even if you do enable interrupting, there are 2 situations where it will still fail:

  1. if any data has been written, the write returns the amount written, not -1/EINTR
  2. if the signal occurs in-between calls to write, the EINTR code-path won't get called. I noticed this occur with some frequency even if i had it try to interrupt the calls to write. I'm not sure if this is a property of linux delaying the signal until the syscall is finished, or what, but it'd happen a number of times during every run I tried.

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From ssahani@redhat.com on December 18, 2013 09:09:20

looking at the manual man 7 signal

" If a blocked call to one of the following interfaces is interrupted by a signal handler, then the call will be automatically restarted after the signal handler returns if
the SA_RESTART flag was used; otherwise the call will fail with the error EINTR:"

There is noway to specify SA_RESTART in signal system call.

  1. Correct. But need to check out the errno value if it's interrupted .. how ever not sure will debug it.

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From bltierney@es.net on December 18, 2013 13:59:17

Labels: Milestone-3.1a1

@bmah888
Copy link
Contributor Author

bmah888 commented Feb 28, 2014

From AaronMatthewBrown on February 21, 2014 12:26:30

Here's an updated patch that also rips out the sigalrm code. In testing locally, things seem to work as expected (e.g. even on very lossy connections, the intervals happen as expected instead of quasi-randomly). Would it be possible to get this tested on the 40G testbed? I'm curious what impact the change from sigalrm to select might have.

Attachment: iperf3_nonblocking_mode.patch

@bmah888 bmah888 added this to the 3.1 milestone Feb 28, 2014
@bmah888 bmah888 self-assigned this Apr 16, 2014
bmah888 added a commit that referenced this issue Apr 16, 2014
Originally submitted by:	@i2aaron
@bmah888
Copy link
Contributor Author

bmah888 commented Apr 16, 2014

Just to clarify: That commit was to a branch, not the mainline. I'll merge it after some testing and/or tweaking.

@bmah888
Copy link
Contributor Author

bmah888 commented Apr 17, 2014

So far so good testing on ESnet 100G testbed. No significant performance differences noted in some admittedly brief testing.

Interval timers do behave much more sanely with (pathological) 10% packet loss configured using tc qdisc add dev eth40 root netem loss 10%. Before revision 081ba8e:

[bmah@nersc-diskpt-6 ~]$ iperf3 -c 192.168.101.9 
Connecting to host 192.168.101.9, port 5201
[  4] local 192.168.101.8 port 41175 connected to 192.168.101.9 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.05   sec  1.25 MBytes  9.98 Mbits/sec   19   43.7 KBytes       
[  4]   1.05-2.10   sec  2.25 MBytes  18.0 Mbits/sec   28   35.0 KBytes       
[  4]   2.10-3.07   sec  1.12 MBytes  9.74 Mbits/sec   18   8.74 KBytes       
[  4]   3.07-4.08   sec  2.25 MBytes  18.6 Mbits/sec   25   17.5 KBytes       
[  4]   4.08-5.51   sec  2.50 MBytes  14.7 Mbits/sec   37   26.2 KBytes       
[  4]   5.51-6.00   sec  2.25 MBytes  38.5 Mbits/sec   21   17.5 KBytes       
[  4]   6.00-8.01   sec   512 KBytes  2.09 Mbits/sec   10   35.0 KBytes       
[  4]   8.01-8.01   sec  0.00 Bytes  0.00 bits/sec    0   35.0 KBytes       
[  4]   8.01-9.07   sec  1.62 MBytes  12.9 Mbits/sec   26   35.0 KBytes       
[  4]   9.07-10.07  sec   512 KBytes  4.17 Mbits/sec   11   43.7 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.07  sec  14.2 MBytes  11.9 Mbits/sec  195             sender
[  4]   0.00-10.07  sec  14.1 MBytes  11.8 Mbits/sec                  receiver

iperf Done.

After revision 081ba8e:

[bmah@nersc-diskpt-6 ~]$ ./iperf3 -c 192.168.101.9 
Connecting to host 192.168.101.9, port 5201
[  4] local 192.168.101.8 port 41173 connected to 192.168.101.9 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.67 MBytes  14.0 Mbits/sec   21   43.7 KBytes       
[  4]   1.00-2.00   sec  4.29 MBytes  36.0 Mbits/sec   38   17.5 KBytes       
[  4]   2.00-3.00   sec  3.50 MBytes  29.4 Mbits/sec   28   26.2 KBytes       
[  4]   3.00-4.00   sec   734 KBytes  6.01 Mbits/sec   12   26.2 KBytes       
[  4]   4.00-5.00   sec  1.39 MBytes  11.7 Mbits/sec   23   35.0 KBytes       
[  4]   5.00-6.00   sec   489 KBytes  4.01 Mbits/sec   16   35.0 KBytes       
[  4]   6.00-7.00   sec   384 KBytes  3.15 Mbits/sec    9   17.5 KBytes       
[  4]   7.00-8.00   sec  6.10 MBytes  51.2 Mbits/sec   51   8.74 KBytes       
[  4]   8.00-9.00   sec   795 KBytes  6.51 Mbits/sec   15   26.2 KBytes       
[  4]   9.00-10.00  sec   620 KBytes  5.08 Mbits/sec   11   17.5 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  19.9 MBytes  16.7 Mbits/sec  224             sender
[  4]   0.00-10.00  sec  19.6 MBytes  16.4 Mbits/sec                  receiver

iperf Done.

@bmah888
Copy link
Contributor Author

bmah888 commented Apr 17, 2014

I merged the features/issue-125 branch to the master (mainline) codeline in revision 54f2264. I probably could have done a better job in the commit message, but I don't think I can go back and edit it.

Closing this bug as fixed.

@bmah888 bmah888 closed this as completed Apr 17, 2014
@bmah888
Copy link
Contributor Author

bmah888 commented Apr 21, 2014

This code change seems to have broken UDP tests, which now emit many lines of the form:

iperf3: OUT OF ORDER - incoming packet = 1 and received packet = 528351093 AND SP = 5

I'm testing a fix. Basically we want to only set non-blocking mode on sockets on the sending side, but not on the receiver side. (Usually this means only the client should set non-blocking mode, unless --reverse is being used, in which case it's only the server that should do this.)

@bmah888 bmah888 reopened this Apr 21, 2014
bmah888 added a commit that referenced this issue Apr 23, 2014
transfer.

Note that the sender can either be the client or the server depending
on whether --reverse is used.

This fixes some problems with UDP transfers getting severely confused
and (wrongly) complaining about packets arriving out of order.

Related to issue #125.
@bmah888
Copy link
Contributor Author

bmah888 commented Apr 23, 2014

Leaving this issue open for now while doing more testing on the (possible) fix I just committed.

bmah888 added a commit that referenced this issue Apr 25, 2014
side sockets).  This is reported to fix some oddities after a recent
change in this area.

Somewhat related to Issue #125.

Submitted by:	@i2aaron
bmah888 added a commit that referenced this issue May 1, 2014
@bmah888 bmah888 modified the milestones: 3.0, 3.1 May 30, 2014
@bmah888
Copy link
Contributor Author

bmah888 commented May 30, 2014

@bltierney and I would like to see if we can get this into an upcoming 3.0.x maintenance release.

@bmah888
Copy link
Contributor Author

bmah888 commented May 30, 2014

Testing tip of master on the ESnet 100G testbed (same conditions as 17 April tests)...TCP intervals work well:

[bmah@nersc-diskpt-6 ~]$ ./iperf3 -c 192.168.101.9 
Connecting to host 192.168.101.9, port 5201
[  4] local 192.168.101.8 port 46303 connected to 192.168.101.9 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.94 MBytes  24.6 Mbits/sec   25   52.4 KBytes       
[  4]   1.00-2.00   sec  2.21 MBytes  18.5 Mbits/sec   32   17.5 KBytes       
[  4]   2.00-3.00   sec  2.75 MBytes  23.0 Mbits/sec   17    114 KBytes       
[  4]   3.00-4.00   sec  3.22 MBytes  27.0 Mbits/sec   39   26.2 KBytes       
[  4]   4.00-5.00   sec  1.85 MBytes  15.5 Mbits/sec   24   17.5 KBytes       
[  4]   5.00-6.00   sec  1.79 MBytes  15.0 Mbits/sec   26   8.74 KBytes       
[  4]   6.00-7.00   sec  4.01 MBytes  33.6 Mbits/sec   33   35.0 KBytes       
[  4]   7.00-8.00   sec   306 KBytes  2.51 Mbits/sec   11   26.2 KBytes       
[  4]   8.00-9.00   sec  2.27 MBytes  19.0 Mbits/sec   20   87.4 KBytes       
[  4]   9.00-10.00  sec  1.71 MBytes  14.3 Mbits/sec   29   35.0 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  23.0 MBytes  19.3 Mbits/sec  256             sender
[  4]   0.00-10.00  sec  23.0 MBytes  19.3 Mbits/sec                  receiver

iperf Done.

UDP tests exhibit no regressions (seen from the receiver):

Accepted connection from 192.168.101.8, port 46304
[  5] local 192.168.101.9 port 5201 connected to 192.168.101.8 port 59423
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec   104 KBytes   852 Kbits/sec  6.374 ms  2/15 (13%)  
[  5]   1.00-2.00   sec   120 KBytes   983 Kbits/sec  2.426 ms  1/16 (6.2%)  
[  5]   2.00-3.00   sec   104 KBytes   852 Kbits/sec  1.055 ms  3/16 (19%)  
[  5]   3.00-4.00   sec   104 KBytes   852 Kbits/sec  0.462 ms  3/16 (19%)  
[  5]   4.00-5.00   sec   120 KBytes   983 Kbits/sec  0.181 ms  1/16 (6.2%)  
[  5]   5.00-6.00   sec   112 KBytes   918 Kbits/sec  0.078 ms  2/16 (12%)  
[  5]   6.00-7.00   sec   112 KBytes   918 Kbits/sec  0.040 ms  2/16 (12%)  
[  5]   7.00-8.00   sec   120 KBytes   983 Kbits/sec  0.023 ms  1/16 (6.2%)  
[  5]   8.00-9.00   sec   120 KBytes   983 Kbits/sec  0.018 ms  1/16 (6.2%)  
[  5]   9.00-10.00  sec   112 KBytes   918 Kbits/sec  0.014 ms  2/16 (12%)  
[  5]  10.00-10.04  sec  0.00 Bytes  0.00 bits/sec  0.014 ms  0/0 (-nan%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-10.04  sec  1.24 MBytes  1.04 Mbits/sec  0.014 ms  18/159 (11%)  

Tested with SCTP as well (results not shown).

This looks good enough to merge.

@bmah888 bmah888 added the merge label May 30, 2014
bmah888 added a commit that referenced this issue May 30, 2014
transfer.

Note that the sender can either be the client or the server depending
on whether --reverse is used.

This fixes some problems with UDP transfers getting severely confused
and (wrongly) complaining about packets arriving out of order.

Related to issue #125.
(cherry picked from commit cbacc1d)

Signed-off-by: Bruce A. Mah <bmah@es.net>
bmah888 added a commit that referenced this issue May 30, 2014
side sockets).  This is reported to fix some oddities after a recent
change in this area.

Somewhat related to Issue #125.

Submitted by:	@i2aaron
(cherry picked from commit 6b16244)

Signed-off-by: Bruce A. Mah <bmah@es.net>
@bmah888
Copy link
Contributor Author

bmah888 commented May 30, 2014

Merge to the 3.0-STABLE branch complete and lightly tested. Closing this issue.

@bmah888 bmah888 closed this as completed May 30, 2014
@bmah888 bmah888 removed the merge label May 30, 2014
bmah888 added a commit that referenced this issue Oct 13, 2014
…able.

We need this to permit a UDP receiving iperf3 server to listen on its
control channel.

The fix for non-blocking sockets basically makes sure that if we do a
read on a non-blocking sockets, but there's no data, the UDP processing
code does *not* try to do the normal protocol packet processing on the
non-existent packet.

This is part of an ongoing fix for issue #212 but also should have been
a part of the fix for issue #125.
bmah888 added a commit that referenced this issue Oct 14, 2014
…able.

We need this to permit a UDP receiving iperf3 server to listen on its
control channel.

The fix for non-blocking sockets basically makes sure that if we do a
read on a non-blocking sockets, but there's no data, the UDP processing
code does *not* try to do the normal protocol packet processing on the
non-existent packet.

This is part of an ongoing fix for issue #212 but also should have been
a part of the fix for issue #125.

(cherry picked from commit 8694d1d)
Signed-off-by: Bruce A. Mah <bmah@es.net>
bmah888 added a commit that referenced this issue May 30, 2017
In some work related to #125, we introduced a bug in which
chunks of a file being read for the -F option were not
completely sent, particularly with TCP sockets.  We attempt
to fix this by detecting cases in which not all data passed
to a socket could be actually sent (for example due to full
socket buffers) and preserving that data for future send
iterations.

The ending statistics in the "diskfile" JSON structure were
wrong, and did not properly distinguish between sender-side
and receiver-side statistics.  This has been fixed (at least
for the client side).

Specifically mention in the manpage that "iperf -F" is not
a file transfer tool.
bmah888 added a commit that referenced this issue May 30, 2017
Attempt to fix some brokenness in -F from #301.

In some work related to #125, we introduced a bug in which
chunks of a file being read for the -F option were not
completely sent, particularly with TCP sockets.  We attempt
to fix this by detecting cases in which not all data passed
to a socket could be actually sent (for example due to full
socket buffers) and preserving that data for future send
iterations.

The ending statistics in the "diskfile" JSON structure were
wrong, and did not properly distinguish between sender-side
and receiver-side statistics.  This has been fixed (at least
for the client side).

Specifically mention in the manpage that "iperf -F" is not
a file transfer tool.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant