not enough sequence numbers available! (expire_timeout=10000000000, host_nr=0, ping_count=0, seqmap_next_id=0) #288

Closed
cagney opened this issue Dec 12, 2023 · 9 comments

@cagney

cagney commented Dec 12, 2023

On FreeBSD (14.0, but also 13.x), fping gets this error:

    +# fping  -c 1  --timeout 1s   --src 192.0.1.254 192.0.2.254
    +fping error: not enough sequence numbers available! (expire_timeout=10000000000, host_nr=0, ping_count=0, seqmap_next_id=0)

when it is run immediately after a boot (and I really mean immediately; it's from a test framework). Adding a sleep 10 before running fping seems to fix the problem.

I suspect:

    /* check if expired (note that unused seqmap values will have fields set to
     * 0, so will be seen as expired) */
    next_value = &seqmap_map[seqmap_next_id];
    if (timestamp - next_value->ping_ts < SEQMAP_TIMEOUT_IN_NS) {
        fprintf(stderr, "fping error: not enough sequence numbers available! (expire_timeout=%" PRId64 ", host_nr=%d, ping_count=%d, seqmap_next_id=%d)\n",
            SEQMAP_TIMEOUT_IN_NS, host_nr, ping_count, seqmap_next_id);
        exit(4);
    }

where, immediately after boot, timestamp is small, so timestamp - 0 (the ping_ts of an unused entry) is less than SEQMAP_TIMEOUT_IN_NS, i.e., 10 s (10000000000 ns). The fix would be to set .ping_ts to some equivalent of the epoch.

I also suspect #217 was wrong.

@gsnw
Contributor

gsnw commented Dec 19, 2023

Unfortunately I cannot reproduce the error, but I think it is related to the time jumps mentioned by hmh in the pull request, which can occur with CLOCK_REALTIME.
Is NTP used on the system?

Unfortunately, the patch is not very clean in this respect. Its primary purpose was to correct the time output, which it does, but it seems to cause unwanted follow-on errors in certain configurations.

I'll take a look at the whole thing and look for a better solution. hmh has already suggested an approach:

and use the CLOCK_MONOTONIC delta + CLOCK_REALTIME timestamp to calculate a more sane real time that doesn't jump around.

With this approach, the displayed time may become less accurate over a long runtime of the system, but the measurement results are not falsified.
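A minimal sketch of the approach quoted above, assuming POSIX clock_gettime(): read CLOCK_REALTIME and CLOCK_MONOTONIC once at start-up, then derive the wall-clock time of later events from the start time plus the monotonic delta. time_anchor_t and its helper functions are hypothetical, not code from fping or from hmh's pull request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC INT64_C(1000000000)

static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NS_PER_SEC + ts->tv_nsec;
}

/* Hypothetical anchor: one reading of each clock, taken together at start. */
typedef struct {
    int64_t realtime_start_ns;
    int64_t monotonic_start_ns;
} time_anchor_t;

static void time_anchor_init(time_anchor_t *anchor)
{
    struct timespec rt, mono;
    clock_gettime(CLOCK_REALTIME, &rt);
    clock_gettime(CLOCK_MONOTONIC, &mono);
    anchor->realtime_start_ns = timespec_to_ns(&rt);
    anchor->monotonic_start_ns = timespec_to_ns(&mono);
}

/* Wall-clock estimate for a later monotonic reading: start time plus the
 * elapsed monotonic time. It does not jump if the system clock is stepped
 * later, but it can diverge from the system clock over a long run. */
static int64_t anchored_realtime_ns(const time_anchor_t *anchor, int64_t monotonic_ns)
{
    assert(monotonic_ns >= anchor->monotonic_start_ns);
    return anchor->realtime_start_ns + (monotonic_ns - anchor->monotonic_start_ns);
}
```

Because only monotonic deltas are added to the start time, the displayed times stay consistent with the measured round-trip times even if the system clock is changed during a run.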

@gsnw
Contributor

gsnw commented Dec 19, 2023

Can you please check whether the problem still occurs with the version at a3d991b?
Of course, this is not the final version; I just want to know whether the error is gone.

cagney added a commit to cagney/fping that referenced this issue Dec 19, 2023
Change update_current_time() so that current_time is set
from CLOCK_REALTIME and current_time_ms is set from
CLOCK_MONOTONIC.  This way log messages display wall clock
time while any timing calculations use monotonic time.

See schweikert#203

In seqmap_add(), before checking to see if .time_ts wrapped,
check that .time_ts was set.

See schweikert#288
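The clock split described in this commit message can be sketched as follows, assuming POSIX clock_gettime(). The names wall_clock, mono_ns, update_current_time_sketch(), and elapsed_ns() are hypothetical stand-ins for the two values the commit describes; this is not the code from the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC INT64_C(1000000000)

/* Hypothetical globals: one value per purpose. */
static struct timespec wall_clock; /* CLOCK_REALTIME: only for printing times */
static int64_t mono_ns;            /* CLOCK_MONOTONIC: only for timing math */

/* Refresh both clocks; returns 0 on success, -1 if either read fails. */
static int update_current_time_sketch(void)
{
    struct timespec mono;
    if (clock_gettime(CLOCK_REALTIME, &wall_clock) != 0 ||
        clock_gettime(CLOCK_MONOTONIC, &mono) != 0)
        return -1;
    mono_ns = (int64_t)mono.tv_sec * NS_PER_SEC + mono.tv_nsec;
    return 0;
}

/* Elapsed time since start_ns, computed from the monotonic clock only,
 * so stepping the system clock does not affect it. */
static int64_t elapsed_ns(int64_t start_ns)
{
    assert(mono_ns >= start_ns); /* CLOCK_MONOTONIC never goes backwards */
    return mono_ns - start_ns;
}
```

Log output would format wall_clock, while timeouts and round-trip times would be computed only from differences of mono_ns.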
@cagney
Author

cagney commented Dec 19, 2023

Unfortunately I cannot reproduce the error,

Partly my fault.

It turns out that the just-released FreeBSD 14 hasn't updated its fping package (it's still 5.0), so it doesn't include the change for #203. That change hides the problem with timestamp - next_value->ping_ts < SEQMAP_TIMEOUT_IN_NS because the timestamp is never close to zero.

I added pull request #290 as a possible solution.

gsnw pushed a commit to gsnw/fping that referenced this issue Dec 20, 2023
gsnw pushed a commit to gsnw/fping that referenced this issue Dec 21, 2023
@auerswal
Collaborator

auerswal commented Mar 2, 2024

As a different way to avoid problems with CLOCK_MONOTONIC starting with low values, i.e., below SEQMAP_TIMEOUT_IN_NS, on some operating systems, could we not initialize all ping_ts fields in the seqmap_map to -SEQMAP_TIMEOUT_IN_NS in seqmap_init()?
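A minimal sketch of this suggestion, assuming a fixed-size array of 65536 entries (the "65Ki" mentioned later in this thread). The names seqmap_init_sketch(), seqmap_slot_expired(), and SEQMAP_MAXSEQ are hypothetical; only the -SEQMAP_TIMEOUT_IN_NS initial value comes from the comment above. Since timestamp - (-SEQMAP_TIMEOUT_IN_NS) is at least SEQMAP_TIMEOUT_IN_NS for any non-negative timestamp, a slot that was never used always counts as expired, even at timestamp 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SEQMAP_TIMEOUT_IN_NS INT64_C(10000000000)
#define SEQMAP_MAXSEQ 65536 /* hypothetical size (65Ki entries) */

typedef struct { int64_t ping_ts; } seqmap_value_t;

static seqmap_value_t *seqmap_map;

static void seqmap_init_sketch(void)
{
    seqmap_map = malloc(SEQMAP_MAXSEQ * sizeof *seqmap_map);
    if (seqmap_map == NULL)
        abort();
    /* Pretend every slot was last used one full timeout before time 0,
     * so it is already expired even if the clock starts at 0. */
    for (size_t i = 0; i < SEQMAP_MAXSEQ; i++)
        seqmap_map[i].ping_ts = -SEQMAP_TIMEOUT_IN_NS;
}

/* True if slot id may be reused at the given timestamp. */
static bool seqmap_slot_expired(int64_t timestamp, size_t id)
{
    assert(id < SEQMAP_MAXSEQ);
    return timestamp - seqmap_map[id].ping_ts >= SEQMAP_TIMEOUT_IN_NS;
}
```

The initialization loop touches every entry once, regardless of how many pings are later sent.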

Of course, that would not address the problem that CLOCK_MONOTONIC is not useful for reporting of "real" time values on OpenBSD, FreeBSD, and macOS. Using CLOCK_REALTIME for, e.g., -D, --timestamp output, and CLOCK_MONOTONIC for time deltas would still be required to make CLOCK_MONOTONIC usable on those operating systems.

@gsnw
Copy link
Contributor

gsnw commented Mar 16, 2024

@auerswal your suggestion is the better solution and should prevent the error message on FreeBSD immediately after system boot.
I have implemented it in pull request #306.

@auerswal
Collaborator

I think there is an efficiency tradeoff between two possible solutions to the spurious not enough sequence numbers available error:

  1. We can initialize the ping_ts field of every seqmap entry up front to -SEQMAP_TIMEOUT_IN_NS. This always costs the same amount of extra work, independent of how many values are later stored in the data structure while fping runs.
  2. We can add a != 0 test to the code that checks whether the next seqmap entry can be used, as proposed by @cagney in pull request #290. Initially, before all seqmap entries have been used, the != 0 check suffices, and one arithmetic operation is theoretically avoided. After all 65Ki seqmap entries have been used, every ping_ts is non-zero, so the extra check no longer avoids anything. For short fping runs using only a few pings, this is probably more efficient than initializing the whole data structure with a non-zero value (calloc() does not suffice). With superscalar out-of-order CPUs, both checks may even run in parallel, making the extra != 0 test practically free.
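For comparison, a sketch of option 2 under the same assumptions as a fixed-size table (hypothetical names, 65536 entries): the array can be zeroed with calloc(), and the != 0 test short-circuits the subtraction for slots that were never used. This illustrates the idea behind pull request #290; it is not that pull request's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SEQMAP_TIMEOUT_IN_NS INT64_C(10000000000)
#define SEQMAP_MAXSEQ 65536 /* hypothetical size (65Ki entries) */

typedef struct { int64_t ping_ts; } seqmap_value_t;

static seqmap_value_t *seqmap_map;

static void seqmap_init_zeroed(void)
{
    /* calloc() is enough here: ping_ts == 0 marks "never used". */
    seqmap_map = calloc(SEQMAP_MAXSEQ, sizeof *seqmap_map);
    if (seqmap_map == NULL)
        abort();
}

/* True if slot id cannot be reused yet (the error case). */
static bool seqmap_slot_busy(int64_t timestamp, size_t id)
{
    assert(id < SEQMAP_MAXSEQ);
    const seqmap_value_t *v = &seqmap_map[id];
    /* Unused slots short-circuit before the subtraction. */
    return v->ping_ts != 0 && timestamp - v->ping_ts < SEQMAP_TIMEOUT_IN_NS;
}
```

Compared with option 1, the cost moves from a loop over all entries at start-up to one extra comparison per lookup.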

Both solutions avoid spurious not enough sequence numbers available errors, and the second one might be more efficient, but I am fine with both approaches.

@gsnw
Contributor

gsnw commented Mar 17, 2024

That's right. Here are the CPU times used by the two approaches:

  1. [DEBUG] CPU time used: 0.001433 sec
  2. [DEBUG] CPU time used: 0.000466 sec

The debug output was extended in commit gsnw@2b588c1.

@gsnw
Contributor

gsnw commented Mar 20, 2024

@schweikert you can use either pull request #306 or #307.
Preferably use #307, because #306 is a bit worse in terms of CPU time.
I will close the remaining pull request afterwards.

@gsnw
Contributor

gsnw commented Apr 19, 2024

@schweikert The issue can be closed as solved

3 participants