
probes falling back to proc parsing due to late eBPF events #2650

Closed
rade opened this issue Jun 27, 2017 · 13 comments

rade (Member) commented Jun 27, 2017

We see this:

<probe> ERRO: 2017/06/27 05:02:25.336993 tcp tracer received event with timestamp 615263068049191 even though the last timestamp was 615263068049504. Stopping the eBPF tracker. 

after running probes for hours, even days.

The message is produced by the logic introduced in #2334, even though we are running Ubuntu with a 4.4.0-81 kernel, which should not exhibit the problem described in that issue.
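
For illustration, the failing check boils down to something like the following minimal Go sketch (names and structure are hypothetical, not the actual Scope code):

package main

import "log"

// tracker is a hypothetical stand-in for the eBPF connection tracker.
type tracker struct {
	lastTimestamp uint64 // bpf_ktime_get_ns() value of the last event seen
	dead          bool   // once true, the probe falls back to /proc parsing
}

// handleEvent stops the tracker as soon as an event arrives with a kernel
// timestamp older than the previous one.
func (t *tracker) handleEvent(ts uint64) {
	if t.dead {
		return
	}
	if ts < t.lastTimestamp {
		log.Printf("tcp tracer received event with timestamp %d even though the last timestamp was %d. Stopping the eBPF tracker.", ts, t.lastTimestamp)
		t.dead = true
		return
	}
	t.lastTimestamp = ts
	// ...process the connect/accept/close event...
}

func main() {
	t := &tracker{}
	t.handleEvent(615263068049504)
	t.handleEvent(615263068049191) // older timestamp: tracker stops
}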

@alban any idea how we might go about tracking this down?

Also, as a workaround, is there any downside to simply restarting all the eBPF tracking when getting this error, i.e. going through the same initialisation as when initially starting the probes, including the initial proc walk? We'd need to guard against "flapping", obviously.

rade added the bug label Jun 27, 2017
2opremio (Contributor):

@alban has suggested offline:

If it is reproducible, I would add logs to print the cpu number for both events.

I would try to reproduce the issue with the tcptracer-bpf test program (without Scope) in the dev environment with your kernel, using a script like #2507 (comment) with a different cpu# in different terminals.

2opremio (Contributor):

Could this be due to iovisor/gobpf#42?

alban (Contributor) commented Jul 12, 2017

@rade How many CPUs do you have on the machine where the bug occurred?

I am currently trying to reproduce this on GCE with this config:

$ uname -r
4.4.0-81-generic
$ grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 14.04.5 LTS"
$ cat /sys/devices/system/cpu/{possible,present,online}
0
0
0

rade (Member, Author) commented Jul 12, 2017

# uname -r
4.4.0-83-generic
# grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 16.04.1 LTS"
# cat /sys/devices/system/cpu/{possible,present,online}
0-127
0-15
0-15

rade (Member, Author) commented Jul 12, 2017

FYI, on our dev cluster of seven machines, all the probes were started on 2017/06/29 21:51. Four of them are still running in eBPF mode now, while the other three encountered a timestamp discrepancy at

  • 2017/06/30 07:50:49.966603
  • 2017/07/01 22:42:24.916713
  • 2017/07/04 02:03:31.517207

respectively.

So this error does appear to be quite rare.

alban (Contributor) commented Jul 12, 2017

I can reproduce the bug using tcptracer-bpf/tests/tracer within about 1 minute of execution with the following setup:

n1-highcpu-16 (16 vCPUs, 14.4 GB memory)
$ uname -r
4.4.0-81-generic
$ grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 14.04.5 LTS"
$ cat /sys/devices/system/cpu/{possible,present,online}
0-15
0-15
0-15

(I added more vCPUs)

I start nginx with sudo docker run -d nginx and use the following test script:

$ cat test/test.sh
#!/bin/bash
# Hammer the nginx container with HTTP requests, pinned to the CPU given as
# the first argument, so that tcp events are generated from a specific core.
cpu=$1
for i in $(seq 1 10000) ; do
  echo -n "$i "
  for j in $(seq 1 500) ; do
    taskset --cpu-list $cpu wget -O /dev/null http://172.17.0.2 2>/dev/null
  done
done
echo

I open several terminals and run test/test.sh 0, test/test.sh 1, test/test.sh 2, etc. to load several CPUs. I can reproduce the bug when at least 3 test scripts are running in parallel.

rade (Member, Author) commented Jul 12, 2017

great! should be easy to fix then ;)

alban (Contributor) commented Jul 17, 2017

I cannot reproduce this on my laptop, only on GCE with the configuration above. Trying to see what the difference could be, I noticed that GCE has a clock-skew daemon, but disabling it didn't help.

I wrote a program (bpf_clocks) to compare the kernel clock bpf_ktime_get_ns() and the userspace clock clock_gettime(CLOCK_MONOTONIC), since they need to agree for the ordering to work. I tested a couple of times and both clocks seemed to be the same. I want to automate this test a bit more so it runs many times over a couple of minutes.
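
A minimal userspace sketch of that comparison in Go (assuming golang.org/x/sys/unix for clock_gettime; the kernel-side value is a placeholder for a bpf_ktime_get_ns() timestamp carried in a perf event, not the real bpf_clocks code):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// monotonicNs reads CLOCK_MONOTONIC from userspace. bpf_ktime_get_ns() in the
// kernel reports time since boot on the same monotonic clock, so the two
// values should be directly comparable.
func monotonicNs() (uint64, error) {
	var ts unix.Timespec
	if err := unix.ClockGettime(unix.CLOCK_MONOTONIC, &ts); err != nil {
		return 0, err
	}
	return uint64(ts.Sec)*1000000000 + uint64(ts.Nsec), nil
}

// compareClocks prints the offset between a kernel timestamp (placeholder for
// a bpf_ktime_get_ns() value from a perf event) and the userspace clock read
// immediately afterwards.
func compareClocks(kernelTs uint64) error {
	userTs, err := monotonicNs()
	if err != nil {
		return err
	}
	fmt.Printf("kernel=%d user=%d delta=%dns\n", kernelTs, userTs, int64(userTs)-int64(kernelTs))
	return nil
}

func main() {
	// Without a real eBPF event here, reuse the userspace clock as a stand-in
	// kernel timestamp just to exercise the comparison.
	ts, err := monotonicNs()
	if err != nil {
		panic(err)
	}
	if err := compareClocks(ts); err != nil {
		panic(err)
	}
}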

The next thing to check is whether the clock is consistent between CPUs, or whether one CPU can report a different time than another. I suspect each CPU can give different results because of two comments I read:

  • "other CPUs are likely to be able observe [time going backward]" in __ktime_get_fast_ns
  • "There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized" (source)

I am not sure how to write a test to check whether this is the cause of the problem, but that's what I would like to check next.

rade (Member, Author) commented Jul 17, 2017

@alban do you want me to give you access to one of our clusters where the problem occurs?

alban (Contributor) commented Jul 18, 2017

@rade No need for now, because I can reproduce the bug without it.

Status update: still exploring...

I continued with my bpf_clocks program comparing the kernel and userspace clocks, looked at more samples, and didn't notice discrepancies between the clocks.

I wrote another program, based on https://github.com/karthick18/ticket_spinlock (with this patch), to compare the clocks between CPUs, and I also didn't notice discrepancies.


I am exploring a different possible race: the execution of an eBPF function (such as kprobe__tcp_set_state()) is not instantaneous, so there can be a gap between the moment it takes a timestamp with bpf_ktime_get_ns() and the moment the perf event is sent to the ring buffer with bpf_perf_event_output().

With virtual CPUs on platforms like GCE and AWS, there is no guarantee that a vCPU is not suspended by the hypervisor, so different executions of an eBPF function can take variable amounts of time. We could have the following scenario:


  1. cpu#1 takes a timestamp.
  2. cpu#2 takes a timestamp and sends its event, with that timestamp, to the ring buffer.
  3. Userspace checks the clock with clock_gettime() and reads the event from cpu#2.
  4. cpu#1 sends its event to the ring buffer.
  5. Later, userspace receives the event from cpu#1 and notices that its timestamp precedes the event from cpu#2; it cannot reorder them, because cpu#2's event, being older than the clock_gettime() barrier, has already been forwarded to the user.

To check whether this scenario really happens, I added a new type of event (TCP_EVENT_TYPE_CLOCK) emitted after each tcp connect, accept or close (kinvolk-archives/tcptracer-bpf@baa1e79) and am inspecting the logs. I see that the gap between 2 consecutive bpf_ktime_get_ns() calls can be between 800ns and 2000ns. That is not negligible compared to the gap between 2 tcp events (often ~50000ns).
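
To make the scenario concrete, here is a rough Go sketch of the barrier-based ordering described in the steps above (purely illustrative; this is not the tcptracer-bpf or Scope code):

package main

import (
	"fmt"
	"sort"
)

// event is a stand-in for a tcp event read from a per-CPU perf ring buffer.
type event struct {
	cpu int
	ts  uint64 // bpf_ktime_get_ns() taken inside the kprobe
}

// flush models one userspace polling round: everything already sitting in the
// ring buffers (i.e. written before the clock_gettime() barrier) is sorted by
// timestamp and forwarded. An event whose timestamp was taken before the
// barrier but written to its ring buffer only after this round (the suspended
// cpu#1 above) shows up in a later round with a timestamp older than what was
// already forwarded, which is exactly the condition that stops the tracker.
func flush(pending []event, lastForwarded *uint64) {
	sort.Slice(pending, func(i, j int) bool { return pending[i].ts < pending[j].ts })
	for _, e := range pending {
		if e.ts < *lastForwarded {
			fmt.Printf("cpu#%d event at %d is older than %d: cannot reorder, stopping the tracker\n",
				e.cpu, e.ts, *lastForwarded)
			return
		}
		*lastForwarded = e.ts
		fmt.Printf("forwarding cpu#%d event at %d\n", e.cpu, e.ts)
	}
}

func main() {
	var last uint64
	// Round 1: only cpu#2's event made it into its ring buffer before the barrier.
	flush([]event{{cpu: 2, ts: 1000}}, &last)
	// Round 2: cpu#1's event, timestamped before cpu#2's, arrives late.
	flush([]event{{cpu: 1, ts: 900}}, &last)
}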

alban (Contributor) commented Jul 19, 2017

Looking at the timestamps in the logs, I was not able to observe the race described in my comment above, so I am not sure it is the correct explanation.

I gave Ubuntu 16.10 with Linux 4.8.0-59-generic a try and the bug is still reproducible there, so it does not seem to be due to an old kernel.

alban (Contributor) commented Jul 20, 2017

I didn't reach a conclusion on the root cause, so I will go ahead with the workaround:

Also, as a workaround, is there any downside to simply restarting all the eBPF tracking when getting this error, i.e. going through the same initialisation as when initially starting the probes, including the initial proc walk? We'd need to guard against "flapping", obviously.

2opremio (Contributor):

That's quite a big hack but I don't have a better suggestion ...

rade added this to the 1.6 milestone Jul 24, 2017
alban added a commit to kinvolk-archives/scope that referenced this issue Jul 24, 2017
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf
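
For illustration, the flap guard described in that commit amounts to something like the following Go sketch (names are hypothetical; this is not the actual Scope implementation):

package main

import (
	"log"
	"time"
)

// restarter models the workaround above: when the tracker dies on an
// out-of-order event, re-run the same initialisation as at probe start
// (including the initial /proc walk), but at most once every 5 minutes.
type restarter struct {
	lastRestart time.Time
	cooldown    time.Duration
}

// maybeRestart restarts the tracker via start() unless the previous restart
// was within the cooldown window, in which case the probe stays on /proc parsing.
func (r *restarter) maybeRestart(start func() error) bool {
	if !r.lastRestart.IsZero() && time.Since(r.lastRestart) < r.cooldown {
		log.Print("eBPF tracker failed again within the cooldown window; falling back to /proc parsing")
		return false
	}
	if err := start(); err != nil {
		log.Printf("eBPF tracker restart failed: %v", err)
		return false
	}
	r.lastRestart = time.Now()
	return true
}

func main() {
	r := &restarter{cooldown: 5 * time.Minute}
	r.maybeRestart(func() error {
		log.Print("re-initialising eBPF tracking and walking /proc")
		return nil
	})
}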
alban added a commit to kinvolk-archives/scope that referenced this issue Jul 24, 2017
alban added a commit to kinvolk-archives/scope that referenced this issue Jul 25, 2017
alban added a commit to kinvolk-archives/scope that referenced this issue Jul 25, 2017
rade modified the milestones: 1.6, Next Jul 26, 2017
alban added a commit to kinvolk-archives/scope that referenced this issue Aug 7, 2017
alban added a commit to kinvolk-archives/scope that referenced this issue Aug 8, 2017
alban added a commit to kinvolk-archives/scope that referenced this issue Aug 17, 2017
rade closed this as completed in 8fe3538 Aug 18, 2017