Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf,test: Fix verifier issues in IPv6 BPF tests when running locally #25191

Merged

Conversation

dylandreimerink
Copy link
Member

@dylandreimerink dylandreimerink commented Apr 28, 2023

When running the TestBPF/ipv6_test.o test locally with clang v13 and kernel v5.17 it resulted in the following verifier error:

171: (b7) r2 = 1                      ; R2_w=inv1
; authhdr->seq = 1;
172: (63) *(u32 *)(r1 +8) = r2
invalid access to packet, off=8 size=4, R1(id=6,off=8,r=2)
R1 offset is outside of the packet

Tracked this down to ipv6_with_hop_auth_tcp_pktgen. It gets a pointer
from pktgen__append_ipv6_extension_header. All expected bounds checks
were in place. It even has a function called ctx_data_valid with some
inline assembly which is supposed to perform additional bounds checks.

Was able to fix this by replacing ctx_data_valid with a normal check
and adding a bounds check to ipv6_with_hop_auth_tcp_pktgen. This
bounds check seems to be necessary because bounds checks do not seem
to apply to copies.

The verifier log reveals the following:

85: (69) r2 = *(u16 *)(r10 -40)       ; R2_w=invP14 R10=fp0
; builder->layer_offsets[layer_idx] = builder->cur_off;
86: (6b) *(u16 *)(r4 +10) = r2        ; R2_w=invP14 R4_w=fp-44 fp-40=mmmmmmmm
; layer = ctx_data(ctx) + builder->cur_off;
87: (0f) r1 += r7                     ; R1_w=pkt(id=0,off=14,r=54,imm=0) R7=invP14
; builder->cur_off += hdrsize;
88: (69) r2 = *(u16 *)(r10 -40)       ; R2_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0

At instruction 85 we know the offset is 14 exactly. It is then written
back to the stack. And at 88 we have lost bounds data. Since this is
an offset we use to construct the pointers, this uncertainty becomes
part of all pointers causing the need for additional checks.

To mitigate the result of the above apparent verifier bug/limitation,
I have also changes our offset value from 16 bit to 64 bit which tracks
better.

Fix verifier issues in IPv6 BPF tests

@dylandreimerink dylandreimerink added the release-note/ci This PR makes changes to the CI. label Apr 28, 2023
@dylandreimerink dylandreimerink requested a review from a team as a code owner April 28, 2023 10:29
Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was able to fix this by replacing ctx_data_valid with a normal check
and adding a bounds check to ipv6_with_hop_auth_tcp_pktgen.

It still failed the verifier in the CI...

bounds checks on one copy somehow do not propagate to the other copies.

That would be a kernel bug. Do you plan to do some research on that?

@dylandreimerink
Copy link
Member Author

It still failed the verifier in the CI...

Well, shit. More fun, will take a look.

bounds checks on one copy somehow do not propagate to the other copies.

That would be a kernel bug. Do you plan to do some research on that?

It is even more interesting. r10 - 40 is equal to 14 exactly, but then we read it as u16 and our offset variable loses its info. become min=0 max=65535. And since we use this offset + ctx->data to create all pointers, all pointers are now hugely variable until additional bounds checks.

85: (69) r2 = *(u16 *)(r10 -40)       ; R2_w=invP14 R10=fp0
; builder->layer_offsets[layer_idx] = builder->cur_off;
86: (6b) *(u16 *)(r4 +10) = r2        ; R2_w=invP14 R4_w=fp-44 fp-40=mmmmmmmm
; layer = ctx_data(ctx) + builder->cur_off;
87: (0f) r1 += r7                     ; R1_w=pkt(id=0,off=14,r=54,imm=0) R7=invP14
; builder->cur_off += hdrsize;
88: (69) r2 = *(u16 *)(r10 -40)       ; R2_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0

So this could indeed be something that we could address in the verifier, if this has not been fixed already.

When running the `TestBPF/ipv6_test.o` test locally with clang v13 and
kernel v5.17 it resulted in the following verifier error:

```
171: (b7) r2 = 1                      ; R2_w=inv1
; authhdr->seq = 1;
172: (63) *(u32 *)(r1 +8) = r2
invalid access to packet, off=8 size=4, R1(id=6,off=8,r=2)
R1 offset is outside of the packet
```

Tracked this down to `ipv6_with_hop_auth_tcp_pktgen`. It gets a pointer
from `pktgen__append_ipv6_extension_header`. All expected bounds checks
were in place. It even has a function called `ctx_data_valid` with some
inline assembly which is supposed to perform additional bounds checks.

Was able to fix this by replacing `ctx_data_valid` with a normal check
and adding a bounds check to `ipv6_with_hop_auth_tcp_pktgen`. This
bounds check seems to be necessary because bounds checks do not seem
to apply to copies.

The verifier log reveals the following:

```
85: (69) r2 = *(u16 *)(r10 -40)       ; R2_w=invP14 R10=fp0
; builder->layer_offsets[layer_idx] = builder->cur_off;
86: (6b) *(u16 *)(r4 +10) = r2        ; R2_w=invP14 R4_w=fp-44 fp-40=mmmmmmmm
; layer = ctx_data(ctx) + builder->cur_off;
87: (0f) r1 += r7                     ; R1_w=pkt(id=0,off=14,r=54,imm=0) R7=invP14
; builder->cur_off += hdrsize;
88: (69) r2 = *(u16 *)(r10 -40)       ; R2_w=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R10=fp0
```

At instruction 85 we know the offset is `14` exactly. It is then written
back to the stack. And at 88 we have lost bounds data. Since this is
an offset we use to construct the pointers, this uncertainty becomes
part of all pointers causing the need for additional checks.

To mitigate the result of the above apparent verifier bug/limitation,
I have also changes our offset value from 16 bit to 64 bit which tracks
better.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink force-pushed the feature/ipv6-test-out-of-bounds-fix branch from 0ba208a to 686b2c5 Compare April 28, 2023 12:35
@gentoo-root
Copy link
Contributor

It is even more interesting. r10 - 40 is equal to 14 exactly, but then we read it as u16 and our offset variable loses its info. become min=0 max=65535.

This issue looks like it's already in my TODO list. If the spill was 64-bit, the subsequent 32-bit or 16-bit fill will lose boundaries: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/bpf/verifier.c?id=6715df8d5d24655b9fd368e904028112b54c7de1#n3835

@dylandreimerink
Copy link
Member Author

I fixed CI by using __u64 offsets everywhere so the verifier has an easier time keeping track these values.

@dylandreimerink
Copy link
Member Author

dylandreimerink commented Apr 28, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) with L4 policy Tests NodePort with L4 Policy

Failure Output

FAIL: Request from testclient-knwc8 pod to service tftp://[fd04::11]:30482/hello failed

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/1909/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@dylandreimerink
Copy link
Member Author

We appear to be hitting flake #24961

@dylandreimerink
Copy link
Member Author

/test-1.25-4.19

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 28, 2023
@pchaigno pchaigno merged commit ceaa4c4 into cilium:main May 1, 2023
@pchaigno pchaigno added needs-backport/1.12 backport/author The backport will be carried out by the author of the PR. labels Jul 28, 2023
@pchaigno
Copy link
Member

pchaigno commented Jul 28, 2023

Marking for backports to v1.12 and v1.13 as otherwise #25699 fails there with the above verifier error.

EDIT: Removing v1.12 as the BPF unit tests are too different there to be able to easily backport.

@pchaigno pchaigno mentioned this pull request Jul 28, 2023
4 tasks
@julianwiedmann julianwiedmann added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 labels Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/author The backport will be carried out by the author of the PR. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

4 participants