Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse IPv6 extension headers #51

Merged
merged 1 commit into from
Jul 18, 2024

Conversation

uablrek
Copy link
Contributor

@uablrek uablrek commented Jul 5, 2024

IPv6 extension headers must be parsed, at least fragment headers.

Fixes: #15

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2024
@k8s-ci-robot k8s-ci-robot requested a review from danwinship July 5, 2024 13:54
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 5, 2024
@k8s-ci-robot k8s-ci-robot requested a review from thockin July 5, 2024 13:54
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 5, 2024
@uablrek
Copy link
Contributor Author

uablrek commented Jul 5, 2024

A unit test is added, that currently fails, but high-lite the fragmented IPv6-UPD problem.

@uablrek
Copy link
Contributor Author

uablrek commented Jul 5, 2024

The extension packets are assumed to be in the recommended order, i.e. a fragment header is the last one, if it exist.

The IPPROTO_HOPOPTS, IPPROTO_ROUTING and IPPROTO_DSTOPTS extension headers are not tested. It is non-trivial to generate these header for testing.

@uablrek
Copy link
Contributor Author

uablrek commented Jul 5, 2024

/cc @aojea

@k8s-ci-robot k8s-ci-robot requested a review from aojea July 5, 2024 15:12
@uablrek
Copy link
Contributor Author

uablrek commented Jul 5, 2024

The capture-chunk is still 128 byte. I think that the fragment case is the most important, and that doesn't require more

@uablrek uablrek marked this pull request as ready for review July 5, 2024 15:15
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2024
@uablrek
Copy link
Contributor Author

uablrek commented Jul 6, 2024

There may be problems also with IPv4 fragments. I think they will be handled as un-fragmented packets, but the L4-header is missing, so garbage data will be used. I will test it and rename this PR to something like "Parse IPv6 extension headers and handle fragments", if necessary.

I make this PR a "draft" until things are sorted out

@uablrek uablrek marked this pull request as draft July 6, 2024 06:00
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 6, 2024
@uablrek
Copy link
Contributor Author

uablrek commented Jul 6, 2024

FYI https://github.com/uablrek/pcap2go is used to create packet test-data

}
}

var packetsUdpFragIPv6 = [][]byte{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment explaining how this is generated so future us can create more of these payloads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added in packet_test.go, and pcap files added in test-data/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better add it under the same folder pkg/networkpolicy/test-data/ and add a small README.md doc in the folder explaining https://github.com/uablrek/pcap2go

@uablrek
Copy link
Contributor Author

uablrek commented Jul 6, 2024

There may be problems also with IPv4 fragments.

Probably not, because when conntrack is used (which it always is in K8s), IPv4 packets are reassembled. (BTW the answer is very informative)

@uablrek
Copy link
Contributor Author

uablrek commented Jul 6, 2024

Considering the above 👆, shall we assume that no IPv4 fragments are ever delivered to user-space, or should we be prepared for the very unlikely event?

@aojea
Copy link
Contributor

aojea commented Jul 6, 2024

Considering the above 👆, shall we assume that no IPv4 fragments are ever delivered to user-space, or should we be prepared for the very unlikely event?

seems reasonable, but leave a comment with the link to that great explanation to document the reason

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 7, 2024
@uablrek
Copy link
Contributor Author

uablrek commented Jul 7, 2024

I added checks for packets with incomplete headers and corrupted packets (header lengths too small). But the kernel will verify this and discard those packets. If the capture length is changed 128->1280, all those checks should be redundant.

@aojea Do you think I should set capture length 1280, and remove all too-small and corrupted tests?

(never mind the lint-errors, I'll fix them later)

@uablrek
Copy link
Contributor Author

uablrek commented Jul 7, 2024

IPPROTO_DSTOPTS may appear again after IPPROTO_FRAGMENT in the recommended order. I will fix that when I find a way to test it.

// in the nfqueue
const captureChunk = 128

func TestUdpFragmentIPv6(t *testing.T) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"UDP" not "Udp". Golang style doesn't force acronyms to "titlecase" (though they can be all-lowercase if needed).

(Likewise everywhere.)

}{
{
name: "UDP first fragment",
input: packetsUdpFragIPv6[0][14:(14 + captureChunk)],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

14 should not be a magic number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

}
}

var packetsUdpFragIPv6 = [][]byte{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this would make more sense as three separate variables with useful names.
also, maybe split into L2 header, L3 header, payload, with some minimal comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. If packet test data is generated from pcap, it should not be post-processed. That would make it an exquisite pain to create new test data (exactly what I wanted to avoid by using a capture).

On the other hand, if test-data is manufactured by hand, there should of course be comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saved the original pcap files in test-data/ so people can analyze it with wireshark for instance

},
},
{
name: "PSH, ACK",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should only be receiving packets for ctstate new connections...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but IMHO parsePacket() should be more-or-less generic. It should not panic if some TCP packet arrives that is not a SYN for instance. I think "test too much" here should be ok.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 9, 2024
@uablrek
Copy link
Contributor Author

uablrek commented Jul 9, 2024

TL;DR; SRv6 is a horrible edge use-case!

In my quest for IPv6 packets with extension headers I revisited SRv6. I tested it after viewing an excellent presentation by Daniel Bernier, Bell Canada at KubeCon in Valencia 2022.

With SRv6 there is a routing header, followed by another IPv6 header, and then the TCP header:

tcp-syn-segment-routing

This will probably never happen, but network-policies is in the forwarding plane, so it might be possible so see something like this.

As I see it we have two choices:

  1. Parse this and return addresses from the second IPv6 header, and detect proto TCP
  2. Reject as "not supported"

The second can be "reject any packet with a routing header".

@uablrek uablrek mentioned this pull request Jul 9, 2024
@aojea
Copy link
Contributor

aojea commented Jul 9, 2024

@aojea Do you think I should set capture length 1280, and remove all too-small and corrupted tests?

I prefer to make the code more resilient than depending on the kernel to make the right thing

There are some linter errors and some job failing that I do not know if is a flake, but this should be ready for review

@uablrek
Copy link
Contributor Author

uablrek commented Jul 9, 2024

I made the PR "ready for review". IPv6 fragments are handled, but there are some "exotic" cases that should also be covered, like SRv6, and repeated destopts. Perhaps extension headers that are not in the recommended order, but I rather reject those. Also unit-tests for extension headers (except fragments) are missing.

But since IPv6 fragment handling is most urgent, and may be considered a bug, the remaining stuff can be added in a later kind/feature PR. I'll be happy to continue working with that.

@uablrek uablrek marked this pull request as ready for review July 9, 2024 12:47
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 9, 2024
@k8s-ci-robot k8s-ci-robot requested a review from aojea July 9, 2024 12:47
}

var dataOffset int
havePorts := true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is this false? preparing for icmp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

} else {
// Not first fragment, we have no L4 header and the
// payload begins after this header
l4offset = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is protocol = nxtHeader correct in this case? it will be syscall.IPPROTO_FRAGMENT

l4offset = -1. does not seem to be handled later, should this return here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is protocol = nxtHeader correct in this case? it will be syscall.IPPROTO_FRAGMENT

It's intentional, but strictly not correct since there is an L4 protocol in the nxtHeader field. But there is no L4 header! The fragment data begins immediately after the fragment header. So to identify the packet as a fragment, syscall.IPPROTO_FRAGMENT is kept in the protocol variable. I'll make a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l4offset = -1. does not seem to be handled later, should this return here?

It's probably a just-in-case left-over. I'll check it.

@uablrek
Copy link
Contributor Author

uablrek commented Jul 17, 2024

Updated after comments.

I figured that we can return an empty packet.proto for fragments, and for any other packets we want to accept. The default is however still to return an error for non tcp/udp/sctp. I was uncertain about this, since it requires that only tcp/udp/sctp packets are routed to the nfqueue.

The alternative is to return an empty packet.proto for all packets we want to accept, not only fragments. If we do that, no "allow unknown protocols" pr is needed. The control is in parsePacket().

I have prepared a commit that rejects IPPROTO_ROUTING headers, and handle the remaining case for headers in the recommended order: extra IPPROTO_DSTOPTS headers after the first fragment header. But it needs rebase after these updates. I also add a ipproto field in the packet struct. Not really needed, but "good to have" for logging and future extensions.

@uablrek
Copy link
Contributor Author

uablrek commented Jul 17, 2024

I am still working on tests, both unit-test and real traffic tests, with IPv6 extension headers. I have not figured out an easy way, but am planning to use a tap device that adds some IPv6 extension headers. I.e. packets are routed to a tap device, a user-space program add some headers, and send the packets back. I have done it before, and got quite far. IMO tap devices, same as nfqueue, are often overlooked for feature implementations and testing. A hyped alternative is eBPF, but that is much more complicated, and should be avoided unless performance is really needed.

A good example of the power of tap devices is tayga NAT64 (rock-solid since 2011!)

@uablrek
Copy link
Contributor Author

uablrek commented Jul 17, 2024

I discovered that lksctp (Linux kernel SCTP) doesn't fragment on IP-level (this was new to me). The MTU of the outgoing device is examined, and fragmentation is made on SCTP level. This means that IPv6 fragments will never be used for lksctp. I don't know about other sctp implementations, but I will only test lksctp.

Comment on lines 434 to 435
c.nfq.SetVerdict(*a.PacketID, nfqueue.NfAccept) //nolint:errcheck
return 0
Copy link
Contributor

@aojea aojea Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dan and I were discussing offline, and we concluded that:

  1. network policies only care about L4 protocol if you set the Ports fields
  2. if not Port field is specified , is legit to assume that the user/admin can consider the network policy as "any" protocol (some from sftim about users wanting to control ICMP is spot on)

As conclusion, I think we set the packet.proto to empty , then srcPort and dstPort will be 0, and IIUIC the network policy logic will be able do the right thing.

This leaves the parsePacket() errors to short or malformed packets (module bugs) than we can say we deny by default , as we don't expect at this point to receive a packet from the kernel that is malformed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just remove this switch case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just check for an empty v1.Protocol? What is the best if-condition:

  if string(packet.proto) == ""

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I couldn't figure that out, hence the switch)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, not check at all , just pass the packet as is to the next stage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.evaluatePacket(packet) will evaluate the packet only with the IPs information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed

Comment on lines 125 to 127
// This "should not happen" since only { tcp, udp, sctp } are
// routed to the nfqueue
return t, fmt.Errorf("unknown protocol %d", protocol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are going to route any protocol here, so we can emit verdict on icmp per example, more on this in https://github.com/kubernetes-sigs/kube-network-policies/pull/51/files#r1681040192

@uablrek uablrek force-pushed the packet-parse branch 2 times, most recently from 3c6b0ad to 59745d1 Compare July 18, 2024 18:21
Comment on lines 129 to 138
if payloadOffset > len(b) {
// We must have a complete L4 header, however payload may be zero-lengh
return t, ErrorTooShort
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this invalid? why this should be an error ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok, if payloadOffset is equal to len(b) is a valid packet

// Obtain the offset of the payload
// TODO allow to filter by the payload
if len(b) >= hdrlen+dataOffset {
t.payload = b[hdrlen+dataOffset:]
if payloadOffset > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this happen? I don't know if I prefer the old check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataOffset is a tcp-header thing. It should be used to compute the hdrlen for tcp-headers only. I think there was a bug in the old code for sctp and udp payloads. I looked that the hdrlen was added twice, since it was included in dataOffset (but I didn't really check carefully, since I re-wrote it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, payloadOffset must be >0 now since we return in the default: branch. I can remove the check.

And add parsePacket unit test
@aojea
Copy link
Contributor

aojea commented Jul 18, 2024

/lgtm
/approve

Thanks

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 18, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, uablrek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit 17360aa into kubernetes-sigs:main Jul 18, 2024
9 checks passed
@uablrek uablrek deleted the packet-parse branch July 19, 2024 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Header processing is too simplified for IPv6
4 participants