-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use golang.org/x/crypto/cryptobyte to unpack DNS records #1507
base: master
Are you sure you want to change the base?
Conversation
This substantially simplifies the machinery and state tracking around DNS record unpacking. Gone are the `off int` arguments! Gone are the manual slice length checks! Care has been taken to maintain existing behaviour, including quirks, with the only notable intentional change being to UnpackDomainName where we now strictly enforce that pointers point backwards into the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
holy moly this cleans stuff up.
one comment regarding some docs
Instead of using a custom type we're just going to pass the full msg byte-slice down the call stack. This makes this substantially simpler and allows for some nice optimisations and cleanups.
This unpacking is more strict than previous. It now validates that no trailing data exists.
You know I thought it would, hence the PR, but even I was surprised just how much it improves things. |
This is ugly, but there doesn't really seem to be a way to get around that. The previous code was visually elegant but allocated far too much. This code is efficient but visually ugly. *sigh* I'm not sure if `msg = &b` causing b to escape is a missed optimisation in the compiler or not, but with go1.21 that's how it is. Also adjust our end-of-name check slightly. name old time/op new time/op delta UnpackDomainName-12 152ns ± 6% 148ns ±13% ~ (p=0.353 n=10+10) UnpackDomainNameUnprintable-12 120ns ±12% 122ns ± 9% ~ (p=0.616 n=10+10) UnpackDomainNameLongest-12 618ns ±11% 558ns ± 9% -9.83% (p=0.000 n=10+9) UnpackDomainNameLongestUnprintable-12 1.60µs ±16% 1.55µs ±13% ~ (p=0.447 n=10+9) UnpackDomainNamePointers-12 302ns ±17% 149ns ± 6% -50.51% (p=0.000 n=10+10) name old alloc/op new alloc/op delta UnpackDomainName-12 64.0B ± 0% 64.0B ± 0% ~ (all equal) UnpackDomainNameUnprintable-12 48.0B ± 0% 48.0B ± 0% ~ (all equal) UnpackDomainNameLongest-12 256B ± 0% 256B ± 0% ~ (all equal) UnpackDomainNameLongestUnprintable-12 1.02kB ± 0% 1.02kB ± 0% ~ (all equal) UnpackDomainNamePointers-12 120B ± 0% 48B ± 0% -60.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta UnpackDomainName-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNameUnprintable-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNameLongest-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNameLongestUnprintable-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNamePointers-12 4.00 ± 0% 1.00 ± 0% -75.00% (p=0.000 n=10+10)
We keep some of the existing weird behaviour (mostly of not erroring when we should), but refactor it so it's far more obvious and things are cleaner. We leave a number of TODOs to address some of the weirder choices. They would also allow for some other improvements to the code if we can make those changes.
This was to maintain existing undocumented behaviour, but this makes more sense.
This logic in this test case was wrong which caused strange behaviour when re-working the domain name unpacking code.
This is the more correct error to be returning.
Actually no matter what I seem to do this test is terribly broken. I can't be bothered wrestling with it any longer. Goodbye.
Now that TestTruncatedMsg is gone, we can implement the sane behaviour here.
In some places this results in more readable code, in other places less. What we gain though is the same number of allocations that we started with. In other words, we haven't made things worse! Without this change, every time we pass *cryptobyte.String into an interface it is forced onto the heap and we get a nasty allocation. It does require us to check that all the data has been consumed at the end of each `unpack([]byte) error` call instead of having a single check. Ultimately this isn't too bad. We could change unpack to return whatever is left over. We don't do this for semantic reasons. unpack is required to consume the entirety of data (as we know it's length ahead of time). Asking it to return whatever is left for the sole purpose of checking it's nothing is silly. In some places we pass a pointer or do return the rest. In these cases, it's because we don't know ahead of time how long something will be.
These never actually did anything as dh is a struct and we don't consult it again.
While it's nice having this as a single function rather than one for Question and one for RR, using generics and passing the unpack function causes the *cryptobyte.String to escape. That leaves us with some rather ugly workarounds. Instead just bite the bullet and write it twice. If we chose to remove the msg.Empty() check in the future, then this doesn't end up too bad at all.
This helper validates that the offset can be calculated correctly. We have to restrict msgBuf in unpackRRWithHeader as previously we would sometimes computing invalid offsets in unpackDomainName. This helper is designed to ensure we don't make this mistake again.
We need to compare against the start of the pointer not the end of it. While we're here, improve TestUnpackDomainName and ensure it's testing UnpackDomainName properly.
This isn't needed now that we always check s.Empty().
Returning &h (*RR_Header) as an RR causes h to spill to the heap and allocate. Given this is only needed for a specific error and not in the happy path, we instead create a local copy of h and return a pointer to that instead. That eliminates the allocation on success. While we're here, remove the new(RR_Header) return in UnpackRR. This doesn't make any sense as it contains no information. We already return nil in the error case above. It shows a nice improvement for several of the benchmarks. name old time/op new time/op delta UnpackDomainName-12 153ns ±14% 148ns ± 6% ~ (p=0.268 n=10+9) UnpackDomainNameUnprintable-12 124ns ± 4% 123ns ± 6% ~ (p=0.591 n=10+10) UnpackDomainNameLongest-12 581ns ±12% 561ns ±15% ~ (p=0.190 n=10+10) UnpackDomainNameLongestUnprintable-12 1.76µs ±24% 1.66µs ±20% ~ (p=0.393 n=10+10) UnpackDomainNamePointers-12 146ns ± 5% 146ns ± 3% ~ (p=0.730 n=9+9) UnpackA-12 194ns ±15% 135ns ±12% -30.14% (p=0.000 n=10+10) UnpackMX-12 200ns ±15% 140ns ±10% -29.96% (p=0.000 n=10+10) UnpackAAAA-12 199ns ±15% 153ns ±14% -22.88% (p=0.000 n=10+10) UnpackMsg-12 1.48µs ±16% 1.26µs ±13% -14.68% (p=0.001 n=10+10) UnpackString/Escaped-12 84.8ns ± 7% 79.6ns ± 6% -6.12% (p=0.003 n=9+10) UnpackString/Unescaped-12 46.5ns ±15% 47.8ns ±14% ~ (p=0.353 n=10+10) Serve-12 58.3µs ± 2% 57.8µs ± 3% ~ (p=0.065 n=9+10) Serve6-12 61.4µs ±10% 61.1µs ± 8% ~ (p=0.796 n=10+10) ServeCompress-12 59.1µs ± 2% 58.6µs ± 2% ~ (p=0.113 n=9+10) name old alloc/op new alloc/op delta UnpackDomainName-12 64.0B ± 0% 64.0B ± 0% ~ (all equal) UnpackDomainNameUnprintable-12 48.0B ± 0% 48.0B ± 0% ~ (all equal) UnpackDomainNameLongest-12 256B ± 0% 256B ± 0% ~ (all equal) UnpackDomainNameLongestUnprintable-12 1.02kB ± 0% 1.02kB ± 0% ~ (all equal) UnpackDomainNamePointers-12 48.0B ± 0% 48.0B ± 0% ~ (all equal) UnpackA-12 100B ± 0% 68B ± 0% -32.00% (p=0.000 n=10+10) UnpackMX-12 100B ± 0% 68B ± 0% -32.00% (p=0.000 n=10+10) UnpackAAAA-12 112B ± 0% 80B ± 0% -28.57% (p=0.000 n=10+10) UnpackMsg-12 584B ± 0% 520B ± 0% -10.96% (p=0.000 n=10+10) UnpackString/Escaped-12 32.0B ± 0% 32.0B ± 0% ~ (all equal) UnpackString/Unescaped-12 24.0B ± 0% 24.0B ± 0% ~ (all equal) Serve-12 3.53kB ± 0% 3.50kB ± 0% -0.91% (p=0.000 n=10+10) Serve6-12 3.37kB ± 0% 3.34kB ± 0% -0.95% (p=0.000 n=10+8) ServeCompress-12 3.74kB ± 0% 3.71kB ± 0% -0.87% (p=0.000 n=10+10) name old allocs/op new allocs/op delta UnpackDomainName-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNameUnprintable-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNameLongest-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNameLongestUnprintable-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackDomainNamePointers-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackA-12 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10) UnpackMX-12 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10) UnpackAAAA-12 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10) UnpackMsg-12 12.0 ± 0% 10.0 ± 0% -16.67% (p=0.000 n=10+10) UnpackString/Escaped-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) UnpackString/Unescaped-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) Serve-12 55.0 ± 0% 54.0 ± 0% -1.82% (p=0.000 n=10+10) Serve6-12 52.0 ± 0% 51.0 ± 0% -1.92% (p=0.000 n=10+10) ServeCompress-12 57.0 ± 0% 56.0 ± 0% -1.75% (p=0.000 n=10+10)
…message Semantically this makes more sense. We also rename a few variables here and there to be more consistent.
Also remove some pointless `err != nil` checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to run the coredns testsuite with this version.
Yep that's actually my hold up at the moment. I've been debugging and working through a few scattered CoreDNS test failures. |
This fixes a CoreDNS test failure (github.com/coredns/coredns/plugin/pkg/proxy.TestCoreDNSOverflow) that was testing behaviour that is sensitive to the specific error message returned when unpacking overflowing messages or that it is ErrBuf. It also ensures we correctly distinguish between invalid record data and invalid message framing. Ideally the `unpack(...) error` methods (on RR, EDNS0 and SVCBKeyValue) would actually be `unpack(...) bool` and we'd allow ourselves more freedom on how we report unpacking failures.
In a small number of instances we lose some error information with this change. I don't think it's particularly valuable information and this approach is far simpler, far less vague about what errors are appropriate and match how crypto/tls uses cryptobyte.String to unmarshal handshake messages.
This code only ever returns a single error value.
Yeah I've gone back and forth on the errors and ended up with something quite different. There may be compatibility concerns, but ultimately I think we need to be far more semantic with our unpacking errors. I've gone and distinguished between failures to unpack the message framing and failures to unpack the record data specifically. Once we're in a length-delimited section, it doesn't make sense to use the same error anymore as we're not dealing with potential truncation but rather just corrupt marshalling. Edit: The error changes very much could be backward incompatible. It's hard to tell exactly. If so, I think we can work around that while still mostly keeping the semantic distinctions. |
I've left quite a few TODOs and comments where I've discovered incorrect or problematic behaviour. I've successfully run the CoreDNS test suite against this change now as well. Figuring out why it was failing was quite difficult and led down the RFC rabbit hole and exposed some bugs in this library. I'll aim to address all the TODOs in future PRs. |
// errors.Is(_, ErrBuf) or contains the substring "overflow". | ||
// | ||
// TODO(tmthrgd): Update CoreDNS and use "truncated message". | ||
errTruncatedMessage = &Error{err: "overflow unpacking truncated message"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a properly truncated message should unpack just fine.
unsure why TC bit messages deserve special treatment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't for a properly framed messaged with or without the TC bit set, this is for a message that is say randomly cut in half. We return this if we're say unpacking a resource, but the message buffer stops half way through the RDATA. This already fails, but with a less obvious error message. I'm open to suggestions on a different message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok. Essentially ErrBuf, but... maybe use partial as truncated as meaning in the protocol?
ErrPartialMessage = &Error{err: "overflow unpacking partial message"}
(unsure about the word "overflow" here.
// | ||
// TODO(tmthrgd): This check is no longer required to ensure | ||
// loop free behaviour. Should we remove it? | ||
if ptrs++; ptrs > maxCompressionPointers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically there could still be a lot of pointers backwards? maxCompressionsPointers
now has a value of 126; that could still be reach, although not via a loop.
Think it can go, even with a mesg with a lot of pointers backwards, we should not introduce an arbitrary limit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's a weird one. Pointing to another pointer would never be generated by a sensible non-buggy implementation. It doesn't even save you a byte. There's annoyingly no RFC that prohibits it though, so it is very much a judgement call. The backward check prevents a loop so we can remove this, it just opens the door to being forced to do more work by an attacker. I can remove it in a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pointer to self is then still checked?
think this could go in and the fix some fallout and the dyn. update stuff? (pointer to self should still be checked I guess, but can also be done in followup) |
This substantially simplifies the machinery and state tracking around DNS record unpacking. Gone are the
off int
arguments! Gone are the manual slice length checks!For the most part care has been taken to maintain existing behaviour, including quirks, with one of the few notable intentional change being to
UnpackDomainName
where we now strictly enforce that pointers point backwards into the message. We also fix a bug here or there (seeunpackQuestion
) and are more strict in our unpacking of certain EDNS0 options.