std::net::Ipv4Addr parsing violates the "strict" form in IETF RFC 6943 Section 3.1.1. #86964
Comments
I believe that the second input (… The example in the RFC for an octal number is … If you read the source code where the IPv4 address is parsed, it specifically disallows that: rust/library/std/src/net/parser.rs, lines 141 to 146 in d2b04f0.
Regarding …
From the release note of 1.53:
I'm confused by these comments. Padded zeros are not unique to the base-8 numeral system. Base-2, base-3, etc. all allow an arbitrary number of padded zeros, with the convention being to drop them all. After all … objecting to … To me the solution is clear: follow IETF RFC 6943 more strictly and enforce that each octet be one to three digits in length. One could even follow a stricter format and require that there be no padded …
This is not an incorrect assumption, as octal numbers in the loose format are specified by the number having a leading zero. (See #83652 for the PR that banned octal numbers.)
This is an implementation detail of the way it checks for leading zeroes: it checks whether the number starts with a zero and rejects it only if the resulting parsed number is not zero.
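A minimal sketch of the rule just described, applied to a single octet, shows why all-zero octets such as `0000` slipped through. `loose_octet` models the check as described (reject a leading zero only when the parsed value is non-zero); `strict_octet` is a hypothetical stricter alternative that caps the length at three digits and forbids a leading zero on any multi-digit octet. Both names are illustrative and neither is the actual standard library code.

```rust
/// Hypothetical sketch of the leading-zero rule described above (not the real
/// `parser.rs` code): reject a leading zero only when the value is non-zero.
fn loose_octet(s: &str) -> Option<u8> {
    // `str::parse` would also accept a leading `+`, so insist on digits only.
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Decimal parse; values above 255 overflow `u8` and are rejected.
    let value: u8 = s.parse().ok()?;
    // "01" is rejected, but "00" and "0000" pass because their value is zero.
    if s.starts_with('0') && value != 0 {
        return None;
    }
    Some(value)
}

/// Hypothetical stricter rule: one to three decimal digits, and no leading
/// zero unless the octet is exactly "0".
fn strict_octet(s: &str) -> Option<u8> {
    if !(1..=3).contains(&s.len()) || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    if s.len() > 1 && s.starts_with('0') {
        return None;
    }
    s.parse().ok()
}

fn main() {
    for s in ["0", "00", "0000", "01", "001", "10", "255", "256"] {
        println!("{:>4}: loose = {:?}, strict = {:?}", s, loose_octet(s), strict_octet(s));
    }
}
```

Under the loose rule, `"0000"` yields `Some(0)` while `"01"` yields `None`, which is exactly the asymmetry this thread is about.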
So we are redefining how mathematics defines numeral systems? Saying that … Also, the documentation clearly states that each octet is in "decimal notation". Suddenly treating the octet …
Using leading zeros to designate octal literals is a fairly common convention. In other languages like JS and C, …
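For reference, in C an integer constant with a leading `0` such as `010` is octal and equals 8, and `inet_aton`-style address parsers follow the same convention. Rust's own string-to-integer parsing does not: `str::parse` treats leading zeros as insignificant. The hypothetical helper below reads one digit string both ways, illustrating that a zero-prefixed octet is ambiguous unless it consists only of zeros.

```rust
/// Hypothetical helper: interpret `digits` as decimal (as Rust's `str::parse`
/// does) and as octal (as a C-style leading `0` would imply).
fn read_both(digits: &str) -> (Option<u32>, Option<u32>) {
    let decimal = digits.parse::<u32>().ok();
    let octal = u32::from_str_radix(digits, 8).ok();
    (decimal, octal)
}

fn main() {
    for digits in ["010", "0377", "09", "0000"] {
        let (decimal, octal) = read_both(digits);
        println!("{:>4}: decimal = {:?}, octal = {:?}", digits, decimal, octal);
    }
}
```

`"010"` reads as 10 or 8, `"0377"` as 377 or 255, and `"09"` is not valid octal at all; only the all-zero string agrees under both readings.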
So it sounds like the documentation needs to be modified. It is confusing (because it contradicts reality) to say that each octet is in decimal notation when the actual implementation requires each octet to be either base-10 or base-8, where the latter is assumed when there are leading … It seems much easier to both document and comprehend to simply require at most three decimal digits for each octet and either forbid leading …
"The actual implementation" is relevant only insofar as its behavior is externally observable. And (correct me if I'm wrong) in every case where the function succeeds, it (behaves as if it) interprets all digits in decimal, right? So I'd say it is correct to say that everything is "in decimal notation". The only weird bit is that …
When the documentation says the "four octets are in decimal notation", that is effectively a definition, which in turn can be phrased as a logical biconditional. In this case, the logical biconditional I formed in my head was something like: "a … As was discovered in the discussion here, it sounds like "decimal notation" as used in the documentation is not what it has meant my entire life, nor what it means in a lot of areas in Rust. Specifically, any number that has leading …
That's true in C. (Specifically, it is how C converts series of digits to numbers. "Numbers" don't have leading 0s, numbers are elements of the set of integers. The representation of a number as a string of digits can have leading 0s. At least that's how I think the term is usually used when one needs to be precise about series of digits vs. the thing they represent.) No base-8 occurs anywhere in the behavior of this function. If you disagree, please show an input-output pair for this function which demonstrates that base 8 is used. (The code inside the function doesn't matter, of course. Only its observable behavior does.) I really don't understand why you keep insisting that this function has anything to do with base 8. I think it might be because of some quirk in how it is implemented, but unobservable implementation details are irrelevant. As a mathematician you should be even more used to this than most programmers, since the usual way of defining mathematical functions consists entirely of input-output pairs, there is not even such a thing as an "implementation"... but I am digressing. ;)
Even mathematically speaking, the sequence of digits "0" represents the same number in any base system. So, in a very precise formal sense, "not that it matters" is accurate. A base is just a way to convert a series of digits into a number, and for series of digits consisting only of 0s, the result is always the same. The definition of "decimal" used in the documentation exactly matches the mathematical definition. The documentation just fails to mention that there is an additional restriction, namely that no leading 0s are accepted (except if all digits of an octet are 0). However, it remains the case that if an input is accepted by this function, then all octets are interpreted as decimal; moreover, all decimal representations without leading 0s are accepted by this function.
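The claim that a run of zeros denotes the same number in every base is easy to check with `u32::from_str_radix`, which accepts radixes 2 through 36. A minimal sketch (the helper name is hypothetical):

```rust
/// Hypothetical helper: the value of `digits` under every radix from 2 to 36,
/// skipping radixes in which the string is not a valid number.
fn values_in_all_radixes(digits: &str) -> Vec<u32> {
    (2..=36)
        .filter_map(|radix| u32::from_str_radix(digits, radix).ok())
        .collect()
}

fn main() {
    let zeros = values_in_all_radixes("0000");
    // A string of only 0s is valid in all 35 radixes and always means zero.
    assert_eq!(zeros.len(), 35);
    assert!(zeros.iter().all(|&v| v == 0));

    // By contrast, "10" means a different number in each radix: 2, 3, ..., 36.
    let tens = values_in_all_radixes("10");
    assert_eq!(tens.first(), Some(&2));
    assert_eq!(tens.last(), Some(&36));
    println!("all-zero strings are radix-independent");
}
```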
Saying a definition matches exactly with another definition before subsequently changing the definition by adding other restrictions does not make sense to me. Does the definition of an even integer "match exactly" with the definition of an integer? No. Adding or removing restrictions to a definition changes the definition. So the fact that leading …
Okay, I guess I should have said that it is strictly a restriction of the mathematical definition (a subset, if you take the graph view of a function). But it doesn't disagree on any inputs where it is defined. Your talking about base 8 makes it sound like sometimes it does not match the base-10 function even when it does produce a result (but matches the base-8 function instead), and it took quite a lot of time to realize that is not what happens. Moreover, the subset still contains the full graph of the inverse of the full "base 10" function where you always pick the shortest representation (i.e., the most common "serialization" function of numbers into base-10 representation). So unlike your "even" example, it is still a pretty reasonable parsing function. Only some "non-canonical" representations are rejected.
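The point that every shortest (canonical) decimal representation is still accepted can be phrased as a round-trip property: `Display` for `Ipv4Addr` writes each octet in decimal without leading zeros, so parsing that string should give back the same address. A minimal sketch that checks every octet value in each position (the helper name is hypothetical):

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: format `addr` in dot-decimal form and parse it back.
fn round_trips(addr: Ipv4Addr) -> bool {
    addr.to_string().parse::<Ipv4Addr>() == Ok(addr)
}

fn main() {
    for n in 0..=255u8 {
        // Put each octet value in every position, alongside 0 and 255.
        for addr in [
            Ipv4Addr::new(n, 0, 0, 0),
            Ipv4Addr::new(0, n, 255, 0),
            Ipv4Addr::new(255, 0, n, 0),
            Ipv4Addr::new(0, 255, 0, n),
        ] {
            assert!(round_trips(addr), "{} did not round-trip", addr);
        }
    }
    println!("all canonical forms round-trip");
}
```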
Well, I think we are talking in circles at this point, honestly, lol. While I would agree my example is a more extreme case of documentation failure, since conflating "even integer" with "integer" is more absurd than conflating "base-10 in the shortest representation (except for 0, which may carry any number of leading 0s)" with "base-10", both examples suffer from the same problem: more inputs fail than the exact definition of the term used (in my example "integer" and in this example "decimal") would lead one to expect. For me, as I mentioned in the other discussion, this really is not me being "pedantic". I wrote code based on the documentation that ended up being invalid, since I (mis)interpreted "decimal" to be more general than what it actually means in this context. Ideally, in my opinion at least, that should not happen, especially since it is not much additional effort to document more precisely what certain terms mean, even if that just means adding one or two more examples highlighting parsing failures (or successes) that one (like myself) would not expect.
We already agree that the docs need to be improved. :) The rest of this discussion is then mostly philosophical and not going to lead to a productive outcome, so maybe we'd better stop it. ;)
Reject octal zeros in IPv4 addresses This fixes rust-lang#86964 by rejecting octal zeros in IP addresses, such that `192.168.00.00000000` is rejected with a parse error, since having leading zeros in front of another zero indicates it is a zero written in octal notation, which is not allowed in the strict mode specified by RFC 6943 3.1.1. Octal rejection was implemented in rust-lang#83652, but due to the way it was implemented octal zeros were still allowed.
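After this change (together with the octal rejection from rust-lang#83652), a zero-prefixed multi-digit octet should fail to parse whatever its value. A small check of the inputs discussed in this issue; the expected results assume a current toolchain and may not match older releases, and `parses` is a hypothetical helper name.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: does `std` accept `input` as an IPv4 address?
fn parses(input: &str) -> bool {
    input.parse::<Ipv4Addr>().is_ok()
}

fn main() {
    // (input, expected result on a toolchain that includes the fix)
    let cases = [
        ("192.168.00.00000000", false), // "octal zeros" from the PR description
        ("127.0000.0.1", false),        // four-digit zero octet from the issue
        ("127.0.0.001", false),         // zero-prefixed non-zero octet
        ("127.0.0.1", true),
        ("0.0.0.0", true),
    ];
    for (input, expected) in cases {
        assert_eq!(parses(input), expected, "unexpected result for {}", input);
        println!("{:<20} -> {}", input, if expected { "accepted" } else { "rejected" });
    }
}
```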
According to the `std::net::Ipv4Addr` documentation, "The four octets are in decimal notation, divided by `.` (this is called 'dot-decimal notation'). Notably, octal numbers and hexadecimal numbers are not allowed per IETF RFC 6943"; however, parsing the `&str` `127.0000.0.1` succeeds even though it violates the "strict" form described in Section 3.1.1 (the form that prohibits octal and hexadecimal numbers), while parsing the `&str` `127.0.0.001`, which conforms to it, fails. IETF RFC 6943 Section 3.1.1 states:

> If the af argument of inet_pton() is AF_INET, the src string shall
> be in the standard IPv4 dotted-decimal form:
>
>     ddd.ddd.ddd.ddd
>
> where "ddd" is a one to three digit decimal number between 0 and
> 255. The inet_pton() function does not accept other formats (such
> as the octal numbers, hexadecimal numbers, and fewer than four
> numbers that inet_addr() accepts).
>
> As shown above, inet_pton() uses what we will refer to as the
> "strict" form of an IPv4 address literal. Some platforms also use
> the strict form with getaddrinfo() when the AI_NUMERICHOST flag is
> passed to it.
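Read literally, the quoted text accepts exactly four dot-separated fields, each one to three decimal digits with a value from 0 to 255, and says nothing against leading zeros; this is the reading the issue relies on. A minimal sketch of that literal check (the function name is hypothetical, and real `inet_pton()` implementations may be stricter, for example about leading zeros):

```rust
/// Hypothetical check for a literal reading of the RFC 6943 Section 3.1.1
/// "strict" form: exactly four `.`-separated fields, each a one- to
/// three-digit decimal number between 0 and 255. Leading zeros are allowed.
fn is_rfc6943_strict(s: &str) -> bool {
    let fields: Vec<&str> = s.split('.').collect();
    fields.len() == 4
        && fields.iter().all(|field| {
            (1..=3).contains(&field.len())
                && field.bytes().all(|b| b.is_ascii_digit())
                && matches!(field.parse::<u16>(), Ok(v) if v <= 255)
        })
}

fn main() {
    for s in ["127.0.0.1", "127.0.0.001", "127.0000.0.1", "127.0.0.256", "127.1", "0x7f.0.0.1"] {
        println!("{:<14} strict form: {}", s, is_rfc6943_strict(s));
    }
}
```

Under this reading `127.0.0.001` qualifies and `127.0000.0.1` does not, while the shortened form `127.1` and hexadecimal fields are rejected as the RFC requires.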
I tried this code:
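A reproduction consistent with the expected and actual results described next might look like this. It is a sketch, not the original snippet: it prints the two parse results instead of asserting on them, so it runs to completion on any toolchain, and the printed results depend on the Rust version used.

```rust
use std::net::Ipv4Addr;

fn main() {
    // Per the issue's reading of RFC 6943, this should be rejected:
    // the second octet has four digits.
    let four_digits = "127.0000.0.1".parse::<Ipv4Addr>();
    // Per the same reading, this should be accepted: every octet has
    // one to three decimal digits.
    let padded = "127.0.0.001".parse::<Ipv4Addr>();

    // The report checked these with `assert!`; printing shows both
    // actual results without stopping at the first failure.
    println!("127.0000.0.1 -> {:?} (issue expects Err)", four_digits);
    println!("127.0.0.001  -> {:?} (issue expects Ok)", padded);
}
```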
I expected to see this happen: neither `assert!` panics, since the first `&str` has an octet written with more than three decimal digits, despite RFC 6943 requiring each octet to be a "one to three digit decimal number", while the second `&str` is valid per RFC 6943: it consists of exactly four octets separated by `.`, each a "one to three digit decimal number".

Instead, this happened: both lines cause a panic.
Meta

`rustc --version --verbose`:

Backtrace