Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser for Domain attributes #1939

Open
annevk opened this issue Feb 7, 2022 · 11 comments
Open

Parser for Domain attributes #1939

annevk opened this issue Feb 7, 2022 · 11 comments
Assignees

Comments

@annevk
Copy link

annevk commented Feb 7, 2022

If a resource on 127.0.0.1 sets a cookie with a Domain attribute set to 127.1, does that work? (Spoiler: it works if it was passed through the host parser, which seems like a prerequisite to determine whether it's an IP address or not as needed by Domain Matching.)

(I raised this in #1707 but it seems that's exclusively used for the non-ASCII bytes discussion.)

@sbingler
Copy link
Collaborator

sbingler commented Feb 7, 2022

Good question, what should happen?

Section 5.5 Step 9, sub-step 1 states

 9.   If the domain-attribute is non-empty:
        1.  If the canonicalized request-host does not domain-match the
            domain-attribute:
               1.  Ignore the cookie entirely and abort these steps.

It looks like "domain-match" is doing the heavy lifting in that alogithm.

Section 5.1.3 Domain Matching
has the following algorithm

A string domain-matches a given domain string if at least one of the
   following conditions hold:

   *  The domain string and the string are identical.  (Note that both
      the domain string and the string will have been canonicalized to
      lower case at this point.)

   *  All of the following conditions hold:

      -  The domain string is a suffix of the string.

      -  The last character of the string that is not included in the
         domain string is a %x2E (".") character.

      -  The string is a host name (i.e., not an IP address).

Because this is an IP address, only the top condition is applicable meaning that since a cookie with a Domain=127.1 wouldn't be identical to the string (the request-host) the implementor should ignore the cookie entirely.

That's straightforward enough, assuming we agree on my interpretation here.

does that work?

Also a good and (unfortunately) different question.

Well for Chrome 98 it seems we do set the cookie, oops.

This feels like it's an implementation bug, as the spec's instructions seem reasonable, but do you have a different take Anne?

@annevk
Copy link
Author

annevk commented Feb 7, 2022

I think using string equality with the Domain attribute value string (post dot removal and lowercasing) is fine. It's a bit stricter, doesn't perpetuate the weird spelling of IP addresses, and is less likely to run into the kind of problems people were concerned with around UTF-8 bytes.

I was concerned with how Domain Matching knows something is an IP address, but I suppose that "The string is a host name (i.e., not an IP address)." is not something it needs to know about the string, but about the domain string it presumably has previously parsed and thus knows about. That might be worth clarifying there to avoid giving the impression that the string needs to be parsed.

@sbingler
Copy link
Collaborator

sbingler commented Feb 7, 2022

Are you suggesting that
- The string is a host name (i.e., not an IP address).
should be modified to say
- The domain string is a host name (i.e., not an IP address).?

@annevk
Copy link
Author

annevk commented Feb 8, 2022

Maybe that's good enough. Ideally the spec would just reuse the URL Standard data types, but that's probably not attainable.

@sbingler
Copy link
Collaborator

sbingler commented Feb 8, 2022

Hmm, that change on its own might cause issues

Let's pretend we modified the algorithm to only look at the domain string

A string domain-matches a given domain string if at least one of the
   following conditions hold:

   *  The domain string and the string are identical.  (Note that both
      the domain string and the string will have been canonicalized to
      lower case at this point.)

   *  All of the following conditions hold:

      -  The domain string is a suffix of the string.

      -  The last character of the string that is not included in the
         domain string is a %x2E (".") character.

      -  The domain string is a host name (i.e., not an IP address).

If a response from https://127.0.0.1 sets a cookie foo=bar; Domain=.0.1 then I think we'd check the domain value 0.1, see it doesn't string match the string 127.0.0.1 and then try the next condition. I'm unclear if 0.1 could be considered a valid host name (rfc1123 section 2.1 implies it might be?), but if it is then we could pass this test and allow the cookie to be set.

Maybe we could get away with a warning that no parsing of the input string should occur aside from checking if it's an IP address. But then we've worked our way back to your original point of "I was concerned with how Domain Matching knows something is an IP address"

@annevk
Copy link
Author

annevk commented Feb 9, 2022

I was under the mistaken impression that string and not domain string was obtained from the cookie Domain attribute. So the algorithm works as-is for IPv4 (in the strict manner, as discussed), except that you need to know out-of-band whether string is an IP address. (That would be helped by using URL Standard types and making string a proper host, but I digress.)


Now having said that, I'm not sure it works as well for IPv6. There's multiple ways to spell an IPv6 address and if you don't canonicalize (by parsing) you essentially have to spell them the way the URL Standard will canonicalize them.

So you cannot write [0::1] in the Domain attribute and expect it to work. You'd have to write [::1]. That seems broken? (Again, this example will likely work in Chrome per your earlier comments.)

@martinthomson
Copy link
Contributor

I don't think that the definition permits IPv6 addresses, so that might be moot. Adding brackets to the grammar might cause a new kind of problem.

The domain-value is a subdomain as defined by [RFC1034], Section 3.5, and as enhanced by [RFC1123], Section 2.1.

(Curiously, I went to RFC 1035 Section 3.5 by accident, which talks about in-addr.arpa. It's not that.)

@annevk
Copy link
Author

annevk commented Feb 10, 2022

Good point, but those requirements have no bearing on user agents. The sections are quite explicit about that.

@martinthomson
Copy link
Contributor

Step 8 of the host parsing algorithm in the WhatWG spec has a very simple rule for distinguish IP addresses. It says "If asciiDomain ends in a number, then return the result of IPv4 parsing asciiDomain." We should either reference that or (given how simple it is) copy the "ends in a number" bit.

I believe that /\.(?:[0-9]+|0x[0-9a-f]+)$/ covers it for those who like to complicate their existing problems.

@enygren
Copy link
Contributor

enygren commented Jul 28, 2022

It does seem worth explicitly clarifying how IPv6 interacts here as well. (ie, is it ever allowed?)

If we explicitly support legacy IPv4 do we also need to support IPv6 somehow per https://www.iab.org/2016/11/07/iab-statement-on-ipv6/ ?

@martinthomson
Copy link
Contributor

Yeah, I think that maybe IPv6 is permitted, but I can't be sure. IPv6 would be identified by the leading '['

@sbingler sbingler added the defer label Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants