-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start writing the bundling spec. #98
Conversation
c464efb
to
43462ae
Compare
42a3c2f
to
2b6e2bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You state that "parsing MUST fail" in a bunch of cases, but there's also a lot of scanning and jumping going on. Are there multiple levels where things can fail? That is, if you happened to just parse this once without jumping around, you can't simply return failure for all errors but would instead have to throw away items? That's rather unclear.
* Has exactly two keys starting with a ':' character, ':method' and ':url'. | ||
* *R*\[':method'] is an HTTP method defined as cacheable (Section 4.2.3 of | ||
{{!RFC7231}}) and safe (Section 4.2.1 of {{!RFC7231}}), as required for | ||
PUSH_PROMISEd requests (Section 8.2 of {{?RFC7540}}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really good enough for browsers I think. You need to have a safelist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.iana.org/assignments/http-methods/http-methods.xhtml is precise about the safe methods, but you have to click through to get cacheability defined, and the WebDAV methods leave it out. In an IETF draft, I'm pretty sure I can list the current methods that satisfy this (GET and HEAD), and in the browser-side loading living spec I can just list those expecting the list to get updated if any new ones get added.
I also note that Fetch uses "unsafe" without specifying it more precisely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the cache section, which is somewhat optional in a way.
* *R*\[':method'] is an HTTP method defined as cacheable (Section 4.2.3 of | ||
{{!RFC7231}}) and safe (Section 4.2.1 of {{!RFC7231}}), as required for | ||
PUSH_PROMISEd requests (Section 8.2 of {{?RFC7540}}). | ||
* *R*\[':url'] is an absolute URI (Section 4.3 of {{!RFC3986}}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear how to validate this in browsers. There's no such primitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe one identifies an absolute-URL string by running the URL parser with no base URL and then checking that it doesn't have a fragment.
Are you complaining that I'm not linking to the URL spec, that this isn't phrased as a parsing algorithm, both, or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I go with all three? E.g., http://example.com/%
is not an "absolute URI", but will parse per the URL parser.
A valid request item *R* is interpreted as an HTTP request by interpreting | ||
*R*\[':method'] as the request's method (Section 4 of {{!RFC7231}}), | ||
*R*\[':url'] as the request's effective request URI (Section 5.5 of | ||
{{!RFC7230}}), and the remaining key/value pairs as the request's header fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those key/value pairs are suitably constrained somehow? What's the parsing for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're HTTP headers, so we should do whatever https://fetch.spec.whatwg.org/#http-network-fetch does.
Or are you worried about how claimed request headers are matched against browser request headers? That's not written yet. I was hoping to use the matching rules for PUSH_PROMISEs, but those aren't written yet either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well they're not quite the same as ordinary HTTP headers, since they are delimited differently, right? E.g., can you have a 0x0A byte in the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP/2 headers are also delimited in a way that lets their values include a 0x0A byte. I wonder if browsers handle that correctly...
I suspect the right thing to do here is to explicitly create a header list from this, but since that concept doesn't exist in the IETF, I'll have to give up on the idea of making this an IETF draft.
I'm going to ping some folks to see if I'll hurt any feelings by doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh wow, I didn't know that about H/2. That's another reason to get a web-platform-tests setup.
(Note that I replied even in the feedback GitHub considers "outdated".) |
Are you going to leave it to Fetch to drop the response body for a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for skipping the "Are there multiple levels where things can fail?" question earlier. My plan is that there are 2 levels where things can fail:
- The browser parses the metadata (https://jyasskin.github.io/webpackage/bundles/draft-yasskin-dispatch-bundled-exchanges.html#load-metadata), and if anything fails there, the whole bundle is invalid.
- Then for any response the browser actually wants to use, it parses that particular response (https://jyasskin.github.io/webpackage/bundles/draft-yasskin-dispatch-bundled-exchanges.html#load-response). If that fails, only that particular response is invalid, but the bundle can still contain other responses.
Thanks for saying that's not adequately clear; I'll try to fix it.
I hadn't thought about how exactly to constrain HEAD. If you think the right approach is to treat it as an automatic transformation from GET, I'll do that.
None of this is done yet. I'll comment again when it is.
A valid request item *R* is interpreted as an HTTP request by interpreting | ||
*R*\[':method'] as the request's method (Section 4 of {{!RFC7231}}), | ||
*R*\[':url'] as the request's effective request URI (Section 5.5 of | ||
{{!RFC7230}}), and the remaining key/value pairs as the request's header fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP/2 headers are also delimited in a way that lets their values include a 0x0A byte. I wonder if browsers handle that correctly...
I suspect the right thing to do here is to explicitly create a header list from this, but since that concept doesn't exist in the IETF, I'll have to give up on the idea of making this an IETF draft.
I'm going to ping some folks to see if I'll hurt any feelings by doing that.
(Another thought I had was to take inspiration from the Cache API, as sort of the API surface on top of this format. The Cache API only supports |
3a78b8b
to
f281570
Compare
60842c5
to
ee384a4
Compare
I believe this is finally ready for a re-review. It's missing 3 things that are going to be important, but that I'd like to do in a subsequent patch:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just skimmed through, had some nit comments / questions (non blocking).
|
||
## Stream attributes and operations {#stream-operations} | ||
|
||
* A sequence of **available** bytes. As the stream delivers bytes, these are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should bolding be applied to available bytes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
the 4-item array initial byte and 8-byte bytestring initial byte, followed by | ||
🌐📦 in UTF-8), return an error. | ||
|
||
1. Let `sectionOffsetsLength be the result of getting the length of the CBOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: sectionOffsetsLength
(missing the closing `)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
+ offset`. That is, offsets in the index are relative to the start of the | ||
"responses" section. | ||
1. If `offset + length` is greater than | ||
`section-offsets\["responses"].length`, return an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
|
||
1. Set `metadata`'s "manifest" item to `url`. | ||
|
||
### Parsing the critical section {#critical-section} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't fully unsure what could come here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add more sections in the future, and the publisher wants to make sure parsers don't just ignore one of those sections because they don't understand it, this section points out the ones not to skip.
I'm not certain this is the right way to do this. PNG does it with a particular capitalization in the section names: https://tools.ietf.org/html/rfc2083#page-13
struct to fill in, the parser MUST do the following: | ||
|
||
1. Let `critical` be the result of parsing `sectionContents` as a CBOR item | ||
matching the above `critical` rule ({{parse-cbor}}. If `critical` is an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing ) after {{parse-cbor}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
TODO: Add the rest of the details of creating a `ReadableStream` object. | ||
|
||
1. Let `response` be a new response ({{FETCH}}) whose: | ||
* Url list is `request`'s url list, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Parsing the index section" seems to say request has one url, do you mean it could have a list / redirect chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just an asymmetry in how requests and responses default their values: https://fetch.spec.whatwg.org/#concept-request-url-list defaults from the request's url, while https://fetch.spec.whatwg.org/#concept-response-url is hard-coded to point to the response's url list. I believe both URL lists should always have 1 value for exchanges in the bundle.
I suspect Fetch will actually wind up copying or extending this response when serving browser-generated requests from bundles, so that a redirect into a bundle can work correctly.
and {{semantics-load-response}} each of which can return an error instead of | ||
their normal result. | ||
|
||
## Stream attributes and operations {#stream-operations} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@domenic, here's the use of streams I mentioned on #whatwg. I'm thinking about re-doing it in terms of ReadableStream
despite your advice because https://jyasskin.github.io/webpackage/bundles/draft-yasskin-dispatch-bundled-exchanges.html#load-response needs to create a body's stream, which is defined as a ReadableStream
. The stream a bundle's parsed from is likely to be a response body anyway, so it'll probably be a ReadableStream even if I pretend it isn't.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still find it a little hard to digest how the whole thing fits together, but I tried to review regardless.
|
||
--- abstract | ||
|
||
Bundled exchanges provide a way to bundle up groups of HTTP request+response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
request-response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to emphasize that it's both. A hyphen doesn't feel quite right because "request" doesn't modify "response". A slash feels like it might be alternation, although https://infra.spec.whatwg.org/#pair uses it the way I intend here.
Bundled exchanges provide a way to bundle up groups of HTTP request+response | ||
pairs to transmit or store them together. The component exchanges can be signed | ||
using {{?I-D.yasskin-http-origin-signed-responses}} to establish their | ||
authenticity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave this out, since the signing is controversial and not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't intend to shy away from the controversial bits, but you're right that it's not required, so maybe it doesn't belong in the abstract.
I do expect to add a section to bundles to efficiently encode multi-resource signatures with the same semantics as multiple signed-exchanges, at which point it'll come back to the abstract.
|
||
Discussion of this draft takes place on the ART area mailing list | ||
(art@ietf.org), which is archived | ||
at <https://mailarchive.ietf.org/arch/search/?email_list=art>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems discussion also takes place on GitHub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's covered by the "issues list" comment in the next paragraph. HTTPWG drafts have similar text, and also discuss on GitHub issues.
## Terminology | ||
|
||
Exchange (noun) | ||
: An HTTP request/response pair. This can either be a request from a client and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd pick one separator and stick with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, thanks.
error or because the stream has a finite length. | ||
* A **seek** operation to change the current offset, relative to either the | ||
beginning of the available bytes or to the old current offset. A seek past the | ||
end of the available bytes is an *error in this specification*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"error in this specification" doesn't appear to mean anything. Wouldn't it be better to say that seek or read can return an error and that callers of them need to account for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended the same meaning as in https://infra.spec.whatwg.org/#assertions, but looking at the uses of "wait", "seek", and "read", every seek and read except the ones to the start of the stream was immediately preceded by a wait, so I may as well incorporate the wait into the other operations.
|
||
requests | ||
|
||
: A map ({{INFRA}}) whose keys are the HTTP requests ({{FETCH}}) for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Fetch defines as "requests" are not necessarily HTTP requests. They're more broad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the best way to say that the keys here come from the subset of Fetch "requests" that are HTTP requests? I don't think it makes sense to include the other schemes mentioned in https://fetch.spec.whatwg.org/#url.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even those are more broad as they carry fields HTTP requests don't have. I guess you could say requests whose url list contains a single URL whose scheme is an HTTP(S) scheme (unless this is restricted to secure contexts?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the attempt to subset the kinds of Fetch requests this operation can return. If it would noticeably simplify the request-matching function I need to write, I might add the subsetting back later, but I don't really expect that.
{{semantics-load-metadata}}, while a client will generally want to load the | ||
response for a request that the client generated. This specification does not | ||
define how a client determines the best available bundled response, if any, for | ||
that client-generated request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? We define this for the Cache API, I think ideally we use the same algorithm here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clearly we're going to have to define that algorithm somewhere. I don't have a strong opinion between here and the WICG/Fetch side, but I was leaning toward WICG/Fetch, and would definitely prefer to do it in a separate PR even if you prefer this spec.
Unfortunately, https://w3c.github.io/ServiceWorker/#query-cache-algorithm assumes that varied headers will be identical between the cached request and the queried request, which isn't plausible for a bundle intended to be used by more than one UA.
For example, imagine that Greta's UA sends Accept-Language: de-DE,de,en;q=0.9
. In the Cache API, that's exactly the request header that'll be cached since the Cache API uses https://fetch.spec.whatwg.org/#concept-fetch to fill it in. Similarly, if Hilda's UA sends Accept-Language: de-AT,de;q=0.9
, that's what her cache will contain. If the server wants to put a de
resource in a bundle to serve both users, what Accept-Language
header should they write? There isn't one that Query-Cache will find for both people. The same problem shows up for Accept
across browsers instead of people. So we need a new algorithm.
|
||
The bundle is roughly a CBOR item ({{?I-D.ietf-cbor-7049bis}}) with the | ||
following CDDL ({{?I-D.ietf-cbor-cddl}}) schema, but bundle parsers are required | ||
to successfully parse some byte strings that aren't valid CBOR. For example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there still use in CBOR then? Doesn't the MIME type lie with +cbor if you cannot actually use a CBOR parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm primarily using CBOR instead of a totally custom binary format because 1) ASCII/UTF-8 formats like JSON are nice because you can read them without custom parsers, and 2) we need a binary format for this, but there are likely to be generic tools to dump CBOR to a readable format, so that preserves as much of the ASCII benefit as possible.
I would actually like to say that bundles must be well-formed CBOR items, but because most parsers won't actually work that way, I expect non-CBOR bundles to wind up existing in the wild. Do you think I should say that clients MAY reject bundles that aren't a valid CBOR item? Or something else?
`knownSections` says not to process other sections, add those sections' names | ||
to `ignoredSections`. | ||
|
||
1. Let `metadata` be a struct ({{INFRA}}) with no items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea with structs is that they are immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean that the set of item names is immutable. It's simple to change this to a map, so I've done that. The set of keys is only not-fixed because I wanted to let other specifications add section names that might define new metadata, but I'm not totally sure that's a good idea, so this might go back to a struct later.
1. Continue. | ||
1. If `name` or `value` doesn't satisfy the requirements for a header in | ||
{{FETCH}}, return an error. | ||
1. If `headers` contains ({{FETCH}}) `name`, return an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot have multiple headers of the same name? Might be worth calling out you cannot encode cookies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe multiple Cookie
headers would be encoded by joining them with semicolons (not that you'd generally put cookies in bundled requests), but yeah, Set-Cookie
has no single-field form. Noted.
(This line is actually redundant with the fact that CBOR maps exclude duplicate keys, so I've replaced it with an Assert.)
a9f8f25
to
03a5f78
Compare
Re: "best available bundled response" I'd prefer it as an open issue if you plan on addressing it rather than it saying it's implementation-defined. (Note that we might get rid of pairs still: whatwg/infra#127.) |
And tweak a couple definitions to let us parse from a stream without implementing a streaming CBOR parser.
thing fits together.
Since clients parse by following byte offsets, they won't enforce that the format is CBOR, which means some instances of it probably won't be. Putting +cbor in the MIME type would be misleading.
Thanks! |
Load Metadata will only return a subset of all possible requests, but it doesn't seem important to specify that subset precisely here.
I'm going to merge this so it's easier to refer to and so I can start sending PRs to modify it, but please feel free to keep commenting and filing issues against it. |
Preview
This fills a similar niche as MHTML, but
I believe that combined with signed exchanges, bundles fill all of the web packaging use cases except that they'll need a subsequent change to work for "Third-party security review", which will need a signature to cover groups of resources to enforce that their versions are consistent.