Added class to URLs in the response #322

Merged: 9 commits, Apr 24, 2019
25 changes: 21 additions & 4 deletions htsget.md
@@ -162,6 +162,13 @@ The server SHOULD reply with an `UnsupportedFormat` error if the requested forma
[^a]
</td></tr>
<tr markdown="block"><td>
`class`
_optional string_
</td><td>
A list of URL classes to include, see below.
Member

I'd add the briefest explanation here too such as "allowing client to request the data header only, or to opt-out of receiving the header in subsequent requests"

Default: all
</td></tr>
<tr markdown="block"><td>
`referenceName`
_optional_
</td><td>
@@ -280,6 +287,12 @@ _optional object_
</td><td>
For HTTPS URLs, the server may supply a JSON object containing one or more string key-value pairs which the client MUST supply as headers with any request to the URL. For example, if headers is `{"Range": "bytes=0-1023", "Authorization": "Bearer xxxx"}`, then the client must supply the headers `Range: bytes=0-1023` and `Authorization: Bearer xxxx` with the HTTPS request to the URL.
</td></tr>
<tr markdown="block"><td>
`class`
_string_
</td><td>
For file formats whose specification describes a header and a body, the class indicates which of the two will be retrieved when querying this URL. Either all or none of the URLs in the response must have a class attribute. The allowed values are `header` and `body`.
Member

For explicitness, I'd add something like "if class attributes are absent, client should assume data blocks include both header and body, possibly mixed"

Member Author

I don't think this is a good idea. For instance, the EVA server provides separate URLs for headers and body by default, to facilitate downloading them separately even when the class field isn't available.

Member

But if the server doesn't annotate the URLs with class fields, there is no way for the client to know that the EVA server is doing that. The text @mlin suggests just makes explicit the reality that clients can't make assumptions when class fields are missing.

Member Author @cyenyxe, Nov 29, 2018

Maybe I misunderstood what @mlin means. Which of these 2 is it?

  1. Each URL could contain header or body
  2. Each URL could contain header and body

If it is the latter, then a client would have to split the contents from each URL in order to build a valid output file. If it is the former, then I will think about a rewording that is completely unambiguous.

Member @jmarshall, Nov 29, 2018

If class fields are absent, the client can make no assumptions about the contents of individual ticket URLs or the boundaries between their contents: each URL could contain headers, body records, partial headers or records, or both.¹

(Consider e.g. a request with no referenceName/start/end thus a request for the entire file, and the server just returns a ticket that chops it up into 1 Mb chunks.)

It's explicit (in the diagram in the core mechanic section) that the client proceeds as if it's concatenating the contents of all the ticket URLs in order, to get the full file contents. It might choose to avoid redownloading a URL or two because it's already got it (effectively) cached, but that's its business. I don't quite see what you're getting at about the “client [having] to split the contents”…?


¹ Actually perhaps some implementations would like data records or compression blocks not to be split across ticket URLs, but at present I don't think the protocol says anything about that — so such splitting is allowed.

Member Author

Thanks for the explanation, I hadn't considered how they could be mixed due to the fixed-size chunks 😅 Now it's all clear.

I thought it meant that all the URLs could contain header and body, which would make individual blocks correct, but not the response as a whole.

Member

Oh I see 😄 — yes: the client can't assume anything about the individual URL contents, but it is safe in expecting that the whole concatenated response is a valid header-body-body-…-body-EOFtrailer stream.

</td></tr>
</table>

</td></tr>
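
The outcome of the thread above can be sketched from the client side. This is a minimal illustration in Python, not part of the PR or of any htsget implementation: the function names `ticket_is_classified` and `select_urls` are hypothetical, and the fallback for unclassified tickets (return every URL, in order) is an assumption that follows the reviewers' point that a client can assume nothing about individual URLs when `class` is absent.

```python
from typing import Any, Dict, List, Optional

# Allowed values per the proposed spec text: `header` and `body`.
ALLOWED_CLASSES = {"header", "body"}


def ticket_is_classified(urls: List[Dict[str, Any]]) -> bool:
    """Return True if every URL entry has a class, False if none has one.

    Raises ValueError when only some entries carry a class, or when a class
    value is not one of the allowed values, since the proposed text requires
    "either all or none".
    """
    with_class = [u for u in urls if "class" in u]
    if not with_class:
        return False
    if len(with_class) != len(urls):
        raise ValueError("either all or none of the URLs must have a class")
    for entry in with_class:
        if entry["class"] not in ALLOWED_CLASSES:
            raise ValueError("unexpected class: %r" % (entry["class"],))
    return True


def select_urls(urls: List[Dict[str, Any]],
                wanted: Optional[str] = None) -> List[Dict[str, Any]]:
    """Pick the URL entries a client needs, preserving ticket order.

    With no class fields the client cannot tell header from body, so the
    whole ordered list is returned (assumed fallback, see lead-in).
    """
    if wanted is None or not ticket_is_classified(urls):
        return list(urls)
    return [u for u in urls if u["class"] == wanted]
```

For a classified ticket, `select_urls(urls, "header")` returns only the header entries; for an unclassified one it returns everything, and the client relies solely on the concatenation of all blocks being a valid stream.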
@@ -300,24 +313,28 @@ An example of a JSON response is:
 "format" : "BAM",
 "urls" : [
 {
-"url" : "data:application/vnd.ga4gh.bam;base64,QkFNAQ=="
+"url" : "data:application/vnd.ga4gh.bam;base64,QkFNAQ==",
+"class" : "header"
 },
 {
-"url" : "https://htsget.blocksrv.example/sample1234/header"
+"url" : "https://htsget.blocksrv.example/sample1234/header",
+"class" : "header"
 },
 {
 "url" : "https://htsget.blocksrv.example/sample1234/run1.bam",
 "headers" : {
 "Authorization" : "Bearer xxxx",
 "Range" : "bytes=65536-1003750"
-}
+},
+"class" : "body"
 },
 {
 "url" : "https://htsget.blocksrv.example/sample1234/run1.bam",
 "headers" : {
 "Authorization" : "Bearer xxxx",
 "Range" : "bytes=2744831-9375732"
-}
+},
+"class" : "body"
 }
 ]
 }
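
To show how a client might read the updated example, here is a rough Python sketch that turns the `urls` array into an ordered fetch plan without performing any network requests. It is an illustration under stated assumptions, not code from the PR: `decode_data_uri` and `plan_fetches` are hypothetical helper names, and `TICKET_JSON` contains only the `format` and `urls` fields shown in the diff, wrapped in braces so it parses (any enclosing object in the full specification is not shown here).

```python
import base64
import json
from typing import Any, Dict, List
from urllib.parse import unquote_to_bytes

# The post-change example from the diff, reduced to the fields shown there.
TICKET_JSON = """
{
  "format": "BAM",
  "urls": [
    {"url": "data:application/vnd.ga4gh.bam;base64,QkFNAQ==", "class": "header"},
    {"url": "https://htsget.blocksrv.example/sample1234/header", "class": "header"},
    {"url": "https://htsget.blocksrv.example/sample1234/run1.bam",
     "headers": {"Authorization": "Bearer xxxx", "Range": "bytes=65536-1003750"},
     "class": "body"},
    {"url": "https://htsget.blocksrv.example/sample1234/run1.bam",
     "headers": {"Authorization": "Bearer xxxx", "Range": "bytes=2744831-9375732"},
     "class": "body"}
  ]
}
"""


def decode_data_uri(uri: str) -> bytes:
    """Decode the payload of a data: URI (base64 or percent-encoded)."""
    prefix, sep, payload = uri.partition(",")
    if not prefix.startswith("data:") or not sep:
        raise ValueError("not a data: URI")
    if prefix.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


def plan_fetches(urls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build an ordered plan: inline bytes for data: URIs, and the URL plus
    the headers the client must send for everything else."""
    plan = []
    for entry in urls:
        step = {"class": entry.get("class")}  # None when the server omits it
        if entry["url"].startswith("data:"):
            step["inline"] = decode_data_uri(entry["url"])
        else:
            step["url"] = entry["url"]
            step["headers"] = dict(entry.get("headers", {}))
        plan.append(step)
    return plan


plan = plan_fetches(json.loads(TICKET_JSON)["urls"])
```

In this sketch the first step carries the four inline bytes of the BAM magic number, and the two `body` steps keep the `Range` and `Authorization` headers that the client must send. Fetching the steps in order and concatenating the results is left to the caller.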