Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper error response if unsupported format is requested #203

Closed
jrobinso opened this issue Sep 29, 2023 · 7 comments
Closed

Improper error response if unsupported format is requested #203

jrobinso opened this issue Sep 29, 2023 · 7 comments
Labels
spec Specification changes needed

Comments

@jrobinso
Copy link

The following request

https://htsget.demo.umccr.org/reads/org.umccr.demo.htsget-rs-data/cram/mt.sorted_cram?class=header&format=bam

results in the following response

{
  "htsget": {
    "error": "NotFound",
    "message": "aws error: unhandled error, with key: `cram/mt.sorted_cram.bam`"
  }
}

According to the spec

The server SHOULD reply with an UnsupportedFormat error if the requested format is not supported.

I realize it says SHOULD so this behavior isn't technically out of spec, but it can result in confusing error messages to client programs and their users. It doesn't appear that htsjdk supports decoding CRAM from htsget responses, so I was testing to see what happens if BAM is requested. I think some servers could handle either BAM or CRAM output for the same dataset ID, but other cannot (such as this particular test server). So its difficult to present an informative error message to end users without the specific error called for in the spec.

@jrobinso
Copy link
Author

Just a comment since I know some of you participate in spec meetings, there doesn't seem to be any way to ask a server what output formats it supports for a given ID. I realize there are only 2 possible for reads, but that is more than 1 and clients should be able to know before requesting a particular format.

@brainstorm
Copy link
Member

brainstorm commented Oct 2, 2023

I think some servers could handle either BAM or CRAM output for the same dataset ID, but other cannot (such as this particular test server).

I think that this issue stems from the following S3 path confusion, so I just went and created a mixed S3 subkey where you have BAM and CRAM together with the same ID:

(base) rvalls@m1 ~ % aws s3 ls s3://org.umccr.demo.htsget-rs-data/
                           PRE bam/
                           PRE bcf/
                           PRE cram/
                           PRE crypt4gh/
                           PRE events/
                           PRE vcf/

(base) rvalls@m1 ~ % aws s3 ls s3://org.umccr.demo.htsget-rs-data/cram/
2023-09-25 16:26:51    1627794 htsnexus_test_NA12878.cram
2023-09-25 16:26:51        129 htsnexus_test_NA12878.cram.crai
2023-09-28 15:32:18     693771 mt.sorted_cram.cram
2023-09-28 15:32:26        320 mt.sorted_cram.cram.crai

(base) rvalls@m1 ~ % aws s3 ls s3://org.umccr.demo.htsget-rs-data/bam/
2023-09-25 16:26:51    2596799 htsnexus_test_NA12878.bam
2023-09-25 16:26:51       6728 htsnexus_test_NA12878.bam.bai
2023-09-25 16:26:51       2034 htsnexus_test_NA12878.bam.blocks.yaml
2023-09-25 16:26:51       2616 htsnexus_test_NA12878.bam.gzi
2023-09-28 15:02:23     876871 mt.sorted.bam
2023-09-28 15:02:29     747632 mt.sorted.bam.bai

(base) rvalls@m1 ~ % aws s3 cp s3://org.umccr.demo.htsget-rs-data/bam/mt.sorted.bam s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.bam
copy: s3://org.umccr.demo.htsget-rs-data/bam/mt.sorted.bam to s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.bam
(base) rvalls@m1 ~ % aws s3 cp s3://org.umccr.demo.htsget-rs-data/bam/mt.sorted.bam.bai s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.bam.bai
copy: s3://org.umccr.demo.htsget-rs-data/bam/mt.sorted.bam.bai to s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.bam.bai
(base) rvalls@m1 ~ % aws s3 cp s3://org.umccr.demo.htsget-rs-data/cram/mt.sorted_cram.cram s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.cram
copy: s3://org.umccr.demo.htsget-rs-data/cram/mt.sorted_cram.cram to s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.cram
(base) rvalls@m1 ~ % aws s3 cp s3://org.umccr.demo.htsget-rs-data/cram/mt.sorted_cram.cram.crai s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.cram.crai
copy: s3://org.umccr.demo.htsget-rs-data/cram/mt.sorted_cram.cram.crai to s3://org.umccr.demo.htsget-rs-data/mixed/mt.sorted.cram.crai

(base) rvalls@m1 ~ % aws s3 ls s3://org.umccr.demo.htsget-rs-data/mixed/
2023-10-02 14:01:15     876871 mt.sorted.bam
2023-10-02 14:01:28     747632 mt.sorted.bam.bai
2023-10-02 14:02:25     693771 mt.sorted.cram
2023-10-02 14:02:36        320 mt.sorted.cram.crai

You'll see that our server behaves as expected in this situation:

curl 'https://htsget.demo.umccr.org/reads/org.umccr.demo.htsget-rs-data/mixed/mt.sorted?class=header' <--- Gets the BAM format by default, as per spec
curl 'https://htsget.demo.umccr.org/reads/org.umccr.demo.htsget-rs-data/mixed/mt.sorted?class=header&format=CRAM'

And to your point:

there doesn't seem to be any way to ask a server what output formats it supports for a given ID

From my point of view, the UnsupportedFormat is reserved to formats that the (htsget) server does not support, as it only supports the subset {VCF,BCF,BAM,CRAM} at the time of this writing , i.e, BED is not supported at the moment:

% curl 'https://htsget.demo.umccr.org/reads/org.umccr.demo.htsget-rs-data/mixed/mt.sorted?class=header&format=BED'
{
  "htsget": {
    "error": "UnsupportedFormat",
    "message": "bed isn't a supported format for this endpoint"
  }
}

If I understand correctly, you'd want a hint for clients when an ID with multiple formats is found within the storage backend? As in, some draft behavior like the following?:

curl 'https://htsget.demo.umccr.org/reads/org.umccr.demo.htsget-rs-data/mixed/mt.sorted?class=header

{
  "htsget": {
    "message": "Multiple formats detected for the supplied ID, returning BAM by default",
    "formats": ["BAM, CRAM"]
  }
}

And then have a HTTP 302 or 303 code to redirect the client to BAM automatically as it does now by default? That way clients would have that hint and not break backwards compat?

Happy to propose it on hts spec if that's what you need/want, @jrobinso?

@brainstorm brainstorm added the spec Specification changes needed label Oct 2, 2023
@mmalenic
Copy link
Member

mmalenic commented Oct 2, 2023

I agree with @brainstorm here. I'm not sure that UnsupportedFormat is the correct error response because my interpretation of "The server SHOULD reply with an UnsupportedFormat error if the requested format is not supported" was for the server as a whole, rather than for a specific id. Although it could be interpreted both ways, I think NotFound probably makes more literal sense because that key, e.g. cram/mt.sorted_cram.bam, could not be found.

I also think this interpretation (the currently implemented one) aligns more closely with the HTTP status codes that UnsupportedFormat and NotFound map to. It's not that a request such as the example is malformed (indicating a 400 response), but rather that the underlying resource, in this case cram/mt.sorted_cram.bam does not exist.

Either way, I can see the utility of having a hint for the formats that are available for a particular id. This doesn't necessarily need a spec change either, and it could just be present in the error response (whether that is UnsupportedFormat or NotFound), e.g.:

{
  "htsget": {
      "error": "NotFound",
      "message": "id with key: `cram/mt.sorted_cram.bam` not found. The following formats are available for this id: ['BAM']"
  }
}

@mmalenic
Copy link
Member

mmalenic commented Oct 2, 2023

Somewhat related to this (although not directly) is #127.

@brainstorm
Copy link
Member

brainstorm commented Oct 2, 2023

This doesn't necessarily need a spec change either, and it could just be present in the error response (whether that is UnsupportedFormat or NotFound)

Well, I slightly disagree here: I'm not sure that message response is the best place to embed operational hints that clients might use and rely upon... i.e: you'd have to keep that message stable for new upcoming releases and it sounds like a future breakage site :-S

@mmalenic
Copy link
Member

mmalenic commented Oct 2, 2023

Yeah that's a completely valid point. A hint would probably be best as an additional JSON field, which would require a spec change.

@jrobinso
Copy link
Author

jrobinso commented Oct 2, 2023

OK I will defer to you guys, I misinterpreted the spec. For my part I've updated the igv.js doc to be clear that CRAM and BCF are not supported for htsget. This is also true of IGV desktop as the htsjdk does not support them.

@jrobinso jrobinso closed this as completed Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec Specification changes needed
Projects
None yet
Development

No branches or pull requests

3 participants