Simple curl examples against **local** NA12878 BAMs? #15

Closed
brainstorm opened this issue Oct 12, 2020 · 5 comments

Comments

@brainstorm

brainstorm commented Oct 12, 2020

Hello @jb-adams, great refresh of this refserver impl, looking good! I've tried the integration tests that point towards tabulamuris against the CZI S3 bucket and they work great.

Now, when testing it out by pointing the config to the local BAM GiaB files like so:

{
  "htsgetconfig": {
    "props": {
      "port": "3000",
      "host": "http://localhost:3000/"
    },
    "reads": {
      "enabled": true,
      "dataSourceRegistry": {
        "sources": [
          {
            "pattern": "NA12878",
            "path": "../../htsget-refserver/data/gcp/gatk-test-data/wgs_bam/NA12878.bam"
          }
        ]
      }
    },
    "variants": {
      "enabled": true
    }
  }
}

Could you outline some simple curl queries in the README, e.g. something like the following (but working):

% curl -s "http://localhost:3000/reads/NA12878" | jq .
{
  "htsget": {
    "error": "NotFound",
    "message": "The requested resource could not be associated with a registered data source"
  }
}

I'm not sure this is a good moment to ask, since some tests seem to be failing. Perhaps this server is still under active development?

% go test ./...
?   	github.com/ga4gh/htsget-refserver/cmd	[no test files]
ok  	github.com/ga4gh/htsget-refserver/internal/htsconfig	(cached)
ok  	github.com/ga4gh/htsget-refserver/internal/htsconstants	(cached)
?   	github.com/ga4gh/htsget-refserver/internal/htsdao	[no test files]
ok  	github.com/ga4gh/htsget-refserver/internal/htserror	(cached)
ok  	github.com/ga4gh/htsget-refserver/internal/htsformats	(cached)
ok  	github.com/ga4gh/htsget-refserver/internal/htsrequest	(cached)
--- FAIL: TestHTTPRequestMulti (0.09s)
    requestmulti_test.go:180:
        	Error Trace:	requestmulti_test.go:180
        	Error:      	Not equal:
        	            	expected: "8c0bb4317c810247c65cbe8eacdf7d2a"
        	            	actual  : "fd2f50523ce54f846b11c0269619c52d"

        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-8c0bb4317c810247c65cbe8eacdf7d2a
        	            	+fd2f50523ce54f846b11c0269619c52d
        	Test:       	TestHTTPRequestMulti
FAIL
FAIL	github.com/ga4gh/htsget-refserver/internal/htsserver	5.461s
ok  	github.com/ga4gh/htsget-refserver/internal/htsticket	(cached)
ok  	github.com/ga4gh/htsget-refserver/internal/htsutils	(cached)
FAIL

Also, docs referenced from the GA4GH production config are 404'ing.

Most likely it's just me not passing the correct/mandatory parameters or doing something wrong, so please let me know which sample curl queries I can run given the above .json config file. Thanks in advance!

/cc @victorskl @ohofmann @reisingerf

@jb-adams
Member

Hi @brainstorm, thanks for taking a look into this! I will try to reproduce the error, but the config seems correct at first glance. The server is still under active development. That link should be removed, as we plan to build a better documentation page; the index.html page was just a stub.

@jb-adams
Member

@brainstorm the dataSourceRegistry module matches IDs against regex patterns. Right now, the match fails if the pattern has no capture groups: the system expects capture groups so that it can align a passed ID with one file in a directory of similarly named files, for example. The API rejects this ID because your pattern has no capture group.

This should be corrected to allow for single, hardcoded files. For now, you should be able to work around it like this:

...
"dataSourceRegistry": {
  "sources": [
    {
      "pattern": "^(?P<accession>NA12878)$",
      "path": "../../htsget-refserver/data/gcp/gatk-test-data/wgs_bam/{accession}.bam"
    }
  ]
}
...

That is, the named capture group accession will only match NA12878, and the captured value is then injected into the file path in place of {accession}.
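To illustrate the behaviour described above, here is a minimal Python sketch (not the server's actual Go code) of how an ID might be matched against a source pattern and its named capture groups injected into the path template. The helper name resolve_path and the rule "no capture groups means no match" are assumptions modelled only on the description in this comment.

```python
import re
from typing import Optional


def resolve_path(request_id: str, pattern: str, path_template: str) -> Optional[str]:
    """Return the file path for request_id, or None if the source does not apply.

    Hypothetical helper: it mirrors the behaviour described in this thread,
    not the htsget-refserver implementation.
    """
    regex = re.compile(pattern)
    match = regex.search(request_id)
    # Modelled on the comment above: a pattern without capture groups never matches.
    if match is None or regex.groups == 0:
        return None
    path = path_template
    # Replace each {name} placeholder with the text captured by the group of that name.
    for name, value in match.groupdict().items():
        if value is not None:
            path = path.replace("{" + name + "}", value)
    return path


template = "../../htsget-refserver/data/gcp/gatk-test-data/wgs_bam/{accession}.bam"

# Pattern with a named capture group: the ID resolves to a path.
print(resolve_path("NA12878", r"^(?P<accession>NA12878)$", template))

# Original pattern without capture groups: no path is returned in this sketch.
print(resolve_path("NA12878", "NA12878", template))
```

In this sketch the second call returns None, which corresponds to the NotFound error reported at the top of the issue.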

@brainstorm
Author

brainstorm commented Oct 27, 2020

Moving this follow-up from igvteam/igv#850 to here, since it doesn't belong in the IGV-desktop PR.

> Maps a single ID (NA12878) to a single, local file (located at ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam). When you run the server, do you have this file available locally?

Yes I believe so:

% md5sum ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam
bc8e0e64772c9039bb3f9d00c0b8fc4e  ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam

> Given the config, the following IDs won't work: giab.NA12878.NIST7035.1, giab.NA12878.NIST7035.2 because they don't conform to the above regex pattern. To pull in these files to the htsget server, you would need to add more data sources. Check out this file, which is how I configure the server when testing locally.

I just pulled that config-local.json file locally and I think that the problem is with the files that sit locally, see:

$ ./htsget-refserver -config data/config/config-local.json
Server started on port 3000!
$ curl -s http://localhost:3000/reads/NA12878 | jq .
{
  "htsget": {
    "error": "NotFound",
    "message": "The requested resource could not be associated with a registered data source"
  }
}

$ curl -s http://localhost:3000/reads/giab.NA12878.NIST7035.1 | jq .
{
  "htsget": {
    "format": "BAM",
    "urls": [
      {
        "url": "https://giab.s3.amazonaws.com/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/project.NIST_NIST7035_H7AP8ADXX_TAAGGCGA_1_NA12878.bwa.markDuplicates.bam",
        "headers": {
          "Range": "bytes=0-499999999"
        }
      },
(... continues with subsequent urls and ranges...)

@jb-adams
Member

Yes, with config-local.json, the giab.NA12878.NIST{accession}.{lane} ids will work.

If you are using config-local.json you will need to prepend gatk. to the beginning of the ID to access that local file at ./data/gcp/gatk-test-data/wgs_bam/NA12878.bam. This is the matching data source in the config:

{
  "pattern": "^gatk\\.(?P<accession>.*)$",
  "path": "./data/gcp/gatk-test-data/wgs_bam/{accession}.bam"
}

Try the ID gatk.NA12878 and/or gatk.NA12878_20k_b37.
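One detail worth spelling out: the double backslash in the pattern is JSON escaping, so after decoding the regex engine sees ^gatk\.(?P<accession>.*)$ with a literal dot. The short Python sketch below shows which IDs this source would accept. It relies on Python's re module accepting the same (?P<name>...) syntax as Go's regexp package; the names source and local_path are illustrative and not taken from the server.

```python
import json
import re

# The data source entry quoted above, exactly as it appears in the JSON config file.
source = json.loads(r'''
{
  "pattern": "^gatk\\.(?P<accession>.*)$",
  "path": "./data/gcp/gatk-test-data/wgs_bam/{accession}.bam"
}
''')

# After JSON decoding the pattern holds a single backslash before the dot.
regex = re.compile(source["pattern"])


def local_path(request_id):
    """Return the path this source would point to, or None if the ID does not match."""
    match = regex.search(request_id)
    if match is None:
        return None
    return source["path"].replace("{accession}", match.group("accession"))


for request_id in ("NA12878", "gatk.NA12878", "gatk.NA12878_20k_b37"):
    print(request_id, "->", local_path(request_id))
```

The bare ID NA12878 does not match because it lacks the gatk. prefix, which is why the earlier curl request against config-local.json returned NotFound.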

You can also hit this same server running on AWS at https://htsget.ga4gh.org, e.g. https://htsget.ga4gh.org/reads/gatk.NA12878
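Since the same IDs work against both the local server and the hosted one, a small helper can build the request URL for either. This is only a sketch: reads_url is a hypothetical name, and the referenceName, start and end query parameters come from the htsget specification rather than from this thread, so the region values below (including the reference name "chr1") are assumptions about the file contents.

```python
from urllib.parse import quote, urlencode


def reads_url(base_url, request_id, **params):
    """Build an htsget reads URL; keyword arguments become query parameters."""
    url = base_url.rstrip("/") + "/reads/" + quote(request_id, safe="")
    if params:
        url += "?" + urlencode(params)
    return url


# Whole-file requests against the local and the hosted server.
print(reads_url("http://localhost:3000/", "gatk.NA12878"))
print(reads_url("https://htsget.ga4gh.org", "gatk.NA12878"))

# Hypothetical region query; the reference name must exist in the BAM header.
print(reads_url("http://localhost:3000", "gatk.NA12878",
                referenceName="chr1", start=0, end=100000))
```

Each printed URL can be passed to curl -s and piped through jq, as in the examples elsewhere in this thread.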

@brainstorm
Author

Thanks Jeremy!

$ curl -s http://localhost:3000/reads/gatk.NA12878 | jq .
{
  "htsget": {
    "format": "BAM",
    "urls": [
      {
        "url": "http://localhost:3000/file-bytes",
        "headers": {
          "HtsgetFilePath": "./data/gcp/gatk-test-data/wgs_bam/NA12878.bam",
          "Range": "bytes=0-15236349"
        }
      }
    ]
  }
}
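For completeness, a ticket like the one above is only the first step: an htsget client then requests each entry in urls with the listed headers and concatenates the response bodies to obtain the BAM data. Below is a minimal Python sketch of that second step using only the standard library. The function names are illustrative, and fetch_ticket assumes the local server is still running on port 3000.

```python
import json
import urllib.request


def ticket_requests(ticket):
    """Return (url, headers) pairs for every block listed in an htsget ticket."""
    return [(block["url"], dict(block.get("headers", {})))
            for block in ticket["htsget"]["urls"]]


def range_length(range_header):
    """Number of bytes covered by an inclusive 'bytes=first-last' Range header."""
    first, last = range_header.split("=", 1)[1].split("-")
    return int(last) - int(first) + 1


def fetch_ticket(ticket, out_path):
    """Download every block in order and write the concatenation to out_path."""
    with open(out_path, "wb") as out:
        for url, headers in ticket_requests(ticket):
            request = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(request) as response:
                out.write(response.read())


# The ticket returned by the curl command above.
ticket = json.loads("""
{"htsget": {"format": "BAM", "urls": [{"url": "http://localhost:3000/file-bytes",
  "headers": {"HtsgetFilePath": "./data/gcp/gatk-test-data/wgs_bam/NA12878.bam",
              "Range": "bytes=0-15236349"}}]}}
""")

for url, headers in ticket_requests(ticket):
    print(url, range_length(headers["Range"]), "bytes")

# Uncomment to download; requires the local server to be running.
# fetch_ticket(ticket, "NA12878.copy.bam")
```

For a ticket with several urls entries, such as the giab one earlier in the thread, the blocks must be written in the order they are listed.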
