Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensembl-vep refseq/merged option broken #1559

Closed
YeHW opened this issue Jun 11, 2024 · 1 comment
Closed

ensembl-vep refseq/merged option broken #1559

YeHW opened this issue Jun 11, 2024 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@YeHW
Copy link
Contributor

YeHW commented Jun 11, 2024

Description of the bug

The code parsing vep related options can't deal with the --refseq/--merged options in the vep command.

If setting:

vep_cache_version: "111"
vep_genome: "GRCh37"
vep_cache: "path/to/vep-cache/"
vep_custom_args: "--refseq --offline --everything"
vep_species: "homo_sapiens_refseq"

then the error is:

  -------------------- EXCEPTION --------------------
  MSG: Should not use homo_sapiens_refseq as --species.
  Try using flags --refseq or --merged with --species homo_sapiens

the command.sh:

vep \
      -i sample.freebayes.vcf.gz \
      -o sample.freebayes_VEP.ann.vcf.gz \
      --stats_file sample.freebayes_VEP.ann.summary.html \
      --vcf --refseq --offline --hgvsg --allele_number --everything --format vcf \
      --compress_output bgzip \
      --fasta human_g1k_v37_decoy.fasta \
      --assembly GRCh37 \
      --species homo_sapiens_refseq \
      --cache \
      --cache_version 111 \
      --dir_cache ${PWD}/ensembl-vep \
      --fork 6

Notice that --species homo_sapiens_refseq is incorrect.
The correct way is to use --species homo_sapiens along with --refseq (latter is present because we include it in vep_custom_args.

The --dir_cache ${PWD}/ensembl-vep (corresponds to vep_cache) is organized as:

${PWD}/ensembl-vep
├── homo_sapiens_refseq
│   ├── 111_GRCh37
│   └── 111_GRCh38
...

which is the default structure if downloaded via ensembl-vep (these files are downloaded via docker myself). For homo_sapiens, there are two more versions, the default homo_sapien and the other one homo_sapiens_merged. If download them all, then the folder would look like:

${PWD}/ensembl-vep
├── homo_sapiens
│   ├── 111_GRCh37
│   └── 111_GRCh38
├── homo_sapiens_merged
│   ├── 111_GRCh37
│   └── 111_GRCh38
├── homo_sapiens_refseq
│   ├── 111_GRCh37
│   └── 111_GRCh38

VEP's annotation folder selection logic:

  • By default vep will use ${dir_cache}/${species}/${cache_version}_${assembly}
  • If --refseq is provided, it will use ${dir_cache}/${species}_refseq/${cache_version}_${assembly}
  • If --merged is provided, it will use ${dir_cache}/${species}_merged/${cache_version}_${assembly}

Below are mappings between vep command options and sarek input params:

  • $dir_cache: $vep_cache
  • $species: $vep_species
  • $cache_version: $vep_cache_version
  • $assembly: $vep_genome

However, I'm not sure if this is the case for other species other than homo_sapiens.
For homo_sapiens, the correct logic to parse might be:

  • check if vep_custom_args contains --refseq/--merged
  • if both not exist, then validate annotation folder existence: ${vep_cache}/${vep_species}/${vep_cache_version}_${vep_genome}
  • if any of them exists, should validate annotation folder existence: ${vep_cache}/${vep_species}_refseq/${vep_cache_version}_${vep_genome} or ${vep_cache}/${vep_species}_merged/${vep_cache_version}_${vep_genome} ; note that $vep_species should still be homo_sapiens (because we need --species homo_sapiens in the vep command)

A more consistent way might be to provide another sarek input param to deal with vep's --refseq/--merged option

Command used and terminal output

No response

Relevant files

No response

System information

No response

@YeHW YeHW added the bug Something isn't working label Jun 11, 2024
@YeHW YeHW self-assigned this Jun 12, 2024
@YeHW
Copy link
Contributor Author

YeHW commented Jun 12, 2024

This is fixed by #1563

@YeHW YeHW closed this as completed Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant