split-vep could not recognize M-CAP #1686

Closed
zofieLin opened this issue Mar 23, 2022 · 6 comments

@zofieLin

Hi,

I am using bcftools +split-vep to extract the M-CAP score from a VCF annotated by VEP, but the plugin does not recognize the field and produces the error below:
image

I checked with bcftools +split-vep *.vep.vcf.gz -l, and "M-CAP_score" is indeed present in the CSQ field.
image

I also tried some other spellings, but none of them worked. Is there a way to solve this problem?
image

Thanks!

@pd3
Member

pd3 commented Mar 25, 2022

This is caused by the dash, which the VCF specification does not consider a valid tag name character. This is a problem because +split-vep internally treats VEP fields as VCF tags, and the dash breaks the query format expression.

A quick workaround is to edit the header and replace the dashes with underscores, e.g. M_CAP_pred. See the reheader command.

The program could also be made smarter and replace such characters automatically. Could you please show the VEP line from the header and a single data line for debugging?
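
To see which VEP subfields would trip the plugin, the field names in the CSQ Description can be checked against the tag name pattern. The following is a minimal sketch, not the plugin's own logic: it assumes the INFO key pattern from the VCF 4.3 specification, and the `desc` string is a hypothetical, shortened Description used only for illustration.

```python
import re

# Assumption: INFO key pattern as given in the VCF 4.3 specification.
VCF_KEY = re.compile(r"^([A-Za-z_][0-9A-Za-z_.]*|1000G)$")

def invalid_csq_fields(description):
    """Return CSQ subfield names that are not valid VCF tag names."""
    # VEP lists the subfields after "Format: ", separated by "|".
    fields = description.split("Format: ", 1)[1].split("|")
    return [f for f in fields if not VCF_KEY.match(f)]

# Hypothetical, shortened Description string for illustration only.
desc = "Consequence annotations from Ensembl VEP. Format: Allele|Consequence|M-CAP_pred|M-CAP_score"
print(invalid_csq_fields(desc))  # ['M-CAP_pred', 'M-CAP_score']
```

Any name reported here would need to be renamed in the header before it can be used in a format expression with an affected version of the plugin.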

@zofieLin
Author

Thanks for the reply.

I used reheader to rename the field to MCAP and that works well; it just takes extra steps to edit the header and then run reheader.

Attached are some lines that may help you fix this problem. M-CAP is in the third-to-last and second-to-last fields.
Thanks.
test.txt

pd3 closed this as completed in 22f0d90 Mar 28, 2022
@pd3
Member

pd3 commented Mar 28, 2022

This is now addressed: offending characters are replaced with underscores. Please try it out.

@zofieLin
Author

Thank you so much, it worked well, with "%M_CAP_pred".

daviesrob pushed a commit to daviesrob/bcftools that referenced this issue Apr 4, 2022
Invalid characters are replaced with underscores, uniqueness of newly
formed tag names is currently not checked for (a possible future todo).

Fixes samtools#1686
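
The uniqueness caveat in the commit message can be illustrated with a short sketch. This is not the plugin's code: the set of characters treated as invalid is an assumption, and the second field name below is hypothetical, added only to force a clash.

```python
import re
from collections import defaultdict

def sanitize(name):
    # Assumption: every character outside [0-9A-Za-z_.] becomes an underscore.
    return re.sub(r"[^0-9A-Za-z_.]", "_", name)

def find_collisions(fields):
    """Map each sanitized name to its original names, keeping only clashes."""
    seen = defaultdict(list)
    for f in fields:
        seen[sanitize(f)].append(f)
    return {new: olds for new, olds in seen.items() if len(olds) > 1}

# "M_CAP_pred" is a hypothetical second field, added to force a clash.
print(find_collisions(["Allele", "M-CAP_pred", "M_CAP_pred"]))
# {'M_CAP_pred': ['M-CAP_pred', 'M_CAP_pred']}
```

If two VEP subfields differ only in such characters, they would end up under the same tag name after replacement, which is the case the commit message leaves unchecked.
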
@hanguojun007

Hi, I tried reheader, but it only replaces the header, not the INFO content.
When I query the new header, I can't get the value, because the header of INFO does not change.

@pd3
Member

pd3 commented Apr 19, 2022

@hanguojun007 This would have to be done before the plugin is run.

For example the VEP definition

##INFO=<ID=CSQ,...,Description="Format: Allele|Consequence|..|M-CAP_pred|...">

would be changed to

##INFO=<ID=CSQ,...,Description="Format: Allele|Consequence|..|M_CAP_pred|...">

with reheader, then the plugin +split-vep would work.

With the latest version this is not needed anymore.
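
For older versions, the header edit described above could be scripted roughly as follows. This is a sketch under assumptions: `fix_csq_header_line` and `fix_header` are hypothetical helper names, only dashes are replaced, and the header text is assumed to have been dumped beforehand (for example with `bcftools view -h`).

```python
def fix_csq_header_line(line):
    """Replace dashes with underscores in the CSQ subfield list only."""
    if not line.startswith("##INFO=<ID=CSQ,"):
        return line  # leave every other header line untouched
    # Everything after "Format: " is the "|"-separated list of subfield names.
    head, sep, fields = line.partition("Format: ")
    return head + sep + fields.replace("-", "_")

def fix_header(text):
    """Apply the fix to every line of a VCF header given as one string."""
    return "\n".join(fix_csq_header_line(l) for l in text.split("\n"))

# The elided header line from the comment above, used as-is.
old = '##INFO=<ID=CSQ,...,Description="Format: Allele|Consequence|..|M-CAP_pred|...">'
print(fix_csq_header_line(old))
```

The rewritten header would then be passed to `bcftools reheader -h`, and +split-vep run on the resulting file. Only the header changes; the CSQ values in the data lines stay as they are, which is expected because the plugin takes the subfield names from the header.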

pd3 added a commit that referenced this issue May 13, 2022
Invalid characters are replaced with underscores, uniqueness of newly
formed tag names is currently not checked for (a possible future todo).

Fixes #1686