Update vcf file processing #124

Open
wants to merge 5 commits into
base: master
from

@jelman jelman commented Apr 30, 2021

I made some edits to get Predict.py running on UK Biobank data converted to VCF format. Note: I used plink2 to convert BGEN to VCF with the modifier 'vcf-dosage=DS-force'. With 'vcf-dosage=DS', dosages were missing for genotyped variants.
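
For reference, a minimal sketch of how such a conversion command might be assembled. The file names and the helper `plink2_bgen_to_vcf_cmd` are hypothetical, and the `ref-first` modifier is an assumption about the BGEN allele order that is not stated above; the function only builds the argument list and does not run plink2.

```python
import shlex


def plink2_bgen_to_vcf_cmd(bgen_path, sample_path, out_prefix, dosage_mode="DS-force"):
    """Build (but do not run) a plink2 command that exports BGEN data as VCF.

    dosage_mode is passed to the 'vcf-dosage=' export modifier; 'DS-force'
    is the value mentioned in the description above.
    """
    return [
        "plink2",
        "--bgen", bgen_path, "ref-first",  # assumption: REF allele is listed first
        "--sample", sample_path,
        "--export", "vcf", f"vcf-dosage={dosage_mode}",
        "--out", out_prefix,
    ]


# Hypothetical file names, for illustration only.
cmd = plink2_bgen_to_vcf_cmd("chr22.bgen", "chr22.sample", "chr22_converted")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where plink2 is installed.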

@jelman jelman marked this pull request as ready for review April 30, 2021 16:42
@hakyim hakyim requested a review from Heroico June 15, 2021 14:53
@Heroico Heroico requested review from Fnyasimi and removed request for Heroico December 10, 2021 16:14
@@ -8,7 +8,7 @@

 def vcf_file_geno_lines(path, mode="genotyped", variant_mapping=None, whitelist=None, skip_palindromic=False, liftover_conversion=None):
     logging.log(9, "Processing vcf %s", path)
-    vcf_reader = VCF(path)
+    vcf_reader = VCF(path, gts012=True)
Collaborator

What does gts012 represent?

Author

From cyvcf2.VCF documentation:
gts012 (bool) – if True, then gt_types will be 0=HOM_REF, 1=HET, 2=HOM_ALT, 3=UNKNOWN. If False, 3, 2 are flipped.
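
To make the two encodings concrete, here is a small pure-Python sketch that does not call cyvcf2 itself. It assumes only the code values given in the documentation quoted above; the names `DEFAULT_TO_GTS012`, `to_gts012`, and `alt_allele_counts` are hypothetical helpers for illustration.

```python
# Per the quoted docs, with gts012=False the codes are
# 0=HOM_REF, 1=HET, 2=UNKNOWN, 3=HOM_ALT (2 and 3 flipped relative to gts012=True).
DEFAULT_TO_GTS012 = {0: 0, 1: 1, 2: 3, 3: 2}


def to_gts012(gt_types):
    """Convert default-encoded genotype codes to the gts012 encoding."""
    return [DEFAULT_TO_GTS012[g] for g in gt_types]


def alt_allele_counts(gt_types_012):
    """Under gts012, codes 0-2 equal the ALT allele count; 3 marks a missing call."""
    return [g if g in (0, 1, 2) else None for g in gt_types_012]


default_codes = [0, 1, 2, 3]            # HOM_REF, HET, UNKNOWN, HOM_ALT
recoded = to_gts012(default_codes)      # HOM_REF, HET, UNKNOWN, HOM_ALT in gts012 codes
counts = alt_allele_counts(recoded)
```

With the gts012 encoding, the genotype code for a called sample can be read directly as the number of ALT alleles, which is one plausible reason to enable it.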

yield (variant_id, chr, pos, ref, alt, f) + tuple(d)

elif mode == "imputed":
if len(alts) > 1:
logging.log("VCF imputed mode doesn't support multiple ALTs, skipping %s", variant_id)
if (len(ref)) | (len(alts[0])) > 1:
Collaborator

Does the genotype data contain indels too?

Author

Yes, and they could be coded as either REF or ALT.
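
As a sketch of what that condition appears to test, assuming it is meant to flag indels as this exchange suggests: in Python `|` binds more tightly than `>`, so the expression in the diff evaluates as `(len(ref) | len(alts[0])) > 1`. For non-empty alleles that gives the same result as the more explicit form below. The helper name `is_indel` is hypothetical and not part of the PR.

```python
def is_indel(ref, alts):
    """Return True if REF or the first ALT allele spans more than one base.

    Covers indels coded on either side (long REF or long ALT).
    """
    return len(ref) > 1 or len(alts[0]) > 1


def original_condition(ref, alts):
    """The condition as written in the diff, kept for comparison."""
    return (len(ref)) | (len(alts[0])) > 1


# Illustrative alleles only: a SNP, a deletion, and an insertion.
examples = [("A", ["T"]), ("AT", ["A"]), ("A", ["ATG"])]
results = [is_indel(ref, alts) for ref, alts in examples]
```

Spelling the test out with `or` avoids relying on operator precedence and on the lengths being at least 1.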

2 participants