Request for Assistance with Replicating INDEL Imputation Accuracy Trends Using GLIMPSE2 from "Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes" (DOI: 10.1038/s41588-023-01438-3)
#249 · Open · hardworking555 opened this issue on Dec 23, 2024 · 0 comments
Your May 2023 publication, "Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes" (DOI: 10.1038/s41588-023-01438-3), has been highly influential for my research. I am particularly interested in Extended Data Fig. 5(a), which reports the imputation accuracy at INDEL sites.
To replicate the trends in your figure, I implemented the following approach:
Reference Panel Construction:
I compiled a reference population consisting of 1,602 pigs with both SNP and INDEL genotype data. This high-quality reference set served as the basis for subsequent imputation analyses.
Target Dataset Preparation:
From 30x whole-genome sequencing (WGS) data of 10 pigs, I downsampled the sequencing reads to generate datasets with coverage levels of 0.1x, 0.3x, 0.5x, 0.7x, and 1x. My objective was to evaluate how imputation accuracy at INDEL sites changes with increasing coverage.
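To make the downsampling step concrete, the fraction of reads to keep for each target coverage can be derived as target coverage divided by source coverage. The sketch below assumes the nominal 30x is the actual mean depth of each sample (in practice the measured depth per sample may differ) and that reads are subsampled uniformly at random, for example with a tool such as samtools. The function name `subsample_fraction` is hypothetical and is not taken from the pipeline described here.

```python
def subsample_fraction(target_cov: float, source_cov: float = 30.0) -> float:
    """Fraction of reads to keep to go from source_cov to target_cov.

    Assumes uniform random subsampling and that source_cov is the true
    mean depth of the original data (an assumption, not a measured value).
    """
    if not 0 < target_cov <= source_cov:
        raise ValueError("target coverage must be in (0, source coverage]")
    return target_cov / source_cov


# Target coverages listed in the issue.
TARGETS = [0.1, 0.3, 0.5, 0.7, 1.0]
FRACTIONS = {t: subsample_fraction(t) for t in TARGETS}

for target, fraction in FRACTIONS.items():
    print(f"{target}x: keep {fraction:.4%} of reads")
```

Using a different random seed per coverage level (or per replicate) would make it possible to see how much of the variation between coverage levels comes from sampling noise alone.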
Imputation Pipeline:
I adopted a stepwise imputation strategy:
Pre-phasing with SHAPEIT5:
I pre-phased the reference panel using SHAPEIT5 with --Ne 150 as recommended in standard guidelines.
Imputation with GLIMPSE2:
I processed the low-coverage target datasets with GLIMPSE2, using the following workflow: chunking the genome, splitting the reference, imputing each target region, and ligating the output. Throughout the pipeline, I applied recommended parameters, including Ne = 150. Additionally, I set the --call-indels parameter. Finally, I merged all imputed chunks into a consolidated dataset.
Performance Evaluation:
I evaluated the imputed genotypes at INDEL sites by comparing them against the true genotypes derived from the original 30x WGS data. The following metrics were used:
Concordance Rate: The proportion of correctly imputed genotypes.
Pearson Correlation Coefficient: Correlation between true and imputed genotypes, encoded as 0, 1, and 2.
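For clarity, the two metrics can be written out as below. This is a minimal sketch of the definitions as stated above, not the exact evaluation script: it assumes genotypes are already encoded as 0, 1, or 2 (alternate-allele counts), that missing calls are represented as `None` and skipped, and that truth and imputed genotypes are matched by site and sample. The function names and the toy data are hypothetical.

```python
import math


def _paired(truth, imputed):
    """Keep only positions where both genotypes are present."""
    return [(t, i) for t, i in zip(truth, imputed)
            if t is not None and i is not None]


def concordance(truth, imputed):
    """Proportion of genotypes where the imputed call equals the truth."""
    pairs = _paired(truth, imputed)
    if not pairs:
        raise ValueError("no comparable genotypes")
    return sum(t == i for t, i in pairs) / len(pairs)


def pearson_r(truth, imputed):
    """Pearson correlation between true and imputed genotypes (0/1/2)."""
    pairs = _paired(truth, imputed)
    if not pairs:
        raise ValueError("no comparable genotypes")
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    if var_t == 0 or var_i == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / math.sqrt(var_t * var_i)


# Toy example (made-up genotypes, not data from this issue).
truth = [0, 1, 2, 0, 1, 0]
imputed = [0, 1, 1, 0, 2, 0]
EXAMPLE_CONCORDANCE = concordance(truth, imputed)
EXAMPLE_R = pearson_r(truth, imputed)
print(EXAMPLE_CONCORDANCE, EXAMPLE_R)
```

Whether the correlation is computed across all genotypes pooled together, per site, or per sample changes the result, so the pooling choice is worth stating alongside the numbers.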
However, the results I obtained did not reflect the increasing trend in imputation accuracy shown in your Extended Data Fig. 5(a). For example:
At 0.1x, concordance rate ≈ 0.8600, correlation ≈ 0.5956.
At 0.3x, concordance rate ≈ 0.8596, correlation ≈ 0.5812.
At 0.5x, concordance rate ≈ 0.8666, correlation ≈ 0.5597.
At 0.7x, concordance rate ≈ 0.8665, correlation ≈ 0.6004.
At 1x, concordance rate ≈ 0.8755, correlation ≈ 0.6158.
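Concordance increases only marginally (0.8600 at 0.1x to 0.8755 at 1x), and the correlation is not monotonic: it falls from 0.5956 at 0.1x to 0.5597 at 0.5x before recovering to 0.6158 at 1x. Neither metric shows the clear upward trend with coverage illustrated in your figure.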
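The departure from a monotonic trend can be checked directly from the values listed above. The sketch below only tabulates the reported numbers and flags each coverage step where a metric decreases; the helper name `decreasing_steps` is hypothetical, and the check says nothing about statistical significance, which would be limited with 10 target samples.

```python
# (coverage, concordance, correlation) as reported in this issue.
REPORTED = [
    (0.1, 0.8600, 0.5956),
    (0.3, 0.8596, 0.5812),
    (0.5, 0.8666, 0.5597),
    (0.7, 0.8665, 0.6004),
    (1.0, 0.8755, 0.6158),
]


def decreasing_steps(coverages, values):
    """Return (from_cov, to_cov) pairs where the metric goes down."""
    return [
        (coverages[k - 1], coverages[k])
        for k in range(1, len(values))
        if values[k] < values[k - 1]
    ]


coverages = [row[0] for row in REPORTED]
concordances = [row[1] for row in REPORTED]
correlations = [row[2] for row in REPORTED]

CONCORDANCE_DROPS = decreasing_steps(coverages, concordances)
CORRELATION_DROPS = decreasing_steps(coverages, correlations)

print("concordance decreases at:", CONCORDANCE_DROPS)
print("correlation decreases at:", CORRELATION_DROPS)
print("net change in concordance:", round(concordances[-1] - concordances[0], 4))
print("net change in correlation:", round(correlations[-1] - correlations[0], 4))
```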
I would be very grateful for your insights on the following points:
Parameters and Optimizations:
Were there any specific parameters or optimizations in GLIMPSE2 or the pre-phasing step that were essential for achieving the trends observed in your figure?
Preprocessing and Filtering:
Did you apply any additional preprocessing, filtering, or variant selection criteria prior to imputation, particularly for INDEL sites?
Data Thresholds and Context:
Since the original data underlying Extended Data Fig. 5(a) are not fully detailed in the supplementary materials, could you share any further context or thresholds that are critical for replicating the observed trend?
I have carefully followed standard imputation workflows, but I suspect that there may be subtle factors I have not yet considered.
Thank you very much for your time and consideration. I greatly appreciate your expertise and any advice you may offer. I look forward to your response and learning from your experience.
hardworking555
changed the title
Request for Assistance with INDEL Imputation Accuracy Using GLIMPSE2
Request for Assistance with Replicating INDEL Imputation Accuracy Trends Using GLIMPSE2 from "Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes"(DOI: 10.1038/s41588-023-01438-3)
Dec 23, 2024
Dear Dr. Rubinacci,
I hope this message finds you well.