Gene_Centric_Coding Unable to Analyze Gene #9

kwdoyle · 2023-10-24T17:32:29Z

Hello,

While running the Gene_Centric_Coding function, I noticed a strange issue while processing a list of genes for a specific chromosome.

On any given gene, the function seems to work properly until the internal coding function attempts to run the STAAR function:

try(pvalues <- STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, 
    rare_maf_cutoff = rare_maf_cutoff, rv_num_cutoff = rv_num_cutoff), 
    silent = silent)

I am receiving the following error, and thus no results from the current gene:

Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub.category, rare_maf_cutoff = rare_maf_cutoff,  :
  Dimensions don't match for genotype and annotation!

This error occurs for virtually all genes. Looking into this, it appears the issue lies in how the annotation data is subset to the final list of variants flagged by lof.in.plof:

Anno.Int.PHRED.sub.category <- Anno.Int.PHRED.sub[lof.in.plof, ]

When I run this, lof.in.plof is a vector of NAs, TRUEs, and FALSEs, with the number of TRUEs corresponding to the final filtered number of variants to use (in my case, 5). When the annotation data in Anno.Int.PHRED.sub is subset using this vector, however, the final dimensions of the table still contain the number of rows that correspond to the previous number of variants (which, in my case, was 129).

The Geno matrix has the dimensions [n samples x 5 variants]. When Anno.Int.PHRED.sub.category is passed to the STAAR function, however, its dimensions are still [n samples x 129 variants], causing the error.

If I wrap the which function around lof.in.plof, the dimensions of the resulting table are [n samples x 5] and STAAR is able to run properly and gives no error:

Anno.Int.PHRED.sub.category <- Anno.Int.PHRED.sub[which(lof.in.plof),]
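The difference between the two subsetting calls comes from how R treats NA in a logical index: each NA produces a row filled with NA instead of dropping the row, whereas which() returns only the positions that are TRUE. A minimal sketch with toy data, where `anno` and `keep` are hypothetical stand-ins for Anno.Int.PHRED.sub and lof.in.plof:

```r
# Toy stand-ins (hypothetical): `anno` plays the role of Anno.Int.PHRED.sub,
# `keep` the role of lof.in.plof.
anno <- data.frame(score1 = c(10, 20, 30, 40), score2 = c(1, 2, 3, 4))
keep <- c(TRUE, NA, FALSE, TRUE)

# Logical indexing: every NA in the index yields a row filled with NA.
by_logical <- anno[keep, ]

# which() keeps only the positions that are TRUE; NA and FALSE are dropped.
by_which <- anno[which(keep), ]

nrow(by_logical)  # 3: two TRUE rows plus one all-NA row
nrow(by_which)    # 2: only the TRUE rows
```

Note that which() also silently drops any variant whose status is unknown, so it is still worth checking why the NAs appear in the first place.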

I assume this fix makes sense, and that there is no reason for Anno.Int.PHRED.sub.category to still contain rows with NA data? The final dimensions of this annotation table should indeed match those of the genotype matrix, no?


xihaoli · 2023-10-24T17:40:00Z

Hi @kwdoyle,

Thanks for your questions. I haven't encountered this issue before, since I don't think lof.in.plof should contain NA for any variant based on its definition. If you want to delve deeper, I would recommend you re-annotate your genotype data using the FAVOR Essential Database and see if this persists.

Best,
Xihao

xihaoli · 2023-10-24T17:42:14Z

On another note, what are those variants with lof.in.plof being NA in your dataset? Are these SNVs or indels? Feel free to send an example or two for such variants.

Thank you!

kwdoyle · 2023-10-26T14:58:57Z

So all variants do indeed have a value for lof.in.coding, which is reassuring. It seems like the NA issue in lof.in.plof is because not all variants have a value for MetaSVM_pred, which is used in its creation. A few, for example, would be 5-1253777-G,T 5-1253795-G,A 5-1253804-G,A. For chromosome 5, at least, all are SNVs.

kwdoyle · 2023-10-26T15:08:22Z

I think this is another data-loading issue. MetaSVM should be either "T" or "D", which it is in the merged csv, yet the values are "TRUE" and "" within the annotated GDS. I'll have to check why this is happening.

xihaoli · 2023-10-26T15:13:06Z

Hi @kwdoyle,

Thanks so much for your input. The issue might still be related to the automatic assignment of column classes. Please feel free to send me an email, and if you'd like we can do a quick call to get it right.

Best,
Xihao

xihaoli · 2023-10-26T16:51:24Z

In brief, if you annotate your genotype data using the FAVOR Essential Database, this issue should not persist.

kwdoyle · 2023-10-26T18:37:05Z

Yes, so this issue was due to read_csv in gds2agds.r assigning the wrong column class to MetaSVM_pred. The "T" or "D" values were read in as logicals, converting any "T"s to TRUEs and the "D"s to NAs. This incorrect data was then used to annotate the GDS.
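The coercion described above, and one way it could introduce NAs into an lof.in.plof-style filter, can be reproduced in base R. This is only a sketch: as.logical() stands in for what a logical column type does to the "T"/"D" values, and the filter is a simplified, hypothetical expression, not the one used in the pipeline.

```r
# Values as they appear in the merged CSV.
meta_chr <- c("T", "D", "D", "T")

# Coercing to logical mimics a column that was typed as logical on import:
# "T" is a recognized spelling of TRUE, "D" is not, so it becomes NA.
meta_lgl <- as.logical(meta_chr)
meta_lgl  # TRUE NA NA TRUE

# Simplified, hypothetical filter: keep nonsynonymous variants predicted "D".
is_nonsyn <- c(TRUE, TRUE, FALSE, TRUE)

keep_chr <- is_nonsyn & (meta_chr == "D")  # FALSE TRUE FALSE FALSE
keep_lgl <- is_nonsyn & (meta_lgl == "D")  # FALSE NA FALSE FALSE
```

With the character column the filter is fully defined; with the coerced column the variant that should have been selected ends up as NA.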

Since I chose to add all 160 annotations from the FAVOR database, it would have been inconvenient to assign the column classes for each one within read_csv. I instead used data.table::fread to read in the data, which correctly assigned all column classes.

For reference, I checked which columns were read in differently between read_csv and fread:

Any column that read_csv reads in as logical is potentially a problem; fread reads those columns in correctly.
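A comparison of this kind can be sketched in base R by collecting the class of each column from both imports and listing the ones that disagree. The helper `diff_col_classes` is hypothetical, and the two data frames are small hand-built stand-ins for the results of read_csv and fread, not real FAVOR data.

```r
# Hypothetical helper: report columns whose classes differ between two imports.
diff_col_classes <- function(a, b) {
  shared <- intersect(names(a), names(b))
  # Use [[ ]] so the helper works for data.frame, tibble, and data.table alike.
  first_class <- function(df) {
    vapply(shared, function(nm) class(df[[nm]])[1], character(1))
  }
  cls_a <- first_class(a)
  cls_b <- first_class(b)
  differs <- cls_a != cls_b
  data.frame(column = shared[differs],
             first = unname(cls_a[differs]),
             second = unname(cls_b[differs]),
             stringsAsFactors = FALSE)
}

# Hand-built stand-ins for the two imports.
from_read_csv <- data.frame(pos = c(1L, 2L), MetaSVM_pred = c(TRUE, NA))
from_fread <- data.frame(pos = c(1L, 2L), MetaSVM_pred = c("T", "D"),
                         stringsAsFactors = FALSE)

res <- diff_col_classes(from_read_csv, from_fread)
res  # one row: MetaSVM_pred, logical vs. character
```

Filtering the result for rows where one side is "logical" gives a quick list of columns worth inspecting.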

xihaoli · 2023-10-26T18:46:39Z

Thank you @kwdoyle. This is very helpful!

If you would like to contribute some documents/scripts that you use, please let me know. I can add you as a collaborator of the STAARpipeline-Tutorial repo so that you can contribute to this section.

Best,
Xihao

kwdoyle · 2023-10-27T19:10:56Z

That would be great, as I've been making some modifications to these scripts to make them generally applicable to other scenarios, mainly by removing their dependence on the Harvard cluster job IDs used to select the current chromosome to analyze.

xihaoli · 2023-10-27T19:38:03Z

Sounds perfect, thank you @kwdoyle! I've invited you to be part of the STAARpipeline-Tutorial repo. Look forward to your contributions!

p.s. I'll close this issue and the other issue in the STAARpipeline-Tutorial repo.

Best,
Xihao

xihaoli closed this as completed Oct 27, 2023
