-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detection and quantification of events involving novel splicing junctions #277
Comments
Regarding second question: one guess is that rMATs is counting the number of alignments (as opposed to reads). Is that right or make sense? Because the total number reported by rMATs is more than the number of reads in fasta file. |
The novelJunction events are those where all the exons are annotated in the gtf, but the combination of exons is different. In those cases rMATS can find exons and splice sites in the gtf that explain the alignment, but there is no transcript in the gtf which uses the exact sequence of exons The novelSpliceSite events use at least 1 splice site which is not in the gtf. The splice site has to be found by looking at an existing exon-exon junction in the gtf and moving the splice site subject to these parameters: --mil (minimum intron length (defaults to 50)) and --mel (maximum exon length (defaults to 500)). Basically rMATS can only use an alignment with a novel splice site if it can define a new exon by making an adjustment to an annotated exon subject to those parameter values
|
Hello Eric. EXON_NOT_MATCHED_TO_ANNOTATION: 1047715 So 1047715 + 192307 reads do not match the definition of exon and intron length and are discarded by rMATs. But rMATs is utilizing the other reads which do not match the exon and junction annotation provided they are within the defined exon/intron length limits. Also do you think the number 1047715 is too large? (it is ~1% of the total reads/alignments) |
Yes, rMATS is utilizing other reads which do not match the exon and junction annotation provided they are within the defined exon/intron length limits if --novelSS was used (--novelSS is not on by default) Also rMATS only attempts to find a novel splice site for an alignment if that alignment includes a splice junction. The JUNCTION_NOT_MATCHED_TO_ANNOTATION reads are the alignments that had a splice junction, but rMATS could not find a suitable novel splice site and so rMATS decided to discard those reads. The EXON_NOT_MATCHED_TO_ANNOTATION alignments didn't have a splice junction in their alignment, the alignment was not within an annotated exon, and rMATS discarded them without trying to find a novel splice site. The novel splice site parameters (--mil and --mel) only apply to alignments with junctions The number of alignments not matched to the annotation seems fine at 1% |
Okay. Thanks again for taking time to clarify these doubts and appreciate your help 👍 |
Enjoying the great tool, but I have another question:
One of my sample had following stats of used and unused reads (following the 2-pass alignment strategy with STAR)
read outcome totals across all BAMs
USED: 84178584
NOT_PAIRED: 3698345
NOT_NH_1: 34262716
NOT_EXPECTED_CIGAR: 373242
NOT_EXPECTED_READ_LENGTH: 2487676
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 1047715
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 192307
CLIPPED: 0
total: 126240585
So I have two questions:
rMATs does not use reads with EXON_NOT_MATCHED_TO_ANNOTATION and JUNCTION_NOT_MATCHED_TO_ANNOTATION condition. But since rMATs can detect and quantify splicing events involving new splicing junctions and splice sites, should not it also use those reads as these are the ones informing the new events?? Could you please solve this confusion?
Secondly, progress report from STAR says that 87.5% reads had unique mapping and 9.5% were multi mappers. Here is the snapshot
But rMATs counts 34262716 as not being NOT_NH_1: Which means nearly 34262716 / 126240585 = 27% reads are multi-mappers. Why this number NOT_NH_1 so much off the STAR calculation (~9.5% in the image)?
samtools stats BAMFILE
shows that there were 101268968 paired reads and 97749854 were properly paired. So I am happy that rMATs can use >80 percent of the properly paired reads. But just that the number 34262716 does not fit into picture with what STAR outputs.Could you please clarify these issues?
The text was updated successfully, but these errors were encountered: