-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref to signal mapping did not consider splicing #118
Comments
Remora is not currently designed to work with spliced RNA alignments. This is not currently on the roadmap, but is certainly possible. The fix would be to this function primarily. I'd be happy to accept a PR for this, but need to ensure that it would not have any impact on standard genomic alignments. |
Okay, I misunderstood something before but updated the understanding later on, I will open a PR soon. I wrote a function to get the query to reference mapping, and modified something to get the query to ref coordinates mappings -> this leads to a few differences from the original ref_to_signal values (for genomic reads) but they are more or less similar... for example if I just it works for my spliced read mentioned above: please review my PR, thank you very much! |
I'm on paternity leave at the moment, so I won't be able to take a very close look at this for a little bit, but at a high level unless there is a clear reason I think any differences in produced genomic alignments will not be acceptable. If differences in genomic alignments are required to support the skip operation could you detail those here? |
Got it, thanks @marcus1487. The differences come from map_ref_to_signal()
if stick to it, the However for spliced reads, it doesn't generate the correct mapping as the
to replace One way I could think of is to first determine if a read has intron or not, then based on that choose to use |
@marcus1487 I found out I don't need another interpolation in the spliced read's ref_to_signal is now like I then tested the genomic/unspliced reads I have (I wrapped the steps into matches = []
num_processed_reads = 0
for bam_read in bam_reads_reg:
try:
mapping = get_manual_mappins(bam_read, pod5_reads[bam_read.query_name])
num_processed_reads += 1
except:
continue
io_read = Read.from_pod5_and_alignment(
pod5_read_record=pod5_reads[bam_read.query_name],
alignment_record=bam_read,
reverse_signal=True,
)
matches.append((mapping == io_read.ref_to_signal).all())
the results are all the same! sorry for the mess... all the stuff is a bit complicated. I'm still in awe that people like you can create such complex software, I am going to update the PR now |
It seems that the core issue here has been resolved. Please re-open if you experience further issues. |
Hi @marcus1487, I was using remora API on real direct RNA-seq data, but ran into errors from this command:
Apparently
ref_seq:2542
is the number of bases of this gene's exonic regions (i.e. seq length of this read), andmove+cigar:16978
~= ref_end - ref_start on the genomic region (i.e. the range of all exons and introns).I then tried to resolve it, for here if changes to
will get the correct lengths as it uses reference knots and query knots to generate reference to read knots. Then for map_ref_to_signal() I guess can just
then it gives
ref_to_signal
coordinate mapping.I haven't tested it further, but I can provide a toy dataset for this issue, thank you and looking forward to your comments.
The text was updated successfully, but these errors were encountered: