Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculating P-sites fails #15

Open
imilenkovic opened this issue Jun 29, 2021 · 5 comments
Open

Calculating P-sites fails #15

imilenkovic opened this issue Jun 29, 2021 · 5 comments

Comments

@imilenkovic
Copy link

Hello,

I tried running RiboseQC on bam files obtained with another RiboSeq analysis pipeline called riboflow.

The analysis fails at the P-site step:

Calculating P-sites offsets ... Mon Jun 28 20:39:29 2021 Calculating P-sites offsets --- Done! Mon Jun 28 20:39:29 2021 Calculating P-sites positions and junctions ... Mon Jun 28 20:39:29 2021 Calculating P-sites positions and junctions --- Done! Mon Jun 28 20:40:34 2021 Building aggregate P-sites profiles ... Mon Jun 28 20:40:34 2021 [1] "Not enough signal or low frame preference, skipped P-sites & codon occupancy calculation. \n Set 'all' in the 'choose_readlengths' choice to ignore this warning and get P-sites positions in a new run" Building aggregate P-sites profiles --- Done! Mon Jun 28 20:40:34 2021 Exporting files ... Mon Jun 28 20:40:34 2021 Error in .local(object, con, format, ...) : Scores must be non-NA numeric values Calls: RiboseQC_analysis ... export -> export -> export -> export -> export -> .local In addition: There were 12 warnings (use warnings() to see them) Execution halted

I tried adding the "readlength_choice_method = "all"" parameter in the RiboseQC_analysis, but it gave the same error and it wouldn't export any files.

I am wondering if it has to do with the annotation and reference files I was using for building the bams (https://github.com/ribosomeprofiling/references_for_riboflow/tree/master/transcriptome/mouse/v1/raw)? Or is there something else I might be doing wrong?

Thanks a lot in advance.

@zslastman
Copy link
Collaborator

Hey! Thanks for reporting this. It does look like an annotation issue yes. Could you make a sample of your bam available as well so I can debug this? The reference and annotation should be exactly the same as the one used to make the bam files.

@imilenkovic
Copy link
Author

Thanks a lot for your reply!

I attached a bam sample (copied the first ~20 rows).
bam_sample.txt

Regarding the annotation and the reference files:
The annotation is actually a .bed file: https://github.com/ribosomeprofiling/references_for_riboflow/tree/master/transcriptome/mouse/v1/riboflow_annot_and_ref - the one called appris_mouse_v1_actual_regions.bed. The reads were mapped to the transcriptome (but not a single .fa file - bowtie2 was used for mapping so there are 4 .bt2 files).

Let me know if this can still be used or the fastqs will need to be remapped to be compatible with this pipeline.

Thanks a lot!

@zslastman
Copy link
Collaborator

Ah ok, so, RiboseQC is actually designed to run with genomic alignments, and a gtf file - you'll need to align your reads to the genome rather than the transcriptome. Gencode gtfs should work fine for this.

@imilenkovic
Copy link
Author

Hello,

Thanks for clarifying!

I remapped my reads using STAR, and these are the reference and annotation I used:

--genomeFastaFiles ~/references/GRCm38.p6.genome.fa
--sjdbGTFfile ~/RiboseQC/gencode.vM27.annotation.gtf \

And then I tried running RiboseQC by using the same reference and same annotation (only fasta file converted to 2bit with faToTwoBit):

prepare_annotation_files(annotation_directory = ".",
twobit_file = "/references/GRCm38.p6.genome.2bit",
gtf_file = "
/RiboseQC/gencode.vM27.annotation.gtf", scientific_name = "Mus.musculus",
annotation_name = "vM27",export_bed_tables_TxDb = F,forge_BSgenome = T,create_TxDb = T)

but I run into the following error:

Creating the TxDb object --- Done! Fri Jul 2 17:56:03 2021
Extracting genomic regions ... Fri Jul 2 17:56:03 2021
Extracting ids and biotypes ... Fri Jul 2 17:58:32 2021
Error in loadFUN(x, seqname, ranges) :
trying to load regions beyond the boundaries of non-circular sequence
"chr4"
Calls: prepare_annotation_files ... loadSubseqsFromStrandedSequence -> loadFUN -> loadFUN
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

Any ideas what might have gone wrong? Thanks!

@zslastman
Copy link
Collaborator

No problem. That looks to me like your gtf file is for a different version of the genome than the one in the fasta (a range in chr4 is asking for a coordinate that's too high) , I would look at the gencode page to confirm you have the right one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants