Calculating P-sites fails #15

imilenkovic · 2021-06-29T08:41:50Z

Hello,

I tried running RiboseQC on bam files obtained with another RiboSeq analysis pipeline called riboflow.

The analysis fails at the P-site step:

Calculating P-sites offsets ... Mon Jun 28 20:39:29 2021 Calculating P-sites offsets --- Done! Mon Jun 28 20:39:29 2021 Calculating P-sites positions and junctions ... Mon Jun 28 20:39:29 2021 Calculating P-sites positions and junctions --- Done! Mon Jun 28 20:40:34 2021 Building aggregate P-sites profiles ... Mon Jun 28 20:40:34 2021 [1] "Not enough signal or low frame preference, skipped P-sites & codon occupancy calculation. \n Set 'all' in the 'choose_readlengths' choice to ignore this warning and get P-sites positions in a new run" Building aggregate P-sites profiles --- Done! Mon Jun 28 20:40:34 2021 Exporting files ... Mon Jun 28 20:40:34 2021 Error in .local(object, con, format, ...) : Scores must be non-NA numeric values Calls: RiboseQC_analysis ... export -> export -> export -> export -> export -> .local In addition: There were 12 warnings (use warnings() to see them) Execution halted

I tried adding the "readlength_choice_method = "all"" parameter in the RiboseQC_analysis, but it gave the same error and it wouldn't export any files.

I am wondering if it has to do with the annotation and reference files I was using for building the bams (https://github.com/ribosomeprofiling/references_for_riboflow/tree/master/transcriptome/mouse/v1/raw)? Or is there something else I might be doing wrong?

Thanks a lot in advance.

The text was updated successfully, but these errors were encountered:

zslastman · 2021-06-29T17:11:34Z

Hey! Thanks for reporting this. It does look like an annotation issue yes. Could you make a sample of your bam available as well so I can debug this? The reference and annotation should be exactly the same as the one used to make the bam files.

imilenkovic · 2021-06-29T19:31:16Z

Thanks a lot for your reply!

I attached a bam sample (copied the first ~20 rows).
bam_sample.txt

Regarding the annotation and the reference files:
The annotation is actually a .bed file: https://github.com/ribosomeprofiling/references_for_riboflow/tree/master/transcriptome/mouse/v1/riboflow_annot_and_ref - the one called appris_mouse_v1_actual_regions.bed. The reads were mapped to the transcriptome (but not a single .fa file - bowtie2 was used for mapping so there are 4 .bt2 files).

Let me know if this can still be used or the fastqs will need to be remapped to be compatible with this pipeline.

Thanks a lot!

zslastman · 2021-07-01T09:43:26Z

Ah ok, so, RiboseQC is actually designed to run with genomic alignments, and a gtf file - you'll need to align your reads to the genome rather than the transcriptome. Gencode gtfs should work fine for this.

imilenkovic · 2021-07-02T16:08:50Z

Hello,

Thanks for clarifying!

I remapped my reads using STAR, and these are the reference and annotation I used:

--genomeFastaFiles ~/references/GRCm38.p6.genome.fa
--sjdbGTFfile ~/RiboseQC/gencode.vM27.annotation.gtf \

And then I tried running RiboseQC by using the same reference and same annotation (only fasta file converted to 2bit with faToTwoBit):

prepare_annotation_files(annotation_directory = ".",
twobit_file = "/references/GRCm38.p6.genome.2bit",
gtf_file = "/RiboseQC/gencode.vM27.annotation.gtf", scientific_name = "Mus.musculus",
annotation_name = "vM27",export_bed_tables_TxDb = F,forge_BSgenome = T,create_TxDb = T)

but I run into the following error:

Creating the TxDb object --- Done! Fri Jul 2 17:56:03 2021
Extracting genomic regions ... Fri Jul 2 17:56:03 2021
Extracting ids and biotypes ... Fri Jul 2 17:58:32 2021
Error in loadFUN(x, seqname, ranges) :
trying to load regions beyond the boundaries of non-circular sequence
"chr4"
Calls: prepare_annotation_files ... loadSubseqsFromStrandedSequence -> loadFUN -> loadFUN
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

Any ideas what might have gone wrong? Thanks!

zslastman · 2021-07-02T16:46:38Z

No problem. That looks to me like your gtf file is for a different version of the genome than the one in the fasta (a range in chr4 is asking for a coordinate that's too high) , I would look at the gencode page to confirm you have the right one.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calculating P-sites fails #15

Calculating P-sites fails #15

imilenkovic commented Jun 29, 2021

zslastman commented Jun 29, 2021

imilenkovic commented Jun 29, 2021

zslastman commented Jul 1, 2021

imilenkovic commented Jul 2, 2021

zslastman commented Jul 2, 2021

Calculating P-sites fails #15

Calculating P-sites fails #15

Comments

imilenkovic commented Jun 29, 2021

zslastman commented Jun 29, 2021

imilenkovic commented Jun 29, 2021

zslastman commented Jul 1, 2021

imilenkovic commented Jul 2, 2021

zslastman commented Jul 2, 2021