error in Viral_Track_scanning.R #23

Open
Shiywa opened this issue Aug 3, 2022 · 51 comments

Comments

@Shiywa

Shiywa commented Aug 3, 2022

Hi,
Thanks for your work. I have a question about Viral_Track_scanning.R: if no reads map to any virus, will it report an error like the following?

Loading of the libraries.... ... done ! Warning message:
Output directory does not exist ! Creating it ! 
1 Fastq files are going to be processed ! 
Mapping step2_hgmm_100_R2_extracted.fastq file 
Aug 03 17:58:15 ..... started STAR run
Aug 03 17:58:15 ..... loading genome
Aug 03 17:58:26 ..... started 1st pass mapping
Aug 03 17:58:46 ..... finished 1st pass mapping
Aug 03 17:58:46 ..... inserting junctions into the genome indices
Aug 03 17:59:53 ..... started mapping
Aug 03 18:00:13 ..... finished mapping
Aug 03 18:00:16 ..... started sorting BAM
Aug 03 18:00:30 ..... finished successfully
Mapping ofstep2_hgmm_100_R2_extracted.fastq done ! 
All fastq files have been mapped successfully 
Starting the BAM file analysis 
Indexing of the bam file for step2_hgmm_100_R2_extracted is done 
Computing stat file for the bam file for step2_hgmm_100_R2_extracted is done 
Checking the mapping quality of each virus... 
Export of the viral SAM file done for step2_hgmm_100_R2_extracted 
Error in { : task 1 failed - "the condition has length > 1"
Calls: %dopar% -> <Anonymous>
@GhobrialMoheb

GhobrialMoheb commented Aug 3, 2022

Hi, I can't run Viral_Track_scanning.R. Did you run it in Windows or Linux R?

I get this error message despite having Biostrings installed:
Loading of the libraries.... Error in library(Biostrings) : there is no package called ‘Biostrings’
Calls: suppressMessages -> withCallingHandlers -> library
Execution halted

@Shiywa
Author

Shiywa commented Aug 3, 2022

I don't get any errors about the packages.

@GhobrialMoheb

Thanks for your reply. Could you please guide me on how you got it to work up to this step?

@Shiywa
Author

Shiywa commented Aug 3, 2022

> Thanks for your reply. Could you please guide me on how you got it to work up to this step?

Sorry, I haven't completed a successful test yet; I only started using it today.

@GhobrialMoheb

GhobrialMoheb commented Aug 3, 2022

This is my status now:

Export of the viral SAM file done for hgmm_100_R2_extracted
Error in { : task 1 failed - "different row counts implied by arguments"
Calls: %dopar% ->
In addition: Warning messages:
1: In dir.create(paste(k, "Viral_BAM_files", sep = "")) :
'/mnt/c/Users/mohebg/Desktop//Viral_Track/Test_COVID//hgmm_100_R2_extracted/Viral_BAM_files' already exists
2: executing %dopar% sequentially: no parallel backend registered
Execution halted

@Shiywa, I have the same problem as you.
Did you find a solution?

@zorglubz-coder

Hi,
I have exactly the same issue as @Shiywa.
After reading through the code, it seems that the if statement at line 343 of Viral_Track_scanning.R is at fault.

if (class(Viral_reads_contents)=="numeric") { Viral_reads_contents = matrix(Viral_reads_contents_mean,ncol = 4) }

The class of Viral_reads_contents is c("matrix", "array"), so the comparison returns two logical values, while if accepts only one.
This branch also seems useless, since the variable "Viral_reads_contents_mean" is not defined at this point.

I would suggest that you just delete this part and try running again. (I am currently running it; I hope this solves the issue.)

From my understanding of the code, you can already see whether any virus was mapped at this step by looking at the file
output_folder/fasta_file_name_extracted/Count_chromosomes.txt

Please let me know if this helps.
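A minimal base-R sketch of the failure described above, assuming R >= 4.0.0 (where `class()` of a matrix returns `c("matrix", "array")`) and R >= 4.2.0 (where an `if` condition longer than one element is an error rather than a warning). The matrix `m` is an invented stand-in for `Viral_reads_contents`, not output from the pipeline.

```r
# Invented stand-in for Viral_reads_contents: two reads x four base proportions.
m <- matrix(c(0.25, 0.25, 0.25, 0.25,
              0.40, 0.10, 0.10, 0.40),
            ncol = 4, byrow = TRUE,
            dimnames = list(NULL, c("A", "C", "G", "T")))

print(class(m))               # "matrix" "array" on R >= 4.0.0
print(class(m) == "numeric")  # FALSE FALSE: a condition of length 2

# The original check, wrapped in tryCatch() so the error message can be inspected.
msg <- tryCatch({
  if (class(m) == "numeric") "vector branch" else "matrix branch"
  "no error"
}, error = function(e) conditionMessage(e))

# On R >= 4.2.0 this prints "the condition has length > 1".
print(msg)
```

On R 4.0 and 4.1 the same line only warns and uses the first element, which may explain why the script runs for some users and fails for others.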

@zorglubz-coder

Hi @GhobrialMoheb,

In my experience, you should delete everything in your output folder and start from scratch every time (i.e., remove the output of any unsuccessful Viral Track run).

Also, I am running everything on Ubuntu (you can easily install Ubuntu on Windows 10), but I have no idea why you have this issue.

@mohebg

mohebg commented Aug 9, 2022

@zorglubz-coder, thanks for the comment.

The issue I have found is that BAM_file@elementMetadata$seq at line 338 yields NULL.

Yes, I do clear the output folder.
I also use Ubuntu on Windows.

May I ask which FASTQ files you are testing on? Did your test run work?

@zorglubz-coder

I did not do a test run, and I have never finished an analysis of my data, so I am by no means an expert.

Are you sure you are inspecting it inside the foreach loop? As I understand it, the foreach loop creates its own environment, so you cannot access its variables once it has crashed. That would explain the NULL you see.

I can only suggest that you step through everything in RStudio on Windows (at least that is how I did it). You just have to install and load the packages, then take one BAM file from output_folder/fasta_name_extracted/Viral_BAM_files/viral_name.bam and try running:

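library(Biostrings)         # alphabetFrequency()
library(GenomicAlignments)  # readGAlignments(); also attaches Rsamtools (ScanBamParam, scanBamWhat)

# Read one viral BAM file, keeping all BAM fields, including the read sequences ("seq")
BAM_file = readGAlignments("path_to_bam", param = ScanBamParam(what = scanBamWhat()))
Viral_reads = unique(mcols(BAM_file)$seq)  # same as BAM_file@elementMetadata$seq
Viral_reads_contents = alphabetFrequency(Viral_reads, as.prob = TRUE)
# Note: with a single unique read, this subset drops to a plain numeric vector
Viral_reads_contents = Viral_reads_contents[, c("A", "C", "G", "T")]

### keep going until you find the issue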
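Since a failure inside `%dopar%` is reported only as "task 1 failed", one way to find the culprit is to rerun the loop body sequentially and record which input breaks. This is a base-R sketch; `inspect_virus()` and the virus names are hypothetical placeholders for the per-virus code in Viral_Track_scanning.R, not functions from the script.

```r
# Hypothetical per-virus step: replace the body with the code you are testing.
inspect_virus <- function(virus_name) {
  if (virus_name == "virus_B") stop("simulated failure for ", virus_name)
  paste("ok:", virus_name)
}

virus_names <- c("virus_A", "virus_B", "virus_C")  # hypothetical names

# Run each item on its own and keep either its result or its error message.
results <- lapply(virus_names, function(v) {
  tryCatch(list(name = v, ok = TRUE, value = inspect_virus(v)),
           error = function(e) list(name = v, ok = FALSE,
                                    value = conditionMessage(e)))
})

# Report only the items that failed.
failed <- Filter(function(r) !r$ok, results)
for (r in failed) cat("Failed on", r$name, ":", r$value, "\n")
```

If you prefer to stay inside `foreach`, its `.errorhandling = "pass"` argument returns the error object in place of a task's result instead of stopping the whole loop.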

@GhobrialMoheb

Yes, I actually did a manual run, and for me the issue was at the level of the BAM file: it lacks elementMetadata$seq.

@zorglubz-coder

Is it only for a specific bam file, or all of them?

@GhobrialMoheb

I did multiple trials on different FASTQ files, and the issue is the same throughout.

I am not sure, though, whether there is a FASTQ file that is known to work and could serve as a control to check that the script runs smoothly.

@zorglubz-coder

I am sorry if you did that already, but I need to be sure.
When you did a manual run, did you run exactly the commands I posted above, on a single BAM file and outside of any foreach loop (i.e., not as a continuation of the script)?
The BAM files are generated by the previous steps of the Viral_Track script.

@GhobrialMoheb

Yes, I did it for a single BAM file.

@zorglubz-coder

Honestly, I have no idea at this point; I don't really know how it works.
Here is a BAM file that works for me. Can you run the previous code on it (just unzip it)?
refseq_NC_031338_10056nt_Moku.zip

@mohebg

mohebg commented Aug 9, 2022

[screenshot]

@mohebg

mohebg commented Aug 9, 2022

Here is your BAM file: I still don't see any content in elementMetadata (as is also the case for my runs).

@mohebg

mohebg commented Aug 9, 2022

Did you get QC reports in the end?

@zorglubz-coder

I am puzzled here, could you send a picture of the code you ran?

@zorglubz-coder

[screenshot]

@zorglubz-coder

> Did you get QC reports in the end?

I am still running it after solving the issue I mentioned earlier:
#23 (comment)

@mohebg

mohebg commented Aug 9, 2022

> I am puzzled here, could you send a picture of the code you ran?

BAM_file= readGAlignments("refseq_NC_031338_10056nt_Moku.bam")

@mohebg

mohebg commented Aug 9, 2022

[screenshot]

I get an error when I run:
BAM_file= readGAlignments("refseq_NC_031338_10056nt_Moku.bam",param = ScanBamParam(what =scanBamWhat()))

[screenshot]

@mohebg

mohebg commented Aug 9, 2022

How is it possible that I get this error and you don't, when it is the same BAM file?

@zorglubz-coder

Well...
My only idea is to remove all the packages listed below and reinstall them as suggested on the installation page:

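if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Bioconductor 3.10 targets R 3.6; on a newer R, BiocManager will typically refuse this
# version, so drop version = "3.10" to get the release that matches your R.
BiocManager::install(version = "3.10")
BiocManager::install(c("Biostrings", "ShortRead","doParallel","GenomicAlignments","Gviz","GenomicFeatures","Rsubread"))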
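Before reinstalling, it may be worth checking which of these packages the R session that runs the script can actually load. An error such as "there is no package called 'Biostrings'" can also come from running a different R installation or library path than the one the packages were installed into; that is an assumption to check, not a confirmed cause. A base-R sketch, with the package list copied from the command above:

```r
# Packages from the installation command above.
required <- c("Biostrings", "ShortRead", "doParallel", "GenomicAlignments",
              "Gviz", "GenomicFeatures", "Rsubread")

# requireNamespace() returns FALSE (instead of stopping) when a package cannot be loaded.
loadable <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- names(loadable)[!loadable]

if (length(missing_pkgs) > 0) {
  cat("Cannot load:", paste(missing_pkgs, collapse = ", "), "\n")
} else {
  cat("All packages load.\n")
}

# Show which R and which library folders this session is using.
cat("R version:", as.character(getRversion()), "\n")
print(.libPaths())
```

Running this with the same `Rscript` binary that launches Viral_Track_scanning.R shows whether the script sees the same libraries as your interactive session.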

@zorglubz-coder

If you have an error in the package installation, then this is where the issue is.

@mohebg

mohebg commented Aug 9, 2022

All the packages are installed; I will reinstall them and see.

@zorglubz-coder

> Did you get QC reports in the end?

I did it!
I hope we can solve your issue too.

@GhobrialMoheb

Great, thanks a lot!

@GhobrialMoheb

May I ask you some questions:

  • Are you able to share the FASTQ file you used? I need a positive control.
  • Can you send me a screenshot of the contents of the index folder specified in the Parameters.txt file?
    (Is it the output directory of:
    "STAR --runThreadN N --runMode genomeGenerate --genomeDir /path/to/index --genomeFastaFiles /path/to/Virusite_file.fa path/to/Host_genome_chromosome*.fa"?)

@mohebg

mohebg commented Aug 9, 2022

Also, the BAM file you sent me is 424 KB, which seems a bit small. Just to be sure, is it the correct one?
Its name also seems shorter than in the screenshot you sent, which is why I am asking.

@zorglubz-coder

I am sorry, but I am not comfortable sharing the FASTQ file.
Here is the populated output folder. All the hidden text is the name of the FASTQ file.

[screenshot]

@GhobrialMoheb

"_Aligned.sortedByCoord.out.bam" is the BAM file to load in the function:
BAM_file= readGAlignments("XXX.bam")

@zorglubz-coder

> Also, the BAM file you sent me is 424 KB, which seems a bit small. Just to be sure, is it the correct one? Its name also seems shorter than in the screenshot you sent, which is why I am asking.

Yes, that is the correct size for this BAM file; the name in my screenshot is longer only because I provided the full path to the file.
The issue is the error you get when you add the option ScanBamParam(what = scanBamWhat()). This option does not give me an error.

@GhobrialMoheb

I see. I don't know why it gives me this error.

@zorglubz-coder

> "_Aligned.sortedByCoord.out.bam" is the BAM file to load in the function: BAM_file= readGAlignments("XXX.bam")

No, it is the BAM files in Viral_BAM_files; see the line below from the Viral_Track script:

BAM_file= readGAlignments(paste(k,"Viral_BAM_files/",i,".bam",sep = ""),param = ScanBamParam(what =scanBamWhat()))

@mohebg

mohebg commented Aug 9, 2022

> I would suggest that you just delete this part and try running again. (I am currently running it; I hope this solves the issue.)

Did this removal of the line help?

@mohebg

mohebg commented Aug 9, 2022

@zorglubz-coder, I can now successfully load the BAM file and get the viral reads:

[screenshot]

However I get this error:

Export of the viral SAM file done for SRR11616442
Error in colnames<-(*tmp*, value = c("N_reads", "N_unique_reads", :
attempt to set 'colnames' on an object with less than two dimensions
Calls: colnames<- -> colnames<-
Execution halted
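One common way to get "attempt to set 'colnames' on an object with less than two dimensions" in R is a matrix that has collapsed to a plain vector, for example after taking a single row with `[`. Whether that is what happens inside Viral_Track_scanning.R is an assumption; the column names below are taken from the error message and the values are invented.

```r
# Invented one-row statistics table (e.g. a run where only one virus is kept).
stats <- matrix(c(120, 80), nrow = 1)

# Taking a single row drops it to a plain vector by default.
row_dropped <- stats[1, ]
print(is.null(dim(row_dropped)))  # TRUE

# Setting column names on the dropped vector reproduces the error.
msg <- tryCatch({
  colnames(row_dropped) <- c("N_reads", "N_unique_reads")
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# drop = FALSE keeps the matrix shape, so colnames<- works.
row_kept <- stats[1, , drop = FALSE]
colnames(row_kept) <- c("N_reads", "N_unique_reads")
print(row_kept)
```

If the real object turns out to be a dropped vector, adding `drop = FALSE` at the subsetting step (or wrapping it in `matrix(..., nrow = 1)`) is the usual remedy.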

@zorglubz-coder

> Did this removal of the line help?

Yes, I have no more issues after removing those lines.

@zorglubz-coder

> However I get this error: ... attempt to set 'colnames' on an object with less than two dimensions

Are you still trying to run it manually? I also had this issue when I ran it manually, but running Viral_Track_scanning.R as a whole (with the if at line 343 removed) worked just fine.

@mohebg

mohebg commented Aug 9, 2022

> Are you still trying to run it manually? I also had this issue when I ran it manually, but running Viral_Track_scanning.R as a whole (with the if at line 343 removed) worked just fine.

Yes, I get this issue when running Viral_Track_scanning.R as a whole.

@mohebg

mohebg commented Aug 9, 2022

[screenshot]
Did you experience this before?

@zorglubz-coder

> Yes, I get this issue when running Viral_Track_scanning.R as a whole.

I am sorry, but I cannot help you any further with this issue.

@zorglubz-coder

> [screenshot] Did you experience this before?

I never ran these lines manually, so I don't think I had this error.

One last recommendation: if the package reinstallation changed anything, maybe start everything from scratch:
  • Creation of the Index and of the annotation file
  • Pre-processing of the single data
  • Detection of viruses in scRNA-seq data

I will not be able to help you any further; I hope you can figure it out.

@mohebg

mohebg commented Aug 9, 2022

Thanks a lot for your help!

@mohebg

mohebg commented Aug 9, 2022

QC_report.pdf

@mohebg

mohebg commented Aug 9, 2022

[screenshot]
It seems that I made some progress; I got the QC report:
[screenshot]

@zorglubz-coder

Nice, good luck with the rest of your analysis!

@Shiywa
Author

Shiywa commented Aug 10, 2022

> I would suggest that you just delete this part and try running again.

Thanks for your suggestion. I solved the problem by replacing class(Viral_reads_contents) with class(Viral_reads_contents)[1]. It seems to be caused by a change in how class() behaves across R versions.
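A sketch of the patched check, with the `[1]` fix above and an alternative that tests for a missing `dim` attribute. Both fallbacks reshape `x` itself into a one-row matrix; that is an assumption about what the original line intended, since `Viral_reads_contents_mean` is not defined at that point. The function names and input vectors are hypothetical.

```r
# The [1] fix: compare only the first class name.
fix_with_first_class <- function(x) {
  if (class(x)[1] == "numeric") {
    # Assumption: reshape x itself (Viral_reads_contents_mean is undefined there).
    x <- matrix(x, ncol = 4, dimnames = list(NULL, names(x)))
  }
  x
}

# Alternative: test for a missing dim attribute, independent of class names.
fix_with_dim <- function(x) {
  if (is.null(dim(x))) {
    x <- matrix(x, ncol = 4, dimnames = list(NULL, names(x)))
  }
  x
}

# Invented inputs: what a single read drops to, and a normal two-read matrix.
one_read <- c(A = 0.3, C = 0.2, G = 0.2, T = 0.3)
two_reads <- rbind(one_read, c(A = 0.1, C = 0.4, G = 0.4, T = 0.1))

str(fix_with_first_class(one_read))
str(fix_with_dim(two_reads))
```

The `dim`-based version also catches an integer vector (for example raw base counts instead of proportions), which `class(x)[1] == "numeric"` would miss.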

@zlz6621299

zlz6621299 commented Apr 6, 2024

> [screenshot] It seems that I made some progress; I got the QC report: [screenshot]

Hello, I was wondering whether you successfully ran the pipeline on COVID-19 data? After alignment, I didn't get any reads mapped to SARS-CoV-2 in the BAM file. I've tried several datasets but still can't get it to work. If you have done this, could you please send me a copy of your FASTA file (the SARS-CoV-2 reference genome)? I would like to try again after generating an index with STAR.
SARS_count.txt


5 participants