partition in nanopolish eventalign output #176
Comments
Hi @baibhav-bioinfo, you can combine the eventalign.txt files before you run m6anet dataprep. Thanks! Best wishes,
If I combine the eventalign files, the result is 1.7 TB for a single sample. If combining is not an option, what is the solution?
The dataprep on combined.eventalign.txt did not complete in 24 hours. I have already run m6anet dataprep and inference on each part of the original fastq. Can I just run all the parts separately and then merge the tsv output?
Hi @baibhav-bioinfo, this is not ideal. Can you request a longer running time on your HPC? Also, did you restrict the minimap2 alignment to primary alignments only? That will lower the number of sites, which should make the preprocessing faster. Thanks! Best wishes,
Actually no, there is a time limit of 24 hours on jobs at TACC. Also, I did run the minimap2 mapping so that it contains no secondary alignments. What should the solution be in this case?
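For reference, "primary alignments only" is slightly stricter than "no secondary alignments": `--secondary=no` suppresses secondary records, but supplementary records can still appear in the SAM output. The following is a minimal sketch, not part of m6anet or minimap2, of keeping only mapped primary records by testing the SAM FLAG bits. The function name `primary_only` is hypothetical; the bit values (0x4 unmapped, 0x100 secondary, 0x800 supplementary) are those defined in the SAM specification.

```python
from typing import Iterable, Iterator

# FLAG bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
EXCLUDE = UNMAPPED | SECONDARY | SUPPLEMENTARY  # 2308


def primary_only(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and mapped primary alignment records only.

    `primary_only` is a hypothetical helper name used for illustration.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        # The FLAG is the second tab-separated field of a SAM record.
        flag = int(line.split("\t", 2)[1])
        if (flag & EXCLUDE) == 0:
            yield line
```

With samtools installed, `samtools view -F 2308` applies the same exclusion mask and will be much faster on a sample with millions of reads. Whether removing supplementary records changes the dataprep run time for this data set is not something this thread establishes.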
Okay, on some nodes I will be able to run for 48 hours, but even then I don't think it will complete. Some of my direct RNA-seq samples have 20 million long nanopore reads. Have you or anyone else faced this issue before, or am I the first one to run into this data size problem?
Hi @baibhav-bioinfo, we don't have any machine time limit on our side. We also have PromethION samples that finished running. Thanks! Best wishes,
Okay. I ran m6anet dataprep on one of the samples for 48 hours. I wonder whether the eventalign.index file has finished being written, as I cannot see any progress in any output file for the last 24 hours.
Also, one more question about nanopore DRS reads; it would be very helpful if you could answer it. Do the DRS reads (fastq) we get after basecalling still contain the poly(A) tails or not? Thank you so much for your time.
The job I ran did not finish in 48 hours either. Only the eventalign.index file has been made so far and, as I mentioned, it was made in only 8 hours; for the next 40 hours there was no progress in it or in any other file. I am pasting my sbatch script settings. Kindly suggest any changes that might speed up my dataprep.
#!/bin/bash
conda activate m6anet_python_3.8
m6anet dataprep --eventalign $SCRATCH/c6_r1.combined.eventalign.tsv --out_dir $SCRATCH/c6_r1.combined_48.m6anet_dataprep_out --n_processes 96
This is normal.
The poly(A) tails are still present in the reads in the fastq file. They are no longer included from the eventalign file onward, because the reads are mapped to a reference without poly(A) tails in the alignment step with minimap2.
Can you show the
$ head eventalign.index
$ ls -lh
Hi @yuukiiwa, I have a request: could you provide your work email so that I can ask my questions there? Let me know.
Unfortunately, I cannot reply to any email regarding software usage, and if I happen to receive one, I can only reply here.
Did you delete the
Can I check whether you aligned to the genome or the transcriptome (transcript cDNA)? m6Anet only supports transcriptome alignment. Thanks! Best wishes,
Thanks for no, I did not delete
No issues, I will ask my questions here only.
No, as I said, no outputs were made in the output folder other than the eventalign.index file.
No, I deleted nothing from the output directory; only one file was made. I just realised that I concatenated the tsv files for each part, so the final file contains the header many times (as each part has its own header). I have removed all the headers except the top one and will run the process again. Let's see if it runs this time.
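The header clean-up described above can be done at concatenation time rather than afterwards. This is a minimal sketch, assuming each part is a plain-text tab-separated file whose first line is the same header; `concat_eventalign` and the file names in the usage note are hypothetical and are not part of nanopolish or m6anet.

```python
from pathlib import Path
from typing import Iterable


def concat_eventalign(parts: Iterable[Path], out_path: Path) -> int:
    """Concatenate eventalign parts, writing the header line only once.

    Hypothetical helper. Returns the number of data lines written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w") as out:
        for part in parts:
            with open(part) as handle:
                first = handle.readline()
                if not first:
                    continue  # skip empty parts
                if header is None:
                    # Keep the header of the first non-empty part.
                    header = first.rstrip("\r\n")
                    out.write(header + "\n")
                elif first.rstrip("\r\n") != header:
                    raise ValueError(f"{part}: header differs from the first part")
                # Stream the remaining lines so the file never sits in memory.
                for line in handle:
                    if not line.endswith("\n"):
                        line += "\n"  # guard against a missing final newline
                    out.write(line)
                    n_rows += 1
    return n_rows
```

A call might look like `concat_eventalign(sorted(Path(".").glob("part_*.eventalign.txt")), Path("combined.eventalign.txt"))`, where the glob pattern is only an example. The function streams line by line, so memory use stays flat even for a 1.7 TB result, although the copy is still bound by disk throughput.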
Hello, as I mentioned, I removed the redundant headers from the combined eventalign.txt and ran dataprep again. Now all 4 files have been generated in the output folder, but there is nothing in them: the eventalign.index file is made within 8 hours, and then nothing is written to any of the four files for the next 40 hours. Is there anything more we can change to make it finish within the 48-hour limit?
ls -lh
As you can see, the files were generated within 8 hours of running, and then nothing happened for the next 40 hours (except that the data.info file has the header "transcript_id,transcript_position,start,end,n_reads"). The following is the command for minimap2 mapping to the transcriptome:
minimap2 -ax map-ont -uf -t 64 --secondary=no SbicolorRTx430_552_v2.1.transcript.fa c6_r1.fastq.gz > c6_r1.sam
Kindly let us know,
I wonder if the files are filled at once when the whole job finishes?
I did a test run by taking 1 million lines from each of the eventalign.txt parts I have, and it ran, so it is working. My original merged file is 1700 GB and it does not finish in 48 hours. So, could I run dataprep on each eventalign part and merge the outputs as I described earlier, adding up the read numbers and averaging the probabilities for each unique transcript?
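The merge proposed here could look like the sketch below. It is an illustration of the poster's idea, not an m6anet feature, and the maintainers describe the approach as not ideal. It assumes each per-part result is a CSV with the columns `transcript_id`, `transcript_position`, `n_reads` and `probability_modified`; those column names and the function name `merge_site_probabilities` are assumptions to check against the actual output files.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # (transcript_id, transcript_position)


def merge_site_probabilities(csv_paths: Iterable[Path]) -> Dict[Site, Tuple[int, float]]:
    """Sum read counts per site and take a read-weighted mean probability.

    Hypothetical helper; the column names are assumed, not confirmed.
    """
    total_reads: Dict[Site, int] = defaultdict(int)
    weighted_sum: Dict[Site, float] = defaultdict(float)
    for path in csv_paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                site = (row["transcript_id"], int(row["transcript_position"]))
                n = int(row["n_reads"])
                total_reads[site] += n
                # Weight each part's probability by its read count.
                weighted_sum[site] += n * float(row["probability_modified"])
    return {
        site: (n, weighted_sum[site] / n)
        for site, n in total_reads.items()
        if n > 0
    }
```

Two caveats apply. A read-weighted mean of per-part probabilities is not guaranteed to equal the probability m6anet would report from the pooled reads. Also, any per-site minimum read count applied within a part is applied before merging, so a site with enough reads overall but too few in each part could be missing from the merged table.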
Hi, earlier I asked whether running "m6anet inference" on each part and then merging to get the final result for the combined sample is feasible. You replied that it is not ideal; I wanted to know whether there is any conceptual flaw in it. Any help would be much appreciated.
Hi,
I have partitioned the fastq file, as the run was not finishing in time for the whole fastq.
So now I have an eventalign.txt file for each part. I am using m6anet to predict the m6A sites.
What should I do? Should I combine the eventalign.txt files before running dataprep, or can I run dataprep for each part and then combine the json files before running inference?