passing bam files to sc_long_multisample_pipeline #48

Open
sparthib opened this issue Sep 29, 2024 · 8 comments

@sparthib

Hi there,

I have generated BAM files for all my samples manually using minimap2. What format should I use to pass these to sc_long_multisample_pipeline? The documentation lists options for passing FASTQ files to the function; is it similar for BAMs?

Thanks,
Sowmya

@ChangqingW
Collaborator

You can copy or symlink the BAM files into the output folder and name them [corresponding_fastq_file_name]_align2genome.bam. For example, if you have sample1.fastq and sample2.fastq, putting sample1_align2genome.bam and sample2_align2genome.bam in the output folder should make FLAMES skip the alignment step and use the provided BAMs.
This also works for realignment (sample1_realign2transcript.bam).
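
The naming convention above can be sketched in R as follows. This is only an illustration: link_bams_for_flames is a hypothetical helper, not a FLAMES function, and it assumes the file-name prefix is the FASTQ name with its .fastq / .fq (optionally .gz) extension removed, which may not match how FLAMES derives it for compressed files.

```r
# Hypothetical helper (not part of FLAMES): link existing BAMs into the
# output folder under the names described above.
link_bams_for_flames <- function(fastqs, bams, outdir,
                                 suffix = "_align2genome.bam") {
  stopifnot(length(fastqs) == length(bams))
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  # Assumption: the prefix is the FASTQ file name without its
  # .fastq / .fq (optionally .gz) extension.
  prefix <- sub("\\.(fastq|fq)(\\.gz)?$", "", basename(fastqs))
  targets <- file.path(outdir, paste0(prefix, suffix))
  # One symlink per BAM; file.symlink() returns TRUE/FALSE per link.
  ok <- file.symlink(normalizePath(bams), targets)
  if (!all(ok)) warning("some links could not be created")
  invisible(targets)
}
```

For the realignment case, the same helper could be called with suffix = "_realign2transcript.bam". Where symlinks are not available, file.copy() would be the obvious substitute.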

@sparthib
Author

sparthib commented Oct 4, 2024

Great, thanks!

@sparthib sparthib closed this as completed Oct 4, 2024
@sparthib
Author

sparthib commented Oct 7, 2024

Is there a similar multi-sample pipeline for bulk samples? Thanks!

Sowmya

@sparthib sparthib reopened this Oct 7, 2024
@ChangqingW
Collaborator

For now, you could put all FASTQs into one folder and provide the path to that folder. Each FASTQ file is then treated as a sample, and a matching [corresponding_fastq_file_name]_align2genome.bam file lets it skip the alignment step.
The plan is to make this work the same way as sc_long_multisample_pipeline in the devel branch, where you can simply provide a named vector whose values are paths to folders or files:

sc_long_multisample_pipeline(
  fastqs = c(
    "sample1" = file.path(outdir, "fastq", "your_fq_folder_for_1st_sample"),
    "sample2" = file.path(outdir, "fastq", "your_second_fq.fq.gz"),
    "sample3" = file.path(outdir, "fastq", "third.fq.gz")
  ),
  ...
)

The names are then used as the sample names.

@sparthib
Author

sparthib commented Oct 9, 2024

Just for clarification, are you saying sc_long_multisample_pipeline can be used for bulk samples as well?

thanks,
Sowmya

@ChangqingW
Collaborator

Sorry, I meant that bulk_long_pipeline can handle multiple samples, but you need to put all FASTQs into one folder and provide the path to that folder. Each FASTQ file is then treated as a sample.

@sparthib
Author

I tried something like this:

fastq_path = "path_to_input_fastqs"
fastqs <- list.files(fastq_path, full.names = TRUE)

names(fastqs) <- c("sample1", "sample2" ...)

se <- bulk_long_pipeline(
  annotation = gtf, 
  fastq= fastqs,
  outdir = outdir,
  genome_fa = genome_fa, config_file = config_file
)

and it returns this error:

Error in if (utils::file_test("-d", fastq)) { : 
  the condition has length > 1
Calls: bulk_long_pipeline

I believe this comes from passing multiple FASTQ files to the fastq argument, but let me know if I'm wrong.

Thanks,
Sowmya

@ChangqingW
Collaborator

ChangqingW commented Oct 10, 2024

For now, you could use:

fastq_path = "path_to_input_fastqs"
# fastqs <- list.files(fastq_path, full.names = TRUE)
# names(fastqs) <- c("sample1", "sample2" ...)

se <- bulk_long_pipeline(
  annotation = gtf, 
  fastq= fastq_path,
  outdir = outdir,
  genome_fa = genome_fa, config_file = config_file
)

Each FASTQ file is then treated as a sample.
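
For context on the error reported earlier in the thread, here is a small sketch that can be run outside FLAMES. utils::file_test() is vectorised, so a character vector of several FASTQ paths yields several logical values, and in R 4.2.0 and later `if` refuses a condition of length greater than one. The check_fastq_arg helper is hypothetical and only covers the single-folder, multi-sample case discussed here.

```r
# Set up a temporary folder with two empty placeholder FASTQ files.
d <- tempfile("fastq_dir_")
dir.create(d)
fq <- file.path(d, c("sample1.fastq", "sample2.fastq"))
file.create(fq)

# file_test() returns one value per path, so a vector of files gives a
# condition of length 2, which `if (...)` rejects in R >= 4.2.
length(utils::file_test("-d", fq))

# A single folder path gives a single logical value, which `if` accepts.
utils::file_test("-d", d)

# Hypothetical guard (not part of FLAMES) to run before the pipeline call:
# requires exactly one path, and that path must be a directory.
check_fastq_arg <- function(fastq) {
  stopifnot(is.character(fastq), length(fastq) == 1L,
            utils::file_test("-d", fastq))
  invisible(fastq)
}
```

Calling check_fastq_arg(fastq_path) before bulk_long_pipeline() would catch the multi-file case early with a clearer failure point.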
