Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-eukaryotic data and input files? #9

Closed
lxsteiner opened this issue Aug 14, 2023 · 4 comments · Fixed by #25
Closed

Non-eukaryotic data and input files? #9

lxsteiner opened this issue Aug 14, 2023 · 4 comments · Fixed by #25

Comments

@lxsteiner
Copy link

Hi,

This looks very useful but from all the examples it seems the package focuses mostly on eukaryotic omic data? I wanted to use this for bacterial genomes, to generate a similar plot:
image

plotting genome coverage, GC, gene annotation, and CNV annotation (without the ideogram). Would this be doable for a bacterial/viral genome and corresponding data?

Looking at the handy tutorials you provided, unfortunately I don't see how to get started from the initial files without loading existing data, or what their format should be. For example, I have:

  • (sorted/indexed) BAM file for coverage
  • GTF3 for genome annotation
  • VCF file for variant calling

The way I understand the pipeline:

  1. Prepare GTF file like this:
# prepare gtf
gtf.file = system.file("extdata", "annotation.gtf", package = "ggcoverage")
gtf.gr = rtracklayer::import.gff(con = gtf.file, format = 'gtf')
  1. Sample metadata - not sure how or if this is needed. I only have one sample in this case.
  2. Prepare BAM track data:
# load bam file
bam.file = system.file("extdata", "DNA-seq", "aligned.bam", package = "ggcoverage")
track.df <- LoadTrackFile(
  track.file = bam.file
)
  1. Load CNV file (is this just a VCF file?):
cnv.file <- system.file("extdata", "DNA-seq", "variant.vcf", package = "ggcoverage")
# read CNV
cnv.df = read.table(file = cnv.file, sep = "\t", header = TRUE)
  1. Plot basic coverage:
basic.coverage = ggcoverage(data = track.df, color = "grey", mark.region = NULL,
                            range.position = "out")
basic.coverage
  1. Plot GC, CNV annotations, gene annotations:
basic.coverage +
  geom_gc(fa.file=genome.fasta) +
  geom_cnv(cnv.df = cnv.df, bin.col = 3, cn.col = 4)
  geom_gene(gtf.gr=gtf.gr)

I guess it should look something like that? Thank you for any feedback!

@showteeth
Copy link
Owner

Hi,
It's sure that you can use ggcoverage to visualize and annotate bacterial/viral genome coverage.

  1. For a basic coverage plot, ggcoverage requires:
  1. Adding GC content annotation requires genome FASTA file to calculate GC content.

  2. Adding gene annotation requires GTF file, and the gtf file should contain "seqnames", "start", "end", "strand", "type", "gene_name" columns and one of "gene_type", "gene_biotype". Example gtf file

  3. Adding CNV annotation requires a file containing copy number results according to Genome-wide copy number analysis of single cells. You can provide a customed file with four columns: chr, position, bin value (point), copy number(line). Example cnv file

  4. Adding SNV annotation requires bam file and genome FASTA file (reference base and aa).

Hope it helps,

YB

@AceM1188
Copy link

AceM1188 commented Dec 2, 2023

This is actually something I'm looking to accomplish for plotting coverage of reads to a bacterial genome reference for one sample. From a very basic effort (as a novice), I've tried the following but am running into an error when I try to create a track dataframe from bam file.

S1_tracks = LoadTrackFile("../BaseSpace/Bacterialgenomes-402632365/FASTQ_Generation_2023-11-16_04_36_48Z-703830129/S1_bam/S1_sort.bam", track.folder = NULL, format = "bam", norm.method = "None", meta.info = sample.meta)

This generates the following error:

Calculate coverage with GenomicAlignments when norm.method is None!
Error in .io_bam(.scan_bamfile, file, reverseComplement, yieldSize(file), :
seqlevels(param) not in BAM header:
seqlevels: 'chr14'
file: ../BaseSpace/Bacterialgenomes-402632365/FASTQ_Generation_2023-11-16_04_36_48Z-703830129/S1_bam/S1_sort.bam
index: ../BaseSpace/Bacterialgenomes-402632365/FASTQ_Generation_2023-11-16_04_36_48Z-703830129/S1_bam/S1_sort.bam

I don't know where the 'chr14' is coming from particularly as my alignment is to a single bacterial genome.

Of note, here's my meta data file:
sample.meta
SampleName Type Group
1 S1 BAL Patient

Here's a text file of my session info.
session-info.txt
And here's the link to my sorted bam file: https://drive.google.com/file/d/1kf0oGu2h35F5wr2OwAse3OkoNWe5IVwb/view?usp=sharing

I'm sure this is something ridiculously simple, but I'm just not figuring it out.
Could you help me understand what the error is here?

@m-jahn
Copy link
Collaborator

m-jahn commented Mar 22, 2024

Hi, sorry for late reply, but now I looked into that issue. The problem is that the LoadTrackFile function is currently quite inflexible. Many options are hardcoded towards human data.

To make your exmaple work, you need to specify the "region" argument and your samples in the meta data file need to correspond to the name of the BAM file. This example works for me:

library(ggcoverage)
library(dplyr)

sample.meta <- data.frame(
  SampleName = "S1_sort",
  Type = "BAL",
  Group = "Patient"
)

# look for name of assembly/chromosomes in bam file
data = rtracklayer::import("~/Downloads/S1_sort.bam", format = "bam")
seqlevels(data)

# supply custom region
df_coverage = LoadTrackFile(
  track.file = "~/Downloads/S1_sort.bam",
  track.folder = NULL,
  format = "bam",
  norm.method = "None",
  meta.info = sample.meta,
  region = "NZ_CP065651.1:25,000-30,000"
)

# need to fix the end coordinate because binning is currently not
# working for non-normlaized data
df_coverage$end <- df_coverage$end + 1
ggcoverage(df_coverage, facet.key = "Type", group.key = "Group")

image

It shouldn't be that complicated, will try to fix this.

@m-jahn
Copy link
Collaborator

m-jahn commented Apr 13, 2024

Will be fixed with commit 8c537bb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants