Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do you have the plan to add the pileup functions? #51

Open
panxiaoguang opened this issue Sep 26, 2022 · 8 comments
Open

Do you have the plan to add the pileup functions? #51

panxiaoguang opened this issue Sep 26, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@panxiaoguang
Copy link

Sometimes,We want to iter bam according to the position not record. So do you have the plan to add the function.

@CiaranOMara
Copy link
Member

The coverage method in GenomicFeatures.jl will give you a pileup.

There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the groupname method instead of the inconsistent seqname method.

@panxiaoguang
Copy link
Author

The coverage method in GenomicFeatures.jl will give you a pileup.

There is a plan to make this easier to query intersection and overlaps with GenomicFeatures v3, which will use the groupname method instead of the inconsistent seqname method.

However, the coverage method can only get the base numbers in some collections of regions. We use the pileup method to get, for example,how many bases are A, G , C, and T ? How many mismatches or indels ? from the first position to the last position .Yes,we need an easy API like BAM Record.

@CiaranOMara
Copy link
Member

Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?

@panxiaoguang
Copy link
Author

Oh, I assume you are referring to the Pileup format. Would you provide some references to the functionality/features you are requesting, and we'll see if adding them to this package makes sense?

A good example is python class pipeup, PileupColumn and PileupRead(https://pysam.readthedocs.io/en/latest/api.html#pysam.PileupColumn), But they are using cython, I'm not familiar with that.

@CiaranOMara CiaranOMara added the enhancement New feature or request label Oct 9, 2022
@MillironX
Copy link
Member

I agree this would be extremely useful. I don't think we would need to support the pileup format, per se, but provide a pileup iterator. Basically, a pileup iterator on a SAM.Reader or BAM.Reader should return a new pileup object, something like

@kwdef struct Pileup
    chr
    pos
    ref = 'N'
    count = 0 
    reads::Vector = []
    mapquals::Vector = []
    basequals::Vector = []
end #struct

So iterating through a sorted BAM should return a set like

[
    Pileup("chr1", 1),
    Pileup("chr1", 2),
    Pileup("chr1", 3),
    Pileup("chr2", 1),
    Pileup("chr2", 2),
]

@CiaranOMara
Copy link
Member

I think part of the ideal solution is to further develop GenomicFeatures.jl/tree/pu/v3.0.0's GenomicIntervalCollection and its overlap iterators. Current development of the overlap iterators work with mixed subtypes of AbstractGenomicInterval, but this restriction could be removed to allow GenomicFeatures's iterators and collections to work directly with XAM records.

@jonathanBieler
Copy link
Contributor

jonathanBieler commented Oct 13, 2022

I think it would be good to have something like this but it would be even better if it could be made generic enough to accodate things like grouping reads pairs into fragments, e.g if you have to overlapping reads :

R1 : ATTGCGC-----
R2 : ---GTGCATTC

A read-wise pileup at position 5 would return [C = 1 , T = 1] while a fragment-wise at position 4 would return [G = 1] while at position 5 [N=1] or [C = 1 if quality of R1 > quality of R2].

Same things for groups of PCR duplicates, possibly using UMIs.

For this I think you need 1) a (possibly) user-defined grouping criteria ("take R1 and R2 together"), and 2) a (possibly) user-defined reducing/consensus method to compute the pileup number for the grouped reads. Then the interface would like something like this :

 Pileup(FragmentWise(), ReduceWithQuality(minimum_phred=5), "chr1", 1)

That said it might be difficult to make it generic and efficient.

@apraga
Copy link

apraga commented Apr 28, 2023

That would be a great addition. Here's how I count reference and alternate allele it at the moment (there may be better ways, i'm a beginner in Julia):

function getBaseRecord(record, pos)
    i = pos - BAM.position(record)
    BAM.sequence(record)[i+1]
end

function getBase(reader, chrom, pos)
    [getBaseRecord(r, pos) for r in eachoverlap(reader, chrom,pos:pos)]
end

function countSNV(ref, alt)
    ref = length(filter(x -> x == ref, bp))
    alt = length(filter(x -> x == alt, bp))
    total = length(bp)
    (ref, alt, total)
end

Usage:

reader = open(BAM.Reader, bam, index=bai)
bp = getBase(reader,chrom, pos)
close(reader)

(ref, alt, total) = countSNV(convert(DNA, "A"), convert(DNA, "G"))
println("ref=$ref, alt=$alt over $total")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants