Finding a map between observed bases and positions in the reference genome #46

Open
scikal opened this issue Apr 11, 2022 · 4 comments

@scikal

scikal commented Apr 11, 2022

Hi,

Thank you for building and maintaining this great package --- I wouldn't start analyzing genomic data via Julia without it!

There are a few steps that I used to perform with pysam, but I could not figure out how to do them with XAM:

  1. For each record/read, get a list of start and end positions of aligned gapless blocks.
  2. For each record/read, get a list of reference positions that this read aligns to. The length of the list should be the same length as the read. This should allow mapping each observed base to a position in the reference genome, in the presence of insertions and deletions.

Would you be able to guide me how this can be done?

Sincerely,
Daniel

@scikal scikal changed the title from "Add" to "Finding a map between observed bases and positions in the reference genome" Apr 11, 2022
@kescobo
Member

kescobo commented Apr 11, 2022

As a first pass, I'd look here and set up the for record in reader... loop. As for what happens inside the loop, the easiest (though not necessarily the most performant) approach would probably be to create a couple of vectors and push! to them, e.g.:

startstops = []
refpos = []
for record in reader
    # placeholder: compute `starts` and `stops` of the gapless blocks here
    push!(startstops, (starts, stops))
    # leftmost and rightmost reference positions covered by the record
    push!(refpos, (position(record), rightposition(record)))
end

@kescobo kescobo added the question Further information is requested label Apr 11, 2022
@scikal
Author

scikal commented Apr 11, 2022

Thank you for the prompt response.

I'm trying to handle reads that contain short deletions, in which case the read length is smaller than rightposition(record) - position(record).
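
To make the mismatch concrete, here is a minimal sketch in plain Julia that counts how many read bases and how many reference bases a CIGAR string covers. It does not call XAM at all: `parse_cigar`, `query_length`, and `reference_span` are hypothetical helper names introduced only for this illustration, and the consumption rules follow the SAM specification.

```julia
# Hypothetical helpers for illustration; none of these are part of XAM.jl.

# Split a CIGAR string such as "5M2D5M" into (operation, length) pairs.
function parse_cigar(cigar::AbstractString)
    return [(m.captures[2][1], parse(Int, m.captures[1]))
            for m in eachmatch(r"(\d+)([MIDNSHP=X])", cigar)]
end

# Operations that consume read (query) bases, per the SAM specification.
consumes_query(op::Char) = op in ('M', 'I', 'S', '=', 'X')

# Operations that consume reference bases, per the SAM specification.
consumes_reference(op::Char) = op in ('M', 'D', 'N', '=', 'X')

# Number of bases stored in the read.
query_length(cigar) =
    sum((len for (op, len) in parse_cigar(cigar) if consumes_query(op)); init = 0)

# Number of reference bases spanned by the alignment.
reference_span(cigar) =
    sum((len for (op, len) in parse_cigar(cigar) if consumes_reference(op)); init = 0)

query_length("5M2D5M")    # 10 read bases
reference_span("5M2D5M")  # 12 reference bases
```

For a read with CIGAR 5M2D5M, the read holds 10 bases but spans 12 reference bases, so the two deleted reference positions have no corresponding read base.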

@scikal
Author

scikal commented Apr 11, 2022

I was wondering whether there is an easier solution than parsing the CIGAR.

@kescobo
Member

kescobo commented Apr 11, 2022

Ah, that I don't know. Will have to wait for someone more familiar with the format to weigh in, sorry.
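
In the meantime, one possible approach is to walk the CIGAR by hand. The sketch below is plain Julia and makes no XAM calls; `parse_cigar` and `map_read_to_reference` are hypothetical names, and the 1-based inclusive coordinates are an assumption chosen to match a 1-based leftmost position (pysam, by contrast, reports 0-based half-open blocks). It returns the gapless aligned blocks and, for every read base, either its reference position or `nothing` when the base is inserted or soft-clipped.

```julia
# Hypothetical helpers for illustration; none of these are part of XAM.jl.

# Split a CIGAR string such as "5M2D5M" into (operation, length) pairs.
function parse_cigar(cigar::AbstractString)
    return [(m.captures[2][1], parse(Int, m.captures[1]))
            for m in eachmatch(r"(\d+)([MIDNSHP=X])", cigar)]
end

# Walk the CIGAR starting at `leftpos` (assumed 1-based leftmost reference position).
# Returns:
#   blocks    - inclusive (start, stop) reference ranges of gapless aligned blocks
#   positions - one entry per read base: a reference position, or `nothing`
function map_read_to_reference(cigar::AbstractString, leftpos::Integer)
    blocks = Tuple{Int,Int}[]
    positions = Union{Int,Nothing}[]
    refpos = Int(leftpos)
    for (op, len) in parse_cigar(cigar)
        if op in ('M', '=', 'X')          # consumes read and reference
            push!(blocks, (refpos, refpos + len - 1))
            append!(positions, refpos:(refpos + len - 1))
            refpos += len
        elseif op in ('I', 'S')           # consumes read only
            append!(positions, fill(nothing, len))
        elseif op in ('D', 'N')           # consumes reference only
            refpos += len
        end                               # 'H' and 'P' consume neither
    end
    return blocks, positions
end

blocks, positions = map_read_to_reference("3S5M2D4M1I3M", 100)
# blocks == [(100, 104), (107, 110), (111, 113)]
# length(positions) == 16, the number of bases stored in the read
```

To use this with real records, the CIGAR string and leftmost position would have to come from the record itself. I believe `BAM.cigar(record)` and `BAM.position(record)` provide these in XAM, but please check the XAM documentation before relying on that.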
