Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing API incomplete functionality for read side detection #251

Merged
merged 5 commits into from
Oct 22, 2024

Conversation

agalitsyna
Copy link
Member

What was wrong?

  • Detection of the read side was not functional with some aligners after API refactoring 9d99660

Technical description of the issue:

  • Detection of the read side was moved from push_pysam to group_alignments_by_side, which relied on sam.is_read1 attribute of pysam entry. It simply does not always work as intended in pysam, and detection of the read side shall be done by sam.flag instead, like was done in
    def push_pysam(sam_entry, sams1, sams2):
    """Parse pysam AlignedSegment (sam) into pairtools sams entry"""
    flag = sam_entry.flag
    if (flag & 0x40) != 0:
    sams1.append(sam_entry) # left read, or first read in a pair
    else:
    sams2.append(sam_entry) # right read, or mate pair
    return
  • Inappropriate detection of the read side probably resulted in sometimes reporting the read side, potentially depending on the aligner. It did not work for single-end reads reported in issue single-end mode is broken in v1.1.0 #247.

Solution:

  • group_alignments_by_side was refactored and fully merged with the old push_pysam function.

I did not design any specific test, as existing tests cover parse and parse2 read side detection; they just cannot cover all possible aligners that can report read side differently.

… side shall be determined through flag and not through sam entry attributes)
@agalitsyna
Copy link
Member Author

Another issue with readID transformation pointed out by #247 (comment) : readID transformation was applied to the readID only upon reading the first alignment.
Now this functionality was moved to read_alignment_block completely.

@agalitsyna
Copy link
Member Author

Fixed --expand not working together with --single-end.
Issue was inherited from earlier versions, highlighter by #247 (comment)

Copy link
Member

@golobor golobor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, good to merge!

"read" - 5'-end of alignments relative to R1 or R2 read coordinate system (as in traditional Hi-C),
"walk" - 5'-end of alignments relative to the whole walk coordinate system,
"outer" - outer ends of sequential alignments in each pair. """,
"outer" - outer ends of sequential alignments in each pair (parse2 default). """,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, why do we use "outer" as the default? Wouldn't junction make more sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, "outer" makes the output interpretation similar to traditional Hi-C (as if there pairs were sequences by paired-end seq)!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge it for now (current version keeps the default that we have had already; I've just expanded the docs). Feel free to update anytime if you think it's critical, though!

@agalitsyna agalitsyna merged commit ae544ff into master Oct 22, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants