Fix join barcode step for Opossum #1397
…mosomes larger than 512 Mbp.
Remember to squash merge!
@@ -317,7 +317,7 @@ task JoinMultiomeBarcodes {
 echo "Starting bgzip"
 bgzip "~{atac_fragment_base}.sorted.tsv"
 echo "Starting tabix"
-tabix -s 1 -b 2 -e 3 "~{atac_fragment_base}.sorted.tsv.gz"
+tabix -s 1 -b 2 -e 3 -C "~{atac_fragment_base}.sorted.tsv.gz"
@khajoue2 Do we know if this is still compatible with ArchR import? We only added tabix to make the outputs compatible, so if the CSI index isn't directly compatible, we might want to flag it to the Multiome group.
This isn't clearly specified in ArchR's public documentation; we can run a downstream check.
We can add a note: "If incompatible, explore alternative solutions, e.g., chromosome splitting."
@@ -1,3 +1,7 @@
+# 5.7.2
+2024-10-21 (Date of Last Commit)
+* Changed a flag in H5adUtils.wdl to se CSI instead of TBI indexing in tabix command to support chromosomes larger than 512 Mbp.
Suggested change:
-* Changed a flag in H5adUtils.wdl to se CSI instead of TBI indexing in tabix command to support chromosomes larger than 512 Mbp.
+* Changed a flag in H5adUtils.wdl to set CSI instead of TBI indexing in tabix command to support chromosomes larger than 512 Mbp.
Fix tabix indexing for large chromosomes
Issue
The previous tabix command failed because the TBI (tabix) index format cannot
store coordinates beyond 2^29 bp (512 Mbp), and the Opossum reference has
chromosomes longer than that. The error message was:
"Region 536870658..536870963 cannot be stored in a tbi index. Try using a csi index"
Fix
Modified the tabix command to use CSI (Coordinate-Sorted Index) instead of TBI.
CSI uses a configurable binning scheme, so it is not bound by TBI's fixed
512 Mbp limit and can index much longer chromosomes.
Changed from:
tabix -s 1 -b 2 -e 3 "${atac_fragment_base}.sorted.tsv.gz"
To:
tabix -s 1 -b 2 -e 3 -C "${atac_fragment_base}.sorted.tsv.gz"
The -C flag tells tabix to create a CSI index, written as a .csi file next to
the .gz file instead of a .tbi file. The -s 1 -b 2 -e 3 options are unchanged,
so the sequence-name, start, and end columns of our file are read as before.
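
To make the 512 Mbp limit concrete, here is a minimal sketch of a check that reports whether a fragment file would need a CSI index. It is an illustration only, not part of this PR: the helper name index_flag_for and the variable TBI_MAX_COORD are hypothetical, and the exact boundary handling inside tabix may differ by one from the simple comparison used here.

```shell
#!/usr/bin/env bash
# TBI's fixed binning scheme covers coordinates up to 2^29 bp (512 Mbp).
TBI_MAX_COORD=$((1 << 29))   # 536870912

# index_flag_for FILE  (hypothetical helper, not part of the pipeline)
# Prints "-C" if any end coordinate (column 3) of an uncompressed,
# tab-separated fragment file exceeds the TBI limit; prints nothing otherwise.
index_flag_for() {
  awk -F'\t' -v limit="$TBI_MAX_COORD" '
    $3 + 0 > limit { over = 1; exit }   # stop at the first oversized record
    END { if (over) print "-C" }        # END still runs after exit
  ' "$1"
}
```

The region from the error message (536870658..536870963) ends past 536870912, which is why it triggers the failure. The PR passes -C unconditionally rather than checking first, which is simpler and gives every run the same index type.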
Impact
This change allows successful indexing of ATAC fragment files whose
coordinates extend beyond 512 Mbp, so the pipeline works with a wider range
of reference genomes and no longer fails at the indexing step on large
chromosomes. Downstream tools that read the index must accept .csi files.