Fix join barcode step for Opossum #1397

Open: wants to merge 2 commits into develop


khajoue2 (Contributor)

Fix tabix indexing for large chromosomes

Issue

The previous tabix command failed because of a limit in the TBI (tabix) index
format: it cannot store positions beyond 2^29 bp (about 512 Mbp), so large
chromosomes could not be indexed. The error message was:

"Region 536870658..536870963 cannot be stored in a tbi index. Try using a csi index"

Fix

Modified the tabix command to build a CSI (Coordinate-Sorted Index) instead of a TBI.
CSI records its bin size and depth in the index header, so it can address much
longer chromosomes.
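Here is where the 512 Mbp figure comes from. TBI uses the BAI-style binning scheme, which in CSI terms means a fixed minimum bin size of 2^14 bp (min_shift = 14) and depth 5. The largest addressable position is therefore 2^(14 + 3*5) = 2^29 = 536,870,912 bp. The region in the error above ends at 536,870,963, just past that bound. CSI stores min_shift and depth in its header, so a deeper index can cover longer sequences. `max_indexable_position` is a hypothetical helper that only illustrates this arithmetic. It is a sketch based on the CSI spec's size formula, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: largest position a binning index can address,
# assuming the CSI formula 2^(min_shift + 3*depth).
# TBI is fixed at min_shift=14, depth=5; CSI stores both in its header.
max_indexable_position() {
  local min_shift=$1 depth=$2
  echo $(( 1 << (min_shift + 3 * depth) ))
}

max_indexable_position 14 5   # TBI layout, prints 536870912 (2^29)
max_indexable_position 14 6   # one extra level, prints 4294967296 (2^32)
```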

Changed from:
tabix -s 1 -b 2 -e 3 "${atac_fragment_base}.sorted.tsv.gz"

To:
tabix -s 1 -b 2 -e 3 -C "${atac_fragment_base}.sorted.tsv.gz"

The -C flag tells tabix to write a CSI index (<file>.csi) instead of a TBI index
(<file>.tbi). This allows indexing positions beyond 2^29 while keeping the same
column settings for our file (-s 1 sequence name, -b 2 start, -e 3 end).

Impact

This change allows ATAC fragment files that contain fragments on chromosomes
longer than ~512 Mbp to be indexed successfully. It improves compatibility with
a wider range of reference genomes and keeps the pipeline from failing at the
indexing step on large chromosomes.
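To see whether a given fragment file would have hit the TBI limit at all, one option is to count rows whose end coordinate (column 3, matching `-e 3`) exceeds 2^29. The sketch below makes a few assumptions. `count_rows_beyond_tbi_limit` is a hypothetical name. Input is uncompressed TSV on stdin. Lines starting with `#` are treated as headers. htslib's exact off-by-one handling at the boundary may differ slightly from the strict `>` used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count fragments whose end (column 3) lies beyond 2^29,
# the largest position a TBI index can address. Reads uncompressed TSV on stdin.
count_rows_beyond_tbi_limit() {
  awk -F '\t' -v limit=$(( 1 << 29 )) '
    # skip header/comment lines; compare the end coordinate numerically
    !/^#/ && ($3 + 0) > (limit + 0) { n++ }
    END { print n + 0 }
  '
}

# Example (bgzip output can be read with gzip):
# gzip -dc "sample.sorted.tsv.gz" | count_rows_beyond_tbi_limit
```

A non-zero count means a plain `tabix -s 1 -b 2 -e 3` run would be expected to fail on that file, while `-C` should not.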


Remember to squash merge!


🔍Changelog Validation Results:

Comparing changelogs for pipelines that differ from the versions on 'origin/develop':
PairedTag.changelog.md has not been changed and needs to be updated
MultiSampleSmartSeq2SingleNucleus.changelog.md has not been changed and needs to be updated
Optimus.changelog.md has not been changed and needs to be updated
SlideSeq.changelog.md has not been changed and needs to be updated
Some changelog files need updating. See output for details.
validation_failed


🔍Version Validation Results:

Comparing versions and changelogs for pipelines that differ from the versions on 'origin/staging':
PairedTag.wdl has not been changed and needs updating
MultiSampleSmartSeq2SingleNucleus.wdl has not been changed and needs updating
Optimus.wdl has not been changed and needs updating
SlideSeq.wdl has not been changed and needs updating
Some WDLs or changelog files need updating. See output for details.
validation_failed

@@ -317,7 +317,7 @@ task JoinMultiomeBarcodes {
echo "Starting bgzip"
bgzip "~{atac_fragment_base}.sorted.tsv"
echo "Starting tabix"
-tabix -s 1 -b 2 -e 3 "~{atac_fragment_base}.sorted.tsv.gz"
+tabix -s 1 -b 2 -e 3 -C "~{atac_fragment_base}.sorted.tsv.gz"
Contributor

@khajoue2 Do we know if this is still compatible with ArchR import? We only added the tabix step to make the outputs compatible with ArchR, so if a CSI index isn't directly compatible, we might want to flag it to the Multiome group.

Contributor Author

This isn't clearly specified in ArchR's public documentation; we can run a downstream check.
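One possible starting point for that downstream check is confirming which index format a delivered output actually carries. Both TBI and CSI index files are BGZF-compressed and begin with the magic bytes `TBI\1` or `CSI\1`, and BGZF can be decompressed with plain `gzip`. This only identifies the format. Whether ArchR accepts a CSI index is the open question above, and this check does not answer it. `index_format` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an index file is TBI or CSI by reading
# the first three bytes of its decompressed content (magic "TBI\1" or "CSI\1").
index_format() {
  local magic
  # BGZF is multi-member gzip, so gzip -dc can decompress it; errors are hidden
  magic=$(gzip -dc -- "$1" 2>/dev/null | head -c 3)
  case "$magic" in
    TBI) echo "tbi" ;;
    CSI) echo "csi" ;;
    *)   echo "unknown" ;;
  esac
}

# Example: index_format "sample.sorted.tsv.gz.csi"
```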

Contributor Author

We can add a note: "If incompatible, explore alternative solutions, e.g., chromosome splitting."

@@ -1,3 +1,7 @@
# 5.7.2
2024-10-21 (Date of Last Commit)
* Changed a flag in H5adUtils.wdl to se CSI instead of TBI indexing in tabix command to support chromosomes larger than 512 Mbp.
Contributor

Suggested change
-* Changed a flag in H5adUtils.wdl to se CSI instead of TBI indexing in tabix command to support chromosomes larger than 512 Mbp.
+* Changed a flag in H5adUtils.wdl to set CSI instead of TBI indexing in tabix command to support chromosomes larger than 512 Mbp.

3 participants