Running TRUST4 with non-10X single-cell data, barcodes are in the RG:Z headers of fastq #277

Open
LukaP-BB opened this issue May 31, 2024 · 6 comments

Comments

@LukaP-BB

LukaP-BB commented May 31, 2024

I have FASTQ files from scDNA sequencing, with the cell barcodes already extracted into the read headers, in the RG:Z field.

@A01789:135:HLKCJDMXY:1:1101:1027:1047 RG:Z:CGTGCCTATTCGGACAGT
TTAAATTGGTATCAGAAGAAACCAGGGAAAGCCCCTAAGCTCCTGATCTACGATGCATCCAATCCGGAAACAGGGGTCCCATCAAGGTTCAGTGGAA
+
FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFF:FFFFFFFFFF:,FFFFF:FFFFFFF,F

Is there a way to use this information, similar to specifying the barcode field when the input is a BAM file? I couldn't find one.

If there is currently no way to do it, how would you recommend specifying the barcodes?

I tried extracting the raw barcodes from the headers into a text file, but that doesn't seem to be the right solution.

@mourisl
Collaborator

mourisl commented May 31, 2024

Currently, we don't support parsing the barcode from the header. You can extract the raw barcodes into a separate FASTA file, like

>A01789:135:HLKCJDMXY:1:1101:1027:1047
CGTGCCTATTCGGACAGT

I will add the feature to parse barcodes from the header in one of the next two releases.
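
For readers who want to script the workaround above, here is a minimal Python sketch of the extraction step. It is not part of TRUST4: the function names `barcode_fasta_records` and `write_barcode_fasta` are hypothetical, and it assumes standard 4-line FASTQ records whose headers carry the barcode as an `RG:Z:` tag, as in the example read in this issue. It also assumes the barcode file should contain one entry per read, in the same order as the reads, so a read without a tag raises an error instead of being skipped.

```python
import gzip
import re

# Matches an "RG:Z:<barcode>" tag anywhere in a FASTQ header line.
RG_TAG = re.compile(r"\bRG:Z:(\S+)")


def barcode_fasta_records(fastq_lines):
    """Yield (read_id, barcode) for each 4-line FASTQ record.

    Assumption: every header looks like "@<read_id> ... RG:Z:<barcode>".
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # only header lines (1st line of each record) matter
        header = line.rstrip("\r\n")
        if not header:
            continue  # tolerate a trailing blank line at end of file
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1}: expected a FASTQ header, got {header!r}")
        read_id = header[1:].split()[0]
        match = RG_TAG.search(header)
        if match is None:
            # Skipping would shift barcodes relative to reads, so fail loudly.
            raise ValueError(f"line {i + 1}: no RG:Z tag in header {header!r}")
        yield read_id, match.group(1)


def write_barcode_fasta(fastq_path, fasta_path):
    """Write one FASTA entry (read id + barcode) per read in fastq_path."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as fin, open(fasta_path, "w") as fout:
        for read_id, barcode in barcode_fasta_records(fin):
            fout.write(f">{read_id}\n{barcode}\n")
```

Calling `write_barcode_fasta("R1.fastq.gz", "barcodes.fa")` (both file names are placeholders) would produce a file in the format shown above, which can then be given to TRUST4 as the barcode file.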

@LukaP-BB
Author

LukaP-BB commented Jun 5, 2024

Thanks for your swift reply. The solution seems to work, as TRUST4 is now running.

This is tangential to the original issue, but do you have a recommendation for the number of threads to use? I launched a test run on 1 thread, but it is taking more than 24 hours to complete on my data. Is the relationship between the number of threads and speed linear?

@mourisl
Collaborator

mourisl commented Jun 5, 2024

I usually use 8 threads. I think the gain probably plateaus after 16 threads. At which step does TRUST4 seem to get stuck? Which version of TRUST4 are you using?

@LukaP-BB
Author

Hi, I'm running TRUST4 v1.0.5.1 according to conda. I tried again with 20 threads, just to be sure to overshoot, and it became very slow at the same step, where the log displays: [Sat Jun 8 08:55:39 2024] Processed 32600000 reads (30149746 are used for assembly). The job then timed out after 2 days.

My data is probably not suitable as is, since the R1 and R2 fastq.gz files are ~27G each and most of the reads in them will not be IGH reads. If I align beforehand and provide BAM files to TRUST4, I guess it will be able to focus on the IG regions more efficiently? I originally wanted to avoid doing the alignment myself, since most of the workflow is outsourced.

@mourisl
Collaborator

mourisl commented Jun 10, 2024

Is it possible to upgrade to the recent version, v1.1.1? The speed on barcode-based data has improved considerably since v1.1.0.

@LukaP-BB
Author

I'll try it and get back to you once I've tested it. I naively assumed that conda had installed the latest version. Thanks for your help! ❤️
