I would like to know how TRUST4 can directly analyze paired-end .fastq format data from the 10X Genomics platform for single-cell analysis? #271
Comments
Here is the reply from the discussion just in case you missed it: Yes, you can. It would be something like running fastq files from this section: https://github.com/liulab-dfci/TRUST4?tab=readme-ov-file#10x-genomics-data-and-barcode-based-single-cell-data . For the running speed, which version of TRUST4 are you using? Which step do you find is too slow? |
I am currently using Cell Ranger to analyze upstream FASTQ data to obtain BAM format data for 10X single-cell transcriptome analysis of the immune repertoire. Then, I use the command run-trust4 -t 25 -b /home/zxsys/data6/bam/SRR22007527_genome_bam.bam -f /home/zxsys/data6/hg38_bcrtcr.fa --ref /home/zxsys/data6/human_IMGT+C.fa --barcode CB to analyze the BAM data to obtain single-cell immune repertoire data. This workflow is too slow, preventing rapid completion of data analysis. I would now like to know how to use the TRUST4 command to directly analyze single-cell transcriptome FASTQ data to obtain immune repertoire data, without first using Cell Ranger to produce a BAM file. Currently, when I use the command run-trust4 -f hg38_bcrtcr.fa --ref human_IMGT+C.fa -u path_to_10X_fastqs/R2.fastq.gz --barcode path_to_10X_fastqs/R1.fastq.gz --readFormat bc:0:15 --barcodeWhitelist cellranger_folder/cellranger-cs/VERSION/lib/python/cellranger/barcodes/737K-august-2016.txt [other options] to analyze single-cell transcriptome data, it results in errors and the analysis cannot be completed. |
First of all, thank you for your reply. |
What error message did you get? Is your data 10X gene expression data or 10X vdj-kit data? Which version of TRUST4 are you using? Your command looks right to me. (Let's use this issue instead of the Discussion). |
Hello expert, I am currently using the following command which only supports single-end data. Could you provide a command for analyzing paired-end data? Since I am a beginner, there are many things I still need to learn. run-trust4 -f hg38_bcrtcr.fa --ref human_IMGT+C.fa -u path_to_10X_fastqs/R2.fastq.gz --barcode path_to_10X_fastqs/R1.fastq.gz --readFormat bc:0:15 --barcodeWhitelist cellranger_folder/cellranger-cs/VERSION/lib/python/cellranger/barcodes/737K-august-2016.txt [other options] |
This depends on your read structure. For example, if both R1 and R2 contain read sequence, and the barcode and UMI occupy the first 26 bp of R1 (16 bp barcode + 10 bp UMI), you can use "-1 R1 -2 R2 --barcode R1 --readFormat bc:0:15,r1:26:-1" for this. |
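To make the coordinates in "bc:0:15,r1:26:-1" concrete, here is a minimal Python sketch. It is not TRUST4's own parser; the helper names `parse_read_format` and `extract` are hypothetical, and the read is made up. It assumes the coordinates are 0-based and inclusive, with -1 meaning "to the end of the read", which is what a 16 bp barcode written as bc:0:15 suggests.

```python
def parse_read_format(spec):
    """Parse a spec such as 'bc:0:15,r1:26:-1' into {field: (start, end)}."""
    fields = {}
    for part in spec.split(","):
        name, start, end = part.split(":")
        fields[name] = (int(start), int(end))
    return fields


def extract(seq, start, end):
    """Return seq[start..end], 0-based and inclusive; end == -1 means 'to the end'."""
    return seq[start:] if end == -1 else seq[start:end + 1]


fmt = parse_read_format("bc:0:15,r1:26:-1")

# Made-up R1 read: 16 bp barcode + 10 bp UMI + 12 bp of cDNA.
barcode, umi, cdna = "ACGTACGTACGTACGT", "TTTTGGGGCC", "GATTACAGATTA"
r1 = barcode + umi + cdna

print(extract(r1, *fmt["bc"]))  # ACGTACGTACGTACGT (positions 0-15)
print(extract(r1, *fmt["r1"]))  # GATTACAGATTA (position 26 onward)
```

Under these assumptions, the 10 bp UMI at positions 16-25 is simply skipped: it belongs to neither the barcode range nor the r1 range.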
Hello, I tested TRUST4 on raw FASTQ files derived from 10x, paired-end, 16bp barcode + 10bp UMI in R1's first 26bp, and ~40M reads, I used this code:
It ran well, but slowly, taking close to 4 hours: [Wed Oct 30 13:49:50 2024] TRUST4 v1.1.4-r534 begins. In comparison, cellranger vdj ran for half an hour, so I would like to know: is this normal, or is something wrong on my end? |
It's normal. I think your data is from the 10x VDJ kit. TRUST4 is designed for regular RNA-seq data, like regular 10x 5' scRNA-seq without VDJ amplification. Read coverage is sparse for data without targeted amplification, so the assembly procedure, including data preprocessing, is more complex in order to achieve good sensitivity. I'm working on improving the running efficiency for target-amplified data, like TCR-seq/BCR-seq, in recent versions, but there is still a long way to go with the current code structure. |
Yes, the data was from 10x TCR/BCR Amplification Kit! And, thank you for your reply, looking forward to the new version! |
Hi, I'm trying to understand the output files generated by the command above, and I have some questions:
1. The *_toassemble_1.fq, *_toassemble_2.fq, and *_toassemble_bc.fa files all contain the same number of reads, 20,173,744 in my case. Can I regard this as the number of reads after barcode correction? If so, why does the log file report more reads: [Mon Nov 18 12:03:24 2024] Read in and count kmers for 38800000 reads.
2. How can I find a number like 'Valid Barcodes' in the 10x QC report, which is defined as 'Fraction of reads with barcodes that match the whitelist after barcode correction.'?
3. Is there detailed documentation on the output files? |
Hi, do you have any advice on this situation? Please let me know if you need more information. |
Since "result_tmp" is a pretty general name, I'm wondering whether you are running multiple instances of TRUST4 at the same time, so that the intermediate files are overwritten by another sample? |
Thanks for your advice. I created a new directory, modified the '-o' parameter, executed the command only once, and got the same result as before. The command: all files, and the number of reads in the toassemble_bc.fa: the nohup.out file: Is there any further testing I can do? |
Can you show me the first few lines of the two input fastq files? |
R1_001.fastq.gz: @LH00169:488:22HLWYLT4:7:1101:2141:1042 1:N:0:AAGATTGGAT+AAATCCCGCT
R2_001.fastq.gz: @LH00169:488:22HLWYLT4:7:1101:2141:1042 2:N:0:AAGATTGGAT+AAATCCCGCT |
Oh, the read count in the log counts each paired-end read twice, once for read 1 and once for read 2. So 20M read pairs means around 40M reads during k-mer counting, and the numbers are consistent. I should have given this explanation much earlier... |
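The arithmetic behind that explanation can be checked with a short Python sketch. The function names are hypothetical and are not part of TRUST4. The record counter assumes a standard four-line FASTQ layout with no wrapped sequence lines, and the 38,800,000 figure is assumed to be a progress message printed before the last reads were processed.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file (4 lines per record); also reads .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def expected_kmer_log_reads(read_pairs):
    """Each pair contributes two reads (read 1 and read 2) to the k-mer count."""
    return 2 * read_pairs


read_pairs = 20_173_744    # records in each *_toassemble file, from the thread
logged_reads = 38_800_000  # value in the quoted log line

total = expected_kmer_log_reads(read_pairs)
print(total)                  # 40347488
print(logged_reads <= total)  # True
```

To repeat the check on real data, `count_fastq_records` can be pointed at one of the *_toassemble FASTQ files and the result doubled before comparing it with the log.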
That makes sense! Now we know,
So could we assume that,
|
|
Thanks very much for your reply! I will run more tests. |
I would like to ask how Trust4 can directly analyze paired-end .fastq format data from the 10X Genomics platform for single-cell analysis, instead of analyzing BAM format data. Can you provide support for this analysis? The current analysis speed is too slow.
run-trust4 -t 25 -b /home/zxsys/data6/bam/SRR22007527_genome_bam.bam -f /home/zxsys/data6/hg38_bcrtcr.fa --ref /home/zxsys/data6/human_IMGT+C.fa --barcode CB
Is it possible to directly use FASTQ format for paired-end single-cell data analysis without using BAM files, while still ensuring that Trust4 operates normally?