Best way to run TRUST4 on TCR-sequencing data obtained from the Illumina MiSeq platform #285

Open
dcarbajo opened this issue Jul 8, 2024 · 1 comment

Comments

@dcarbajo
dcarbajo commented Jul 8, 2024

Hello again!

I want to use TRUST4 to process the sequencing data obtained by Croce et al. in "Phage display profiling of CDR3β loops enables machine learning predictions of NY-ESO-1 specific TCRs".

They built large phage display libraries of TCRs with a randomized β-chain CDR3 and panned them against the NY-ESO-1 epitope. They then performed 2x250 bp paired-end NGS sequencing on the extracted and amplified output DNA using an in-house Illumina MiSeq platform.

The TCR-sequencing data obtained with MiSeq were processed using MiXCR v3.0.13 with standard parameters (mixcr analyze amplicon --species hs --starting-material dna --5-end v-primers --3-end j-primers --receptor-type TRB).

Now I want to process the raw data myself with TRUST4 for some follow-up analyses. Is there any parameter, or anything else in particular, that I should be aware of when running TRUST4 on this MiSeq data?

In principle, I just intended to run it like this (per sample):

run-trust4 --barcodeLevel cell \
           -f path_to/hg38_bcrtcr.fa \
           --ref path_to/human_IMGT+C.fa \
           -1 path_to/file1.fastq.gz \
           -2 path_to/file2.fastq.gz \
           --repseq -o sample_name --od sample_output_dir --clean 1 -t 8

Thanks a lot for all your help!

@mourisl
Collaborator

mourisl commented Jul 8, 2024

Yes, this appears to be non-UMI-based TCR-seq data, so using --repseq is appropriate. There is no need to specify --barcodeLevel, since no barcode information is provided.
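
As a rough illustration of the point above, here is a minimal dry-run sketch that prints the per-sample command with --barcodeLevel dropped. It uses only the options and placeholder paths from the original post. The helper name trust4_cmd and the _R1/_R2 file naming are assumptions made for illustration, not something TRUST4 requires.

```shell
#!/bin/sh
# Dry run: print a run-trust4 command for one non-UMI TCR-seq sample.
# Paths are the placeholders from the post; the _R1/_R2 naming is hypothetical.
trust4_cmd() {
  sample="$1"
  printf '%s\n' "run-trust4 -f path_to/hg38_bcrtcr.fa --ref path_to/human_IMGT+C.fa -1 path_to/${sample}_R1.fastq.gz -2 path_to/${sample}_R2.fastq.gz --repseq -o ${sample} --od sample_output_dir --clean 1 -t 8"
}

# Print the command for inspection; nothing is executed here.
trust4_cmd sample_name
```

Once the printed command looks right, it can be pasted into a terminal or wrapped in a loop over sample names.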

The starting material is DNA, so the output may contain many unrecombined genomic sequences. Removing them may require an extra filtering step, such as mapping to the reference genome.
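
One possible way to sketch such a filter, under stated assumptions: align the assembled contigs to the genome and flag those that align as a single, nearly full-length block, since a rearranged V(D)J sequence would not normally map contiguously. The helper below only post-processes alignments in PAF format (column 1 = query name, 2 = query length, 3 and 4 = query start and end). The function name, the 0.95 cutoff, the choice of aligner, and the file names in the usage comment are all assumptions, not part of TRUST4.

```shell
#!/bin/sh
# Read PAF records on stdin and print the names of queries for which one
# alignment block covers at least min_frac of the query length.
# Such contigs are candidates for unrecombined (germline-like) sequence.
flag_germline_like() {
  min_frac="${1:-0.95}"
  awk -F'\t' -v f="$min_frac" '$2 > 0 && ($4 - $3) / $2 >= f { print $1 }' | sort -u
}

# Hypothetical usage (aligner, preset, and file names are assumptions):
#   minimap2 path_to/hg38.fa sample_output_dir/sample_name_annot.fa \
#     | flag_germline_like 0.95 > germline_like_ids.txt
```

The flagged IDs can then be excluded from downstream tables. The threshold should be checked against a few known recombined and unrecombined sequences before relying on it.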

Hope this helps.
