Question Regarding UMI and Barcode Parameters for UMI-based Bulk BCR-Seq Data Assembly #336

Yaxiong293 · 2024-12-19T07:26:59Z

Dear Developer,

I am working with UMI-based bulk BCR sequencing data, and I would like to merge reads that have the same UMI. Should I use the -barcode parameter or the -umi parameter for this purpose?

Thank you for your assistance!

Yaxiong293 · 2024-12-19T11:50:40Z

I would like to ask why the results I obtained using the -umi parameter and the -barcode parameter are very different. The clone counts and types using the -barcode parameter are about half of those obtained with the -umi parameter. Aren't both parameters supposed to assemble reads with the same UMI?

Additionally, I would like to know how to output the unassembled reads. I appreciate your assistance on this matter. Thank you!

mourisl · 2024-12-19T16:48:22Z

The --UMI option is for the 10x Genomics platform, and TRUST4 only uses that information for abundance estimation. For UMI-based BCR-seq, you should treat the UMI as the barcode (use the --barcode option) and add another option, "--barcodeLevel molecule", to specify that this barcode is actually a UMI. Hope this helps.

Yaxiong293 · 2024-12-20T02:03:30Z

Dear Mourisl,

Thank you for your assistance!

I have a few more questions regarding the parameters I'm using. My UMI is located in the first 34 bases of R1, and here are the parameters I am currently using:

--barcodeLevel molecule \
--barcode $READ1_FILE \
--readFormat bc:0:33

With these parameters, can I achieve assembly based on the same UMI as well as UMI deduplication?

Additionally, I inputted 20 million reads, but only 5 million were assembled. Is it possible for TRUST4 to output these unassembled reads?

mourisl · 2024-12-20T02:38:07Z

The parameters look right. Does read1 also contain BCR data? For example, if the BCR sequence follows the first 34 bp, you may need to use --readFormat bc:0:33,r1:34:-1 so that the sequence used in the assembly skips the UMI.
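To make the coordinates concrete, here is a minimal Python sketch of how a --readFormat string such as bc:0:33,r1:34:-1 maps onto a read. It assumes the convention implied in this thread: positions are 0-based and inclusive, and an end of -1 means "to the end of the read". The helper names (parse_read_format, extract) and the example read are made up for illustration; this is not TRUST4's own code.

```python
def parse_read_format(spec):
    """Parse a spec such as 'bc:0:33,r1:34:-1' into {name: (start, end)}."""
    regions = {}
    for field in spec.split(","):
        name, start, end = field.split(":")
        regions[name] = (int(start), int(end))
    return regions


def extract(seq, region):
    """Return the 0-based, inclusive slice; an end of -1 means 'to the end of the read'."""
    start, end = region
    return seq[start:] if end == -1 else seq[start:end + 1]


regions = parse_read_format("bc:0:33,r1:34:-1")

# Made-up read 1: 34 bases standing in for the UMI, followed by BCR sequence.
read1 = "A" * 34 + "CAGGTGCAGCTG"

umi = extract(read1, regions["bc"])  # bases 0..33, i.e. the first 34 bases
bcr = extract(read1, regions["r1"])  # base 34 to the end of the read
print(len(umi), bcr)  # prints: 34 CAGGTGCAGCTG
```

With bc:0:33 alone, no r1 region is given, so by the rule stated later in this thread the whole of read 1, UMI included, would be used as sequence.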

In the results, the assembly is done per UMI (see the *_barcode_airr/report.tsv files). The abundance summarized by UMI count (that is, after deduplication) should be in the airr/report files whose names do not contain the term "barcode".

TRUST4 outputs the candidate reads and the assembled reads. You can subtract the assembled reads from the input reads to get the unassembled reads.
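The subtraction can be done on read IDs. The sketch below is one way to do it in Python, under several assumptions that should be checked against your own files: the input is a plain four-line-per-record FASTQ, the assembled reads are in a FASTA file whose header lines begin with the read ID, and mate suffixes such as /1 and /2 can be dropped. The function names and the tiny in-memory records are hypothetical; for real data, open your input and assembled-read files in place of the io.StringIO objects.

```python
import io


def normalize(header):
    """Keep the first whitespace-delimited token and drop a trailing /1 or /2."""
    fields = header.split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def fastq_ids(handle):
    """Yield read IDs from FASTQ text, assuming exactly four lines per record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            yield normalize(line[1:])  # strip the leading '@'


def fasta_ids(handle):
    """Yield read IDs from the header lines of FASTA text."""
    for line in handle:
        if line.startswith(">"):
            yield normalize(line[1:])  # strip the leading '>'


# Made-up stand-ins for the input reads and the assembled reads.
input_fastq = "@read1/1\nACGT\n+\nIIII\n@read2/1\nTTTT\n+\nIIII\n@read3/1\nGGGG\n+\nIIII\n"
assembled_fasta = ">read1 extra fields\nACGT\n>read3\nGGGG\n"

all_ids = set(fastq_ids(io.StringIO(input_fastq)))
assembled_ids = set(fasta_ids(io.StringIO(assembled_fasta)))
unassembled = all_ids - assembled_ids
print(sorted(unassembled))  # prints: ['read2']
```

Holding 20 million IDs in a set is usually feasible on a workstation, but a sort-and-join on disk would be an alternative if memory is tight.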

Yaxiong293 · 2024-12-20T02:50:49Z

Thank you for your response!

Yes, my Read1 does contain BCR data, and since I'm using paired-end sequencing, should my --readFormat be set as bc:0:33,r1:34:-1,r2:0:-1?

I appreciate your assistance!

mourisl · 2024-12-20T05:05:20Z

"r2:0:-1" can be skipped as TRUST4 will use the full sequence if the region is not specified.
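Putting the advice in this thread together, the invocation could be assembled as in the following Python sketch. The --barcode, --barcodeLevel and --readFormat values come from the discussion above; the -f, -1 and -2 options are the usual run-trust4 inputs, and every file name is a placeholder, so check them against the run-trust4 usage message for your installation. The function build_trust4_command is hypothetical and only builds and prints the command; it does not run TRUST4.

```python
import shlex


def build_trust4_command(read1, read2, vdjc_fasta, read_format="bc:0:33,r1:34:-1"):
    """Return the run-trust4 argument list for UMI-based paired-end BCR-seq."""
    return [
        "run-trust4",
        "-f", vdjc_fasta,              # V/D/J/C gene reference (placeholder path)
        "-1", read1,
        "-2", read2,
        "--barcode", read1,            # the UMI sits in R1, so R1 doubles as the barcode file
        "--barcodeLevel", "molecule",  # treat the barcode as a UMI rather than a cell barcode
        "--readFormat", read_format,   # no r2 entry: the full R2 sequence is used
    ]


# Placeholder file names, for illustration only.
cmd = build_trust4_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "bcrtcr.fa")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the paths point at real files.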

Yaxiong293 · 2024-12-20T09:33:53Z

Thank you! It helps a lot.
