Help to highlight potential differences in BCR maturation between mice of different genotypes #298

aeeckhou · 2024-08-01T14:29:18Z

Hello,

I am trying to analyze bulk RNAseq of preB cells (Hardy fraction D) data from mice and comparing the results of TRUST4 between different conditions. I have three genotypes for my mice, Wild-Type, Heterozygous for a specific mutation in a gene and homozygous for the same mutation. For each genotypes I have three mice. At the preB cell stage, the light chains genes are not supposed to be rearranged and expressed at the surface. I would like to show if there are differences in differentiation (or not) between the three conditions, potentially at the light chain level, but feel stuck with the TRUST4 output and don't really know how to exploit it.

Would have any advice, scripts or way to look at the data to tackle this question ?
I have processed the output of TRUST4 with the RIMA pipeline and the immunearch package aswell.

Thank you in advance, any advice would be greatly appreciated !
Alexandre

mourisl · 2024-08-01T14:39:54Z

I think one way is to compare the relative ratio of productive IGH and light chain abundances. Essentially, it would be the ratio of abundance sum from IGH CDR3s over abundance sum from light chain CDR3s. If there is no recombination on the light chain, the ratio will be very large.

aeeckhou · 2024-08-02T09:14:53Z

Hello,

I have followed your advice by running this code on the AIRR file for each of my mice (I have four replicates by genotypes) :

LRBD72.airr <- read.table("D:/UMR1170/DATA/PreB_BM_Spi1_RNAseq/trust4/data_output/LRBD72_trust4.txt_airr.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Add lib size
LRBD72.airr$sample = "LRBD72"
LRBD72.airr$genotype = "WT_WT"
LRBD72.airr_heavy <- subset(LRBD72.airr, grepl("^IGH",locus))
LRBD72.airr_light <- subset(LRBD72.airr, grepl("^IG[K|L]",locus))
LRBD72.airr_heavy_productive = LRBD72.airr_heavy[which(LRBD72.airr_heavy$productive == "TRUE"),]
LRBD72.airr_light_productive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "TRUE"),]

sum(LRBD72.airr_heavy_productive$consensus_count) # 1393
sum(LRBD72.airr_light_productive$consensus_count) # 11163

This give this table :

sample	genotype	productive_IGH	productive_light_chain	ratio_IGH_on_light_chain	ratio_light_chain_on_IGH
LRBD72	WT_WT	1393	11163	0.124787243572516	8.01363962670495
LRBD73	WT_WT	624	5409	0.115363283416528	8.66826923076923
LRBD74	WT_WT	742	5793	0.128085620576558	7.80727762803235
LRBD75	WT_WT	564	7483	0.0753708405719631	13.2677304964539
LRBD76	QE_WT	1848	14045	0.131577073691705	7.60010822510823
LRBD77	QE_WT	982	7256	0.135336273428886	7.38900203665988
LRBD78	QE_WT	747	7107	0.10510764035458	9.5140562248996
LRBD79	QE_WT	961	10204	0.0941787534300274	10.6181061394381
LRBD80	QE_QE	2691	12885	0.208847497089639	4.78818283166109
LRBD81	QE_QE	1663	5865	0.283546462063086	3.52675886951293
LRBD82	QE_QE	942	5909	0.159417837197495	6.27282377919321
LRBD83	QE_QE	1242	11925	0.104150943396226	9.60144927536232

And the associated graphe :

Did I do it correctly ? if so, do you think the results are interpretable ?

I have to say I am suprised to how much productive light chain were found compared to the IGH part. Is it expected ?

Thank you for your answer and guidance !

Best regards,
Alexandre

aeeckhou · 2024-08-02T09:23:30Z

In the same logic, I did the productive light chain detected versus the unproductive ones.

LRBD72.airr <- read.table("D:/UMR1170/DATA/PreB_BM_Spi1_RNAseq/trust4/data_output/LRBD72_trust4.txt_airr.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Add lib size
LRBD72.airr$sample = "LRBD72"
LRBD72.airr$genotype = "WT_WT"
LRBD72.airr_light <- subset(LRBD72.airr, grepl("^IG[K|L]",locus))
LRBD72.airr_light <- LRBD72.airr_light %>% mutate(lib.size = sum(consensus_count))

LRBD72.airr_light_productive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "TRUE"),]
LRBD72.airr_light_unproductive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "FALSE"),]

sum(LRBD72.airr_light_productive$consensus_count) # 11163
sum(LRBD72.airr_light_unproductive$consensus_count) # 2632

The table :

sample	genotype	productive_count	unproductive_count	ratio
LRBD72	WT_WT	11163	2632	4.24126139817629
LRBD73	WT_WT	5409	1568	3.44961734693878
LRBD74	WT_WT	5793	1286	4.50466562986003
LRBD75	WT_WT	7483	1607	4.65650280024891
LRBD76	QE_WT	14045	2076	6.76541425818882
LRBD77	QE_WT	7256	1505	4.82126245847176
LRBD78	QE_WT	7107	1518	4.68181818181818
LRBD79	QE_WT	10204	2485	4.10623742454728
LRBD80	QE_QE	12885	1251	10.2997601918465
LRBD81	QE_QE	5865	1075	5.45581395348837
LRBD82	QE_QE	5909	1352	4.37056213017751
LRBD83	QE_QE	11925	3612	3.3014950166113

The graphe :

mourisl · 2024-08-03T03:03:50Z

Usually the light chain have higher expression (on the RNA level) than the heavy chain, so it is expected to see the ratio is much less than 1. I'm not sure about the ratio for pre-B cell though. The analysis may show that the QE_QE does have fewer light chain expressed than other genotypes.

Another comparison could be that you can get the total number reads from the original RNA-seq data, and then calculate TRUST4's estimated abundance fraction with respect to all the reads. Then QE_QE might have smaller fraction.

aeeckhou · 2024-08-06T11:10:37Z

Hello,

I extracted the number of reads like I did previously, with the code below. Is it what you meant by abundance ?

LRBD72.airr <- read.table("D:/UMR1170/DATA/PreB_BM_Spi1_RNAseq/trust4/data_output/LRBD72_trust4.txt_airr.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

LRBD72.airr$sample = "LRBD72"
LRBD72.airr$genotype = "WT_WT"
LRBD72.airr_light <- subset(LRBD72.airr, grepl("^IG[K|L]",locus))
LRBD72.airr_light <- LRBD72.airr_light %>% mutate(lib.size = sum(consensus_count))

LRBD72.airr_light_productive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "TRUE"),]
LRBD72.airr_light_unproductive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "FALSE"),]

sum(LRBD72.airr_light_productive$consensus_count) # 11163
sum(LRBD72.airr_light_unproductive$consensus_count) # 2632

I also extracted the number of reads mapped with samtools flagstat. The two information combine give me this table below :

sample	genotype	total_read_count_mapped	productive_count_ligh_chain	unproductive_count_light_chain	productive_on_total_reads	total_light_chain_on_total_reads
LRBD72	WT_WT	88677204	11163	2632	0.012588354	0.015556422
LRBD73	WT_WT	48184727	5409	1568	0.011225549	0.014479692
LRBD74	WT_WT	43599019	5793	1286	0.013286996	0.016236604
LRBD75	WT_WT	56660549	7483	1607	0.01320672	0.016042908
LRBD76	QE_WT	67156319	14045	2076	0.020913892	0.024005187
LRBD77	QE_WT	69329208	7256	1505	0.010466007	0.01263681
LRBD78	QE_WT	58421262	7107	1518	0.012165092	0.014763461
LRBD79	QE_WT	69076223	10204	2485	0.014772087	0.018369563
LRBD80	QE_QE	51804209	12885	1251	0.024872496	0.027287358
LRBD81	QE_QE	76607321	5865	1075	0.007655926	0.009059186
LRBD82	QE_QE	58901563	5909	1352	0.010031992	0.012327347
LRBD83	QE_QE	91992276	11925	3612	0.012963045	0.016889461

The total number of reads supporting the expression of detected productive ligh chain divided by the total number of reads gives this graphe :

The total number of reads supporting the expression of ligh chains (productive and unproductive) divided by the total number of reads gives this graphe :

It seems that the QE_QE has a lower ratio. Nevertheless, it does not seem very convincing... What do you think ?

In the end, this seems to be my best analysis so far (read count of productive heavy chain divided by read of productive light chain), indicating that QE_QE might expressed less light chains.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Help to highlight potential differences in BCR maturation between mice of different genotypes #298

Help to highlight potential differences in BCR maturation between mice of different genotypes #298

aeeckhou commented Aug 1, 2024

mourisl commented Aug 1, 2024

aeeckhou commented Aug 2, 2024

aeeckhou commented Aug 2, 2024

mourisl commented Aug 3, 2024

aeeckhou commented Aug 6, 2024

Help to highlight potential differences in BCR maturation between mice of different genotypes #298

Help to highlight potential differences in BCR maturation between mice of different genotypes #298

Comments

aeeckhou commented Aug 1, 2024

mourisl commented Aug 1, 2024

aeeckhou commented Aug 2, 2024

aeeckhou commented Aug 2, 2024

mourisl commented Aug 3, 2024

aeeckhou commented Aug 6, 2024