Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help to highlight potential differences in BCR maturation between mice of different genotypes #298

Open
aeeckhou opened this issue Aug 1, 2024 · 5 comments

Comments

@aeeckhou
Copy link

aeeckhou commented Aug 1, 2024

Hello,

I am trying to analyze bulk RNAseq of preB cells (Hardy fraction D) data from mice and comparing the results of TRUST4 between different conditions. I have three genotypes for my mice, Wild-Type, Heterozygous for a specific mutation in a gene and homozygous for the same mutation. For each genotypes I have three mice. At the preB cell stage, the light chains genes are not supposed to be rearranged and expressed at the surface. I would like to show if there are differences in differentiation (or not) between the three conditions, potentially at the light chain level, but feel stuck with the TRUST4 output and don't really know how to exploit it.

Would have any advice, scripts or way to look at the data to tackle this question ?
I have processed the output of TRUST4 with the RIMA pipeline and the immunearch package aswell.

Thank you in advance, any advice would be greatly appreciated !
Alexandre

@mourisl
Copy link
Collaborator

mourisl commented Aug 1, 2024

I think one way is to compare the relative ratio of productive IGH and light chain abundances. Essentially, it would be the ratio of abundance sum from IGH CDR3s over abundance sum from light chain CDR3s. If there is no recombination on the light chain, the ratio will be very large.

@aeeckhou
Copy link
Author

aeeckhou commented Aug 2, 2024

Hello,

I have followed your advice by running this code on the AIRR file for each of my mice (I have four replicates by genotypes) :

LRBD72.airr <- read.table("D:/UMR1170/DATA/PreB_BM_Spi1_RNAseq/trust4/data_output/LRBD72_trust4.txt_airr.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Add lib size
LRBD72.airr$sample = "LRBD72"
LRBD72.airr$genotype = "WT_WT"
LRBD72.airr_heavy <- subset(LRBD72.airr, grepl("^IGH",locus))
LRBD72.airr_light <- subset(LRBD72.airr, grepl("^IG[K|L]",locus))
LRBD72.airr_heavy_productive = LRBD72.airr_heavy[which(LRBD72.airr_heavy$productive == "TRUE"),]
LRBD72.airr_light_productive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "TRUE"),]

sum(LRBD72.airr_heavy_productive$consensus_count) # 1393
sum(LRBD72.airr_light_productive$consensus_count) # 11163

This give this table :

<style> </style>
sample genotype productive_IGH productive_light_chain ratio_IGH_on_light_chain ratio_light_chain_on_IGH
LRBD72 WT_WT 1393 11163 0.124787243572516 8.01363962670495
LRBD73 WT_WT 624 5409 0.115363283416528 8.66826923076923
LRBD74 WT_WT 742 5793 0.128085620576558 7.80727762803235
LRBD75 WT_WT 564 7483 0.0753708405719631 13.2677304964539
LRBD76 QE_WT 1848 14045 0.131577073691705 7.60010822510823
LRBD77 QE_WT 982 7256 0.135336273428886 7.38900203665988
LRBD78 QE_WT 747 7107 0.10510764035458 9.5140562248996
LRBD79 QE_WT 961 10204 0.0941787534300274 10.6181061394381
LRBD80 QE_QE 2691 12885 0.208847497089639 4.78818283166109
LRBD81 QE_QE 1663 5865 0.283546462063086 3.52675886951293
LRBD82 QE_QE 942 5909 0.159417837197495 6.27282377919321
LRBD83 QE_QE 1242 11925 0.104150943396226 9.60144927536232

And the associated graphe :

ratio_IGH_on_light_chain

Did I do it correctly ? if so, do you think the results are interpretable ?

I have to say I am suprised to how much productive light chain were found compared to the IGH part. Is it expected ?

Thank you for your answer and guidance !

Best regards,
Alexandre

@aeeckhou
Copy link
Author

aeeckhou commented Aug 2, 2024

In the same logic, I did the productive light chain detected versus the unproductive ones.

LRBD72.airr <- read.table("D:/UMR1170/DATA/PreB_BM_Spi1_RNAseq/trust4/data_output/LRBD72_trust4.txt_airr.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Add lib size
LRBD72.airr$sample = "LRBD72"
LRBD72.airr$genotype = "WT_WT"
LRBD72.airr_light <- subset(LRBD72.airr, grepl("^IG[K|L]",locus))
LRBD72.airr_light <- LRBD72.airr_light %>% mutate(lib.size = sum(consensus_count))

LRBD72.airr_light_productive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "TRUE"),]
LRBD72.airr_light_unproductive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "FALSE"),]

sum(LRBD72.airr_light_productive$consensus_count) # 11163
sum(LRBD72.airr_light_unproductive$consensus_count) # 2632

The table :

<style> </style>
sample genotype productive_count unproductive_count ratio
LRBD72 WT_WT 11163 2632 4.24126139817629
LRBD73 WT_WT 5409 1568 3.44961734693878
LRBD74 WT_WT 5793 1286 4.50466562986003
LRBD75 WT_WT 7483 1607 4.65650280024891
LRBD76 QE_WT 14045 2076 6.76541425818882
LRBD77 QE_WT 7256 1505 4.82126245847176
LRBD78 QE_WT 7107 1518 4.68181818181818
LRBD79 QE_WT 10204 2485 4.10623742454728
LRBD80 QE_QE 12885 1251 10.2997601918465
LRBD81 QE_QE 5865 1075 5.45581395348837
LRBD82 QE_QE 5909 1352 4.37056213017751
LRBD83 QE_QE 11925 3612 3.3014950166113

The graphe :

Ratio_productive_count_on_unproductive_count

@mourisl
Copy link
Collaborator

mourisl commented Aug 3, 2024

Usually the light chain have higher expression (on the RNA level) than the heavy chain, so it is expected to see the ratio is much less than 1. I'm not sure about the ratio for pre-B cell though. The analysis may show that the QE_QE does have fewer light chain expressed than other genotypes.

Another comparison could be that you can get the total number reads from the original RNA-seq data, and then calculate TRUST4's estimated abundance fraction with respect to all the reads. Then QE_QE might have smaller fraction.

@aeeckhou
Copy link
Author

aeeckhou commented Aug 6, 2024

Hello,

I extracted the number of reads like I did previously, with the code below. Is it what you meant by abundance ?

LRBD72.airr <- read.table("D:/UMR1170/DATA/PreB_BM_Spi1_RNAseq/trust4/data_output/LRBD72_trust4.txt_airr.tsv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

LRBD72.airr$sample = "LRBD72"
LRBD72.airr$genotype = "WT_WT"
LRBD72.airr_light <- subset(LRBD72.airr, grepl("^IG[K|L]",locus))
LRBD72.airr_light <- LRBD72.airr_light %>% mutate(lib.size = sum(consensus_count))

LRBD72.airr_light_productive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "TRUE"),]
LRBD72.airr_light_unproductive = LRBD72.airr_light[which(LRBD72.airr_light$productive == "FALSE"),]

sum(LRBD72.airr_light_productive$consensus_count) # 11163
sum(LRBD72.airr_light_unproductive$consensus_count) # 2632

I also extracted the number of reads mapped with samtools flagstat. The two information combine give me this table below :

<style> </style>
sample genotype total_read_count_mapped productive_count_ligh_chain unproductive_count_light_chain productive_on_total_reads total_light_chain_on_total_reads
LRBD72 WT_WT 88677204 11163 2632 0.012588354 0.015556422
LRBD73 WT_WT 48184727 5409 1568 0.011225549 0.014479692
LRBD74 WT_WT 43599019 5793 1286 0.013286996 0.016236604
LRBD75 WT_WT 56660549 7483 1607 0.01320672 0.016042908
LRBD76 QE_WT 67156319 14045 2076 0.020913892 0.024005187
LRBD77 QE_WT 69329208 7256 1505 0.010466007 0.01263681
LRBD78 QE_WT 58421262 7107 1518 0.012165092 0.014763461
LRBD79 QE_WT 69076223 10204 2485 0.014772087 0.018369563
LRBD80 QE_QE 51804209 12885 1251 0.024872496 0.027287358
LRBD81 QE_QE 76607321 5865 1075 0.007655926 0.009059186
LRBD82 QE_QE 58901563 5909 1352 0.010031992 0.012327347
LRBD83 QE_QE 91992276 11925 3612 0.012963045 0.016889461

The total number of reads supporting the expression of detected productive ligh chain divided by the total number of reads gives this graphe :

Ratio_productive_reads_light_chains_divided_by_total_reads

The total number of reads supporting the expression of ligh chains (productive and unproductive) divided by the total number of reads gives this graphe :

Ratio_total_reads_light_chains_divided_by_total_reads

It seems that the QE_QE has a lower ratio. Nevertheless, it does not seem very convincing... What do you think ?

In the end, this seems to be my best analysis so far (read count of productive heavy chain divided by read of productive light chain), indicating that QE_QE might expressed less light chains.

ratio_IGH_on_light_chain

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants