divBCR

divBCR.R and divBCR_isotype.R scripts calculate different biodiversity metrics, as well as metrics that characterize the different aspects of the BCR repertoire. This script depends on the R package “plyr”.

Usage: Rscript divBCR.R inputFile.txt outputFile.txt

The input file is a processed aggregated file that will need to be produced from the V'DJer, so that the columns are in the following order: sample, sequence, cdr3, expected_counts, seq_id, isotype, vregion_identity, aa_cdr3, vgene, jgene, total_count, cluster

Overview:

Step 1) B cell recptor counts are normalized to total RNA-seq counts

Step 2) Calculations of biodiversity and other metrics at the sequence leve

Step 3) Sums clusters

Step 4) Calculations of biodiversity at the cluster level (in case of poor sequence quality)

Step 5) Executing the functions, merging the two data frames and saving the output to a file.

Below are the calculations for each sample (not in column order in final output, column header listed, if by* then calculated both at the sequence and cluster level):

Biodiversity metrics:

Shannon entropy (shannon_entropy_by*): -Σ (count proportion * log(count proportion))
Evenness (evenness_by*): Shannon entropy / log(species richness)
True Diversity (true_div_by*): Exp(Shannon entropy)
Gini-Simpson (gini_simpson_by*): 1-(sum(count proportion)^2)

BCR repertoire characteristics:

Top clone proportion (top_clone_prop_by*): max(“species” count) / total counts
Second top clone proportion (second_top_clone_by*): second max (“species” count) / total counts
Number of unique sequences or clusters (num_unique*): species richness, number of unique sequences or number of unique clusters
Total counts (total_counts): sum(count of each sequence)
Average CDR3 length (mean_cdr3_aa_len): mean(CDR3 amino acid length)
Weighted average CDR3 length (w_mean_cdr3_aa_len): sum(CDR3 length* count of each sequence) / sum(count of each sequence)
Mean V-region identity (mean_v_region_ident): mean(V-region identity)
Weighted mean V-region identity (w_mean_v_region_ident): sum(V-region identity* count of each sequence) / sum(count of each sequence)

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
LICENSE		LICENSE
README.md		README.md
README_diversityBCR_function.docx		README_diversityBCR_function.docx
divBCR.R		divBCR.R
divBCR_by_isotype.R		divBCR_by_isotype.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

divBCR

About

Releases

Packages

Languages

License

sararselitsky/divBCR

Folders and files

Latest commit

History

Repository files navigation

divBCR

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages