MultiModalMuSig

A Julia package implementing several topic models used for mutation signature estimation.

The Multi-modal correlated topic model (MMCTM) takes an array of arrays of matrices, where the first column of each matrix is a mutation type index, and the second column is the mutation count for a particular sample.

The following example shows how to perform inference using the MMCTM on SNV and SV counts:

using MultiModalMuSig
using CSV
using DataFrames
using VegaLite
using Random

Random.seed!(42)

snv_counts = CSV.read("data/brca-eu_snv_counts.tsv", delim='\t')
sv_counts = CSV.read("data/brca-eu_sv_counts.tsv", delim='\t')

X = format_counts_mmctm(snv_counts, sv_counts)
model = MMCTM([7, 7], [0.1, 0.1], X)
fit!(model, tol=1e-5)

snv_signatures = DataFrame(hcat(model.ϕ[1]...))
sv_signatures = DataFrame(hcat(model.ϕ[2]...))

snv_signatures[:term] = snv_counts[:term]
snv_signatures = melt(
    snv_signatures, :term, variable_name=:signature, value_name=:probability
)
snv_signatures |> @vlplot(
    :bar, x={:term, sort=:null}, y=:probability, row=:signature,
    resolve={scale={y=:independent}}
)

This code runs the MMCTM for 7 SNV and 7 SV signatures, with signature hyperparameters set to 0.1. Since these types of models can get stuck in poor local optima, it's a good idea to fit many models and pick the best one.

Sample-signature probabilities can be extracted like so:

# sample 3, SNV signature probabilities (modality 1)
model.props[3][1]
# SV signature probabilities (modality 2)
model.props[3][2]

# SNV probabilities for all samples
snv_props = hcat(
	[model.props[i][1] for i in 1:length(model.props)]...
)

The MMCTM can be run on multiple modalities, e.g.

X = format_counts_mmctm(snv_counts, sv_counts, indel_counts)
model = MMCTM([7, 7, 5], [0.1, 0.1, 0.1], X)

The DataFrame inputs to format_counts_mmctm have an optional term column, and further columns for each sample.

To run the CTM instead, just run the MMCTM with a single modality:

X = format_counts_ctm(snv_counts)
model = MMCTM([7], [0.1], X)
fit!(model, tol=1e-5)

The LDA implementation can be run like so:

X = format_counts_lda(snv_counts)
model = LDA(7, 0.1, 0.1, X)
fit!(model, tol=1e-5)

In the above code, both the sample-signature and signature-term hyperparameters have been set to 0.1, respectively. After fitting LDA, signatures can be found in model.β, and signature probabilities can be found in model.θ.

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
.github/workflows		.github/workflows
data		data
src		src
test		test
.codecov.yml		.codecov.yml
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE.md		LICENSE.md
Project.toml		Project.toml
README.md		README.md
appveyor.yml		appveyor.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MultiModalMuSig

About

Releases

Packages

Languages

License

funnell/MultiModalMuSig.jl

Folders and files

Latest commit

History

Repository files navigation

MultiModalMuSig

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages