Skip to content

Censored Trait Analysis

Tianjing Zhao edited this page Aug 12, 2022 · 20 revisions

Censored Trait Analysis

Censored traits should be identified in the argument censored_trait in build_model().

Lower bound should be named as "traitname_l", and upper bound should be named as "traitname_u".

If you have observations without upper or lower bounds, you can have them in your phenotype file as

lower_bound, upper_bound
120,Inf
-Inf,130

If you have observations without upper and lower bounds, you can have them in your phenotype file as

lower_bound, upper_bound
-Inf,Inf

or

lower_bound, upper_bound
missing,missing

The MCMC samples of unobserved continuous liability are in the output folder.

Single-trait Linear Mixed Model (Genomic data)

Data Simulation

Below we will convert a continuous trait "y1" to a censored trait by setting a phenotypic value (e.g.,y11) to a range [lower_bound, upper_bound] as follows,

  1. [y11 - 0.5, y11 + 0.5] if 0.0 < y11 <1.0
  2. [2.0, Inf] if y11 > 2.0
  3. [-Inf, -3.0] if y11 < -3.0
  4. [y11, y11] otherwise

This is only used to demonstrate the censored trait analysis using JWAS.

using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

phenofile  = dataset("phenotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])

#y1 - censored trait
phenotypes[!, :lower_bound1] .= round.(phenotypes[!, :y1],digits=2);
phenotypes[!, :upper_bound1] .= round.(phenotypes[!, :y1],digits=2);
phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :upper_bound1] .= phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :y1] .+ 0.5;
phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :lower_bound1] .= phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :y1] .- 0.5;
phenotypes[phenotypes[!, :y1] .> 2.0,:upper_bound1] .= Inf;
phenotypes[phenotypes[!, :y1] .> 2.0,:lower_bound1] .= 2.0;
phenotypes[phenotypes[!, :y1] .< -3.0,:upper_bound1].= -3.0;
phenotypes[phenotypes[!, :y1] .< -3.0,:lower_bound1].= -Inf;
phenotypes=phenotypes[:, Not(:y1)]  #drop :y1 column

JWAS analysis

# Step 0: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

# Step 1: Rename the lower bound as "traitname_l", and upper bound as "traitname_u"
rename!(phenotypes, :lower_bound1 => :y1_l, :upper_bound1 => :y1_u)

# Step 2: Read data
pedfile    = dataset("pedigree.csv")
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genofile   = dataset("genotypes.csv")
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");

# Step 3: Build Model Equations
model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes"
model = build_model(model_equation,
                    censored_trait=["y1"])

# Step 4: Set Factors or Covariates
set_covariate(model,"x1");

# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);

# Step 6: Run Analysis
out=runMCMC(model,phenotypes,chain_length=5000);

# Step 7: Check Accuruacy
results    = innerjoin(out["EBV_y1"], phenotypes, on = :ID)
accuruacy  = cor(results[!,:EBV],results[!,:bv1])

Multiple censored traits Linear Mixed Model (Genomic data)

Data Simulation

using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

phenofile  = dataset("phenotypes.csv")
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])

#y1 - censored trait
phenotypes[!, :lower_bound1] .= round.(phenotypes[!, :y1],digits=2);
phenotypes[!, :upper_bound1] .= round.(phenotypes[!, :y1],digits=2);
phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :upper_bound1] .= phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :y1] .+ 0.5;
phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :lower_bound1] .= phenotypes[0.0 .< phenotypes[!, :y1] .< 1.0, :y1] .- 0.5;
phenotypes[phenotypes[!, :y1] .> 2.0,:upper_bound1] .= Inf;
phenotypes[phenotypes[!, :y1] .> 2.0,:lower_bound1] .= 2.0;
phenotypes[phenotypes[!, :y1] .< -3.0,:upper_bound1].= -3.0;
phenotypes[phenotypes[!, :y1] .< -3.0,:lower_bound1].= -Inf;
phenotypes=phenotypes[:, Not(:y1)]  #drop :y1 column

#y2 - censored trait
phenotypes[!, :lower_bound2] .= round.(phenotypes[!, :y2],digits=2);
phenotypes[!, :upper_bound2] .= round.(phenotypes[!, :y2],digits=2);
phenotypes[0.0 .< phenotypes[!, :y2] .< 1.0, :upper_bound2] .= phenotypes[0.0 .< phenotypes[!, :y2] .< 1.0, :y2] .+ 0.5;
phenotypes[0.0 .< phenotypes[!, :y2] .< 1.0, :lower_bound2] .= phenotypes[0.0 .< phenotypes[!, :y2] .< 1.0, :y2] .- 0.5;
phenotypes[phenotypes[!, :y2] .> 2.0,:upper_bound2] .= Inf;
phenotypes[phenotypes[!, :y2] .> 2.0,:lower_bound2] .= 2.0;
phenotypes[phenotypes[!, :y2] .< -3.0,:upper_bound2].= -3.0;
phenotypes[phenotypes[!, :y2] .< -3.0,:lower_bound2].= -Inf;
phenotypes=phenotypes[:, Not(:y2)]  #drop :y2 column

JWAS analysis

# Step 0: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(1)

# Step 1: Rename the lower bound as "traitname_l", and upper bound as "traitname_u"
rename!(phenotypes, :lower_bound1 => :y1_l, :upper_bound1 => :y1_u)
rename!(phenotypes, :lower_bound2 => :y2_l, :upper_bound2 => :y2_u)

# Step 2: Read data
pedfile    = dataset("pedigree.csv")
pedigree   = get_pedigree(pedfile,separator=",",header=true);
genofile   = dataset("genotypes.csv")
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");

# Step 3: Build Model Equations
model_equation  ="y1 = intercept + x1 + x2 + x2*x3 + ID + dam + genotypes
                  y2 = intercept + x1 + x2 + ID + genotypes"
model = build_model(model_equation,
                    censored_trait=["y1","y2"])

# Step 4: Set Factors or Covariates
set_covariate(model,"x1");

# Step 5: Set Random or Fixed Effects
set_random(model,"x2");
set_random(model,"ID dam",pedigree);

# Step 6: Run Analysis
out=runMCMC(model,phenotypes,chain_length=5000);

# Step 7: Check Accuruacy
results    = innerjoin(out["EBV_y1"], phenotypes, on = :ID)
accuruacy  = cor(results[!,:EBV],results[!,:bv1])

results    = innerjoin(out["EBV_y2"], phenotypes, on = :ID)
accuruacy  = cor(results[!,:EBV],results[!,:bv2])

Single-trait/multi-trait Linear Additive Genetic Model

To run analysis without genomic data, just remove "genotypes" in the model_equation from the script above.

Clone this wiki locally