Home
Duo Peng edited this page Sep 7, 2023 · 15 revisions
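The behavior and algorithm of protoSpaceJAM can be customized through the following command-line parameters, and by editing the source code files described further below: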
Input/output parameters
--path2csv Path to a CSV file containing the input knock-in sites; see input/test_input.csv for an example
--outdir Path to the output directory
--genome_ver Genome and version to use; possible values are GRCh38, GRCm39, and GRCz11
gRNA parameters
--num_gRNA_per_design Number of gRNAs to return per site (default: 1)
--no_regulatory_penalty Turn off the penalty for gRNAs that cut in UTRs or near splice junctions (default: penalty on)
Payload parameters
--payload Payload sequence to use for every site, regardless of terminus or coordinates; overrides all other payload parameters
--Npayload Payload sequence to use at the N terminus (default: mNG11 + XTEN80)
--Cpayload Payload sequence to use at the C terminus (default: XTEN80 + mNG11)
--POSpayload Payload sequence to use at the specific genomic coordinates (default: XTEN80 + mNG11)
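The precedence among the payload parameters can be summarized in a few lines. This is a minimal sketch, not protoSpaceJAM's code: the helper choose_payload and the site-type labels "N", "C" and "POS" are hypothetical, and the defaults are placeholder strings standing in for the real mNG11/XTEN80 DNA sequences.

```python
from typing import Optional

# Placeholder strings only; the real defaults are DNA sequences (mNG11 + XTEN80, etc.)
DEFAULT_NPAYLOAD = "<mNG11><XTEN80>"
DEFAULT_CPAYLOAD = "<XTEN80><mNG11>"
DEFAULT_POSPAYLOAD = "<XTEN80><mNG11>"


def choose_payload(
    site_type: str,
    payload: Optional[str] = None,
    npayload: str = DEFAULT_NPAYLOAD,
    cpayload: str = DEFAULT_CPAYLOAD,
    pospayload: str = DEFAULT_POSPAYLOAD,
) -> str:
    """Return the payload for one knock-in site (hypothetical helper).

    site_type is a hypothetical label: "N" (N terminus), "C" (C terminus)
    or "POS" (user-specified genomic coordinates).
    """
    # --payload, when given, overrides every other payload parameter
    if payload is not None:
        return payload
    # Otherwise the payload depends on where the knock-in site is
    per_site = {"N": npayload, "C": cpayload, "POS": pospayload}
    if site_type not in per_site:
        raise ValueError(f"unknown site type: {site_type!r}")
    return per_site[site_type]
```

For example, choose_payload("C", payload="ATGC") returns "ATGC" even though a C-terminal payload is also defined.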
Donor parameters
--Donor_type Type of donor; possible values are ssODN and dsDNA (default: ssODN)
This option affects the donor processing strategy.
--HA_len [dsDNA] Length of the desired homology arm on each side (default: 500)
--CheckEnzymes [dsDNA] Names of restriction enzymes, separated by "|", whose recognition sites are flagged and trimmed, for example BsaI|EcoRI (default: None)
--CustomSeq2Avoid [dsDNA] Custom sequences, separated by "|", to flag and trim (default: None)
--MinArmLenPostTrim [dsDNA] Minimum length of the homology arm after trimming. Set to 0 to turn off trimming (default: 0)
--Strand_choice [ssODN] Strand choice of the ssODN (default: auto)
Possible values are "auto", "TargetStrand", "NonTargetStrand", "CodingStrand" and "NonCodingStrand"
--ssODN_max_size [ssODN] Enforce a length limit on the ssODN donor (default: 200)
The ssODN donor will be centered on the payload and the recoded region
See this page for details on how ssODN and dsDNA donors are processed differently.
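Centering the ssODN donor (--ssODN_max_size) amounts to choosing a fixed-length window around the payload and the recoded bases. The sketch below is a hypothetical helper (center_window is not part of protoSpaceJAM) that assumes 0-based, half-open coordinates; how protoSpaceJAM splits an odd number of flanking bases or handles donors near the ends of the available sequence may differ.

```python
from typing import Tuple


def center_window(seq_len: int, core_start: int, core_end: int,
                  max_size: int) -> Tuple[int, int]:
    """Return [start, end) of a window of at most max_size bases centered on
    the core region [core_start, core_end) (payload plus recoded bases).

    Hypothetical helper illustrating the centering described above.
    """
    core_len = core_end - core_start
    if core_len > max_size:
        raise ValueError("payload and recoded region exceed max_size")
    window = min(max_size, seq_len)
    # Split the remaining space as evenly as possible between the two flanks
    flank = window - core_len
    start = core_start - flank // 2
    # Shift the window if needed so it stays inside the available sequence
    start = max(0, min(start, seq_len - window))
    return start, start + window
```

With a 400-bp template and a 40-bp core at [180, 220), a 200-nt limit yields the window (100, 300): 80 bases on each side of the core.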
Recoding parameters
--recoding_off Turn recoding off
--recoding_stop_recut_only Only recode in the gRNA recognition site
--recoding_full Use full recoding, recode both the gRNA recognition site and the cut-to-insert region (default: on)
--cfdThres Threshold to which protoSpaceJAM will attempt to lower the recut potential (default: 0.03).
The recut potential is measured by the CFD score.
--recode_order Prioritize recoding in the PAM or in protospacer, possible values: protospacer_first, PAM_first (default: PAM_first)
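One way to picture how --cfdThres and --recode_order interact is a greedy loop that keeps adding mutations, in the chosen order, until the recut potential reaches the threshold. This is a sketch under stated assumptions, not protoSpaceJAM's algorithm: plan_recoding is hypothetical, positions are arbitrary integer labels, and recut_score is a caller-supplied stand-in for the real CFD calculation.

```python
from typing import Callable, List, Sequence


def plan_recoding(
    pam_positions: Sequence[int],
    protospacer_positions: Sequence[int],
    recut_score: Callable[[List[int]], float],
    cfd_thres: float = 0.03,
    recode_order: str = "PAM_first",
) -> List[int]:
    """Greedily choose positions to mutate until recut_score <= cfd_thres.

    All names are hypothetical; recut_score stands in for a CFD-based
    estimate of recut potential given the positions mutated so far.
    """
    if recode_order == "PAM_first":
        order = list(pam_positions) + list(protospacer_positions)
    elif recode_order == "protospacer_first":
        order = list(protospacer_positions) + list(pam_positions)
    else:
        raise ValueError(f"unknown recode_order: {recode_order!r}")

    mutated: List[int] = []
    for pos in order:
        # Stop as soon as the recut potential has reached the threshold
        if recut_score(mutated) <= cfd_thres:
            break
        mutated.append(pos)
    # Best effort: the score may still exceed the threshold if positions run out
    return mutated
```

With a toy score of 0.1 ** len(mutated), two mutations are enough to reach 0.03, so PAM_first picks the two PAM positions and protospacer_first picks the two protospacer positions.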
Source code file:
protoSpaceJAM
└── protoSpaceJAM
    └── util
        └── utils.py
- The on-target specificity weight is computed by the function _specificity_weight() from the argument specificity_score.
Change the lower and upper bounds by changing the values of the following variables:
_specificity_weight_low (default: 45)
_specificity_weight_high (default: 65)
or replace the function with your own version that implements a completely different way to calculate the specificity weight.
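The text only states that the specificity weight depends on a lower and an upper bound. The sketch below assumes, for illustration only, a linear ramp: weight 0 at or below _specificity_weight_low, 1 at or above _specificity_weight_high, and linear in between. The function name specificity_weight_sketch is hypothetical; check _specificity_weight() in utils.py for the actual shape before relying on this.

```python
# Default bounds as documented above
_specificity_weight_low = 45
_specificity_weight_high = 65


def specificity_weight_sketch(specificity_score: float,
                              low: float = _specificity_weight_low,
                              high: float = _specificity_weight_high) -> float:
    """Map a specificity score to a weight in [0, 1] (assumed linear ramp)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if specificity_score <= low:
        return 0.0
    if specificity_score >= high:
        return 1.0
    # Linear interpolation between the two bounds
    return (specificity_score - low) / (high - low)
```

Under this assumption, raising _specificity_weight_low makes the weighting stricter, while narrowing the gap between the two bounds makes the transition sharper.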
- The cut-to-insert distance Gaussian weight is computed by the function _dist_weight() from the argument hdr_dist (the cut-to-insert distance).
Modify the Gaussian curve by changing the following:
Variance: _dist_weight_variance = 55
Gaussian formula: weight = math.exp((-1 * hdr_dist ** 2) / (2 * variance))
or replace the function with your own version that implements a completely different way to compute the cut-to-insert distance weight.
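The Gaussian formula can be tried out on its own to see how the variance shapes the curve. This standalone function mirrors the formula quoted above; the name dist_weight and its signature are illustrative and not the exact _dist_weight() in utils.py.

```python
import math

# Default variance as documented above
_dist_weight_variance = 55


def dist_weight(hdr_dist: float, variance: float = _dist_weight_variance) -> float:
    """Gaussian weight for a cut-to-insert distance, using the formula above."""
    return math.exp((-1 * hdr_dist ** 2) / (2 * variance))
```

The weight is 1 at a distance of 0 and symmetric in the sign of hdr_dist. With the default variance of 55 it falls to 0.5 at about 8.7 bp (sqrt(2 × 55 × ln 2)), so increasing the variance makes the weighting more tolerant of longer cut-to-insert distances.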
- The gRNA position weight is computed by the function _position_weight(); change the penalties by modifying its mapping dictionary:
mapping_dict = {
    "5UTR": 0.4,
    "3UTR": 1,
    "cds": 1,
    "within_2bp_of_exon_intron_junction": 0.01,
    "within_2bp_of_intron_exon_junction": 0.01,
    "3N4bp_up_of_exon_intron_junction": 0.1,
    "3_to_6bp_down_of_exon_intron_junction": 0.1,
    "3N4bp_up_of_intron_exon_junction": 0.1,
    "3N4bp_down_of_intron_exon_junction": 0.5,
}
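Judging by the defaults, a lower value means a stronger penalty (cutting within 2 bp of a junction gets 0.01, coding sequence gets 1). The sketch below shows a lookup that keeps the documented defaults and lets selected entries be overridden without editing the original dictionary; position_weight and its overrides argument are hypothetical, and how protoSpaceJAM combines this weight with the others is not shown.

```python
from typing import Dict, Optional

# Values copied from the default mapping_dict in _position_weight()
DEFAULT_POSITION_WEIGHTS: Dict[str, float] = {
    "5UTR": 0.4,
    "3UTR": 1,
    "cds": 1,
    "within_2bp_of_exon_intron_junction": 0.01,
    "within_2bp_of_intron_exon_junction": 0.01,
    "3N4bp_up_of_exon_intron_junction": 0.1,
    "3_to_6bp_down_of_exon_intron_junction": 0.1,
    "3N4bp_up_of_intron_exon_junction": 0.1,
    "3N4bp_down_of_intron_exon_junction": 0.5,
}


def position_weight(region: str,
                    overrides: Optional[Dict[str, float]] = None) -> float:
    """Look up the weight for a region label; overrides take precedence."""
    # Build a merged copy so the defaults are never modified
    mapping = {**DEFAULT_POSITION_WEIGHTS, **(overrides or {})}
    return mapping[region]  # unknown labels raise KeyError
```

For example, position_weight("5UTR", {"5UTR": 0.8}) relaxes the 5' UTR penalty for one run while DEFAULT_POSITION_WEIGHTS stays unchanged.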
Source code file:
protoSpaceJAM
└── protoSpaceJAM
    └── util
        └── hdr.py
- Silent mutations in a stretch of codons are predicted by the function mutate_silently().
This function maximizes the number of silent mutations introduced into a stretch of codons while avoiding changes to and from rare codons.
Users can change the synonymous codon table and the blacklisted rare codons, or replace the function with their own version that implements a completely different way to introduce silent mutations into a stretch of codons.
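The strategy described for mutate_silently() can be sketched per codon: leave rare codons alone, and otherwise swap to the non-rare synonymous codon that differs at the most bases. This is an illustrative reimplementation, not the code in hdr.py. The codon table covers only leucine and serine from the standard genetic code, and the RARE_CODONS set is an arbitrary example, not protoSpaceJAM's blacklist.

```python
from typing import Dict, List, Set

# Illustrative subset of the standard genetic code (Leu and Ser only)
SYNONYMOUS: Dict[str, List[str]] = {
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS.items() for c in codons}
# Example blacklist chosen for illustration only
RARE_CODONS: Set[str] = {"CTA", "TTA", "TCG"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate_silently_sketch(seq: str) -> str:
    """Maximize silent base changes codon by codon, avoiding rare codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        # Do not change codons that are rare or missing from the table
        if aa is None or codon in RARE_CODONS:
            out.append(codon)
            continue
        # Never change to a rare codon either
        candidates = [c for c in SYNONYMOUS[aa]
                      if c != codon and c not in RARE_CODONS]
        if not candidates:
            out.append(codon)
            continue
        # max() keeps the first candidate when several tie
        out.append(max(candidates, key=lambda c: hamming(codon, c)))
    return "".join(out)
```

Because each codon is handled independently, choosing the most distant synonym for every codon also maximizes the total number of silent changes in the stretch. For example, "CTGTCTAAA" becomes "CTTAGCAAA": TCT (Ser) switches to AGC, changing all three bases, while AAA is outside the example table and is kept.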
- Single-base mutations (in non-coding regions) are processed by the method single_base_mutation() in the HDR_flank class.
To change how single bases are mutated, users can change the base mapping dictionary below, which maps each base to its mutated base:
mapping = {
    "A": "t",
    "a": "t",
    "C": "g",
    "c": "g",
    "G": "c",
    "g": "c",
    "T": "a",
    "t": "a",
}
Note that the mapping covers both upper- and lowercase input bases, and that mutated bases are always written in lowercase.
- Single-base mutations (in non-coding regions) are introduced at a default interval of 1 in every 3 bp, to match the average base mutation frequency in codons.
Users can customize this behavior by changing the value of the variable self.mut_every_n (default: 3) in the HDR_flank class.
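The base mapping and the mut_every_n interval can be combined into a small standalone function to preview their effect on a sequence. This is a sketch, not the HDR_flank implementation: mutate_base and mutate_every_n are hypothetical names, and the offset of the first mutated base (here, the first base of the sequence) is an assumption.

```python
# Base mapping copied from single_base_mutation() in HDR_flank
MAPPING = {"A": "t", "a": "t", "C": "g", "c": "g",
           "G": "c", "g": "c", "T": "a", "t": "a"}


def mutate_base(base: str) -> str:
    """Return the mutated (lowercase) base; other characters are unchanged."""
    return MAPPING.get(base, base)


def mutate_every_n(seq: str, mut_every_n: int = 3, offset: int = 0) -> str:
    """Mutate one base in every mut_every_n bases, starting at offset (assumed)."""
    if mut_every_n < 1:
        raise ValueError("mut_every_n must be >= 1")
    bases = list(seq)
    for i in range(offset, len(bases), mut_every_n):
        bases[i] = mutate_base(bases[i])
    return "".join(bases)
```

With the default interval, mutate_every_n("ACGTACGTA") changes the bases at positions 0, 3 and 6 and returns "tCGaACcTA"; the lowercase letters make the introduced mutations easy to spot.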
- Homopolymers are defined by the regular expression r"([ATat])\1{9,}|([CGcg])\2{5,}" in the HDR_flank class.
The current definition is a run of 10 or more identical A or T bases, or a run of 6 or more identical C or G bases.
Users can modify this regular expression to define their own homopolymers.
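A quick way to check what the current regular expression flags is to run it with Python's re module. The helper find_homopolymers is illustrative and not part of protoSpaceJAM; the pattern is copied verbatim from the text above.

```python
import re
from typing import List, Tuple

# Pattern copied from the HDR_flank class, as documented above
HOMOPOLYMER = re.compile(r"([ATat])\1{9,}|([CGcg])\2{5,}")


def find_homopolymers(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, run) for every homopolymer found in seq."""
    return [(m.start(), m.end(), m.group(0)) for m in HOMOPOLYMER.finditer(seq)]
```

Because \1 and \2 are backreferences to the exact captured character, the pattern matches runs of one identical letter only: "AAAAAaaaaa" (mixed case) and "ATATATATAT" are not reported, while "TTTTTTTTTTT" is.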
- GC content window skewness is calculated in the HDR_flank class:
slide_win_GC_content() returns a list of GC contents, one for each sliding window, with the window size set by the argument win_size.
max_diff = max(win_GC) - min(win_GC) then gives the maximum difference in GC content between any two sliding windows.
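The two steps above can be reproduced in a standalone sketch. The names slide_win_gc_content and gc_max_diff are hypothetical stand-ins for slide_win_GC_content() and the max_diff line, and the step size of 1 bp between windows is an assumption.

```python
from typing import List


def slide_win_gc_content(seq: str, win_size: int) -> List[float]:
    """GC fraction of every window of length win_size (step of 1 bp assumed)."""
    seq = seq.upper()
    if not 0 < win_size <= len(seq):
        raise ValueError("win_size must be between 1 and len(seq)")
    gc = [1 if base in "GC" else 0 for base in seq]
    current = sum(gc[:win_size])
    contents = [current / win_size]
    # Slide the window: add the entering base, drop the leaving one
    for i in range(win_size, len(seq)):
        current += gc[i] - gc[i - win_size]
        contents.append(current / win_size)
    return contents


def gc_max_diff(seq: str, win_size: int) -> float:
    """Largest GC-content difference between any two windows."""
    win_GC = slide_win_gc_content(seq, win_size)
    return max(win_GC) - min(win_GC)
```

For "GGGGAAAA" with win_size=4, the windows have GC contents 1.0, 0.75, 0.5, 0.25 and 0.0, so the maximum difference is 1.0, the most skewed value possible.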