Duo Peng edited this page Sep 7, 2023 · 15 revisions

Welcome to the protoSpaceJAM Wiki

The behavior and algorithm of protoSpaceJAM can be customized by:

  1. Flexible parameterization
  2. Modifying source code


Flexible parameterization

Input/output parameters

--path2csv                 Path to a CSV file containing the input knock-in sites; see input/test_input.csv for an example
--outdir                   Path to the output directory
--genome_ver               Genome and version to use, possible values are GRCh38, GRCm39, and GRCz11
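The three input/output options above are enough for a basic run. The sketch below assembles them into an argument list and checks `--genome_ver` against the values listed on this page. The executable name `protoSpaceJAM`, the helper `build_command`, and the output directory `output/test` are assumptions for illustration, not part of protoSpaceJAM.

```python
from typing import List

# Genome versions listed on this page for --genome_ver.
SUPPORTED_GENOMES = ("GRCh38", "GRCm39", "GRCz11")


def build_command(path2csv: str, outdir: str, genome_ver: str) -> List[str]:
    """Assemble an argument list for a protoSpaceJAM run (hypothetical helper).

    ASSUMPTION: the tool is installed as an executable named "protoSpaceJAM";
    adjust the first element to match your installation.
    """
    if genome_ver not in SUPPORTED_GENOMES:
        raise ValueError(f"unsupported genome_ver: {genome_ver!r}")
    return [
        "protoSpaceJAM",
        "--path2csv", path2csv,
        "--outdir", outdir,
        "--genome_ver", genome_ver,
    ]


cmd = build_command("input/test_input.csv", "output/test", "GRCh38")
print(" ".join(cmd))
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Validating the genome version up front makes a typo fail immediately instead of partway through a run.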

gRNA parameters

--num_gRNA_per_design      Number of gRNAs to return per site (default: 1)
--no_regulatory_penalty    Turn off penalty for gRNAs cutting in UTRs or near splice junctions (default: penalty on)

Payload parameters

--payload                  Payload sequence to use at every site, regardless of terminus or coordinates; overrides all other payload parameters
--Npayload                 Payload sequence to use at the N terminus (default: mNG11 + XTEN80)
--Cpayload                 Payload sequence to use at the C terminus (default: XTEN80 + mNG11)
--POSpayload               Payload sequence to use at sites specified by genomic coordinates (default: XTEN80 + mNG11)
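The precedence rule above (`--payload` overrides the terminus- and coordinate-specific payloads) can be sketched as a small selection function. The site-type labels "N", "C" and "POS" and the placeholder default strings are assumptions for illustration; substitute the real mNG11/XTEN80 sequences and however your sites are labeled.

```python
from typing import Optional

# Placeholder strings standing in for the real default payload sequences.
DEFAULT_NPAYLOAD = "<mNG11+XTEN80>"
DEFAULT_CPAYLOAD = "<XTEN80+mNG11>"
DEFAULT_POSPAYLOAD = "<XTEN80+mNG11>"


def choose_payload(
    site_type: str,
    payload: Optional[str] = None,
    Npayload: str = DEFAULT_NPAYLOAD,
    Cpayload: str = DEFAULT_CPAYLOAD,
    POSpayload: str = DEFAULT_POSPAYLOAD,
) -> str:
    """Pick the payload for one site; site_type is "N", "C" or "POS" (hypothetical labels)."""
    # --payload, when given, applies to every site regardless of terminus or coordinates.
    if payload is not None:
        return payload
    per_type = {"N": Npayload, "C": Cpayload, "POS": POSpayload}
    if site_type not in per_type:
        raise ValueError(f"unknown site type: {site_type!r}")
    return per_type[site_type]


print(choose_payload("N"))
print(choose_payload("C", payload="ACGT"))
```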

Donor parameters

--Donor_type               Donor type; possible values are ssODN and dsDNA (default: ssODN)
                           This option determines how the donor is processed.
--HA_len                   [dsDNA] Length of the desired homology arm on each side (default: 500)
--CheckEnzymes             [dsDNA] Names of restriction enzymes, separated by "|", whose recognition sites are flagged and trimmed, e.g. BsaI|EcoRI (default: None)
--CustomSeq2Avoid          [dsDNA] Custom sequences, separated by "|", to flag and trim (default: None)
--MinArmLenPostTrim        [dsDNA] Minimum length of the homology arm after trimming. Set to 0 to turn off trimming (default: 0)
--Strand_choice            [ssODN] Strand choice of the ssODN donor (default: auto)
                                   Possible values are "auto", "TargetStrand", "NonTargetStrand", "CodingStrand" and "NonCodingStrand"
--ssODN_max_size           [ssODN] Enforce a maximum length on the ssODN donor (default: 200)
                                   The ssODN donor will be centered on the payload and the recoded region
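The centering described for `--ssODN_max_size` can be illustrated as choosing a window of at most `max_size` bp around the payload plus recoded region, splitting the remaining length between the two arms. This is a minimal sketch, not protoSpaceJAM's code; it assumes 0-based, half-open coordinates, and the function name `center_window` is hypothetical.

```python
from typing import Tuple


def center_window(seq_len: int, start: int, end: int, max_size: int = 200) -> Tuple[int, int]:
    """Return (lo, hi) of a window of at most max_size bp centered on [start, end).

    ASSUMPTION: [start, end) spans the payload plus the recoded region,
    in 0-based half-open coordinates on a donor of length seq_len.
    """
    if not 0 <= start <= end <= seq_len:
        raise ValueError("region must lie within the sequence")
    size = min(max_size, seq_len)
    if end - start > size:
        raise ValueError("payload + recoded region is longer than max_size")
    # Split the remaining length budget as evenly as possible between the two arms.
    flank = size - (end - start)
    lo = start - flank // 2
    hi = end + (flank - flank // 2)
    # Slide the window back inside the sequence if it runs past either end.
    if lo < 0:
        hi -= lo
        lo = 0
    if hi > seq_len:
        lo -= hi - seq_len
        hi = seq_len
    return lo, hi


print(center_window(1000, 480, 537))
```

Sliding the window rather than truncating it keeps the donor at the full allowed length when the insert lies close to one end of the available sequence.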

See this page for details on how ssODN and dsDNA donors are processed differently.

Recoding parameters

--recoding_off              Turn recoding off
--recoding_stop_recut_only  Recode only within the gRNA recognition site
--recoding_full             Use full recoding, recode both the gRNA recognition site and the cut-to-insert region (default: on)
--cfdThres                  Threshold to which protoSpaceJAM will attempt to lower the recut potential (default: 0.03)
                            The recut potential is measured by the CFD score.
--recode_order              Prioritize recoding in the PAM or in protospacer, possible values: protospacer_first, PAM_first (default: PAM_first)
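How `--cfdThres` and `--recode_order` interact can be sketched as a greedy loop: mutate candidate positions in the chosen order and stop once the recut potential drops to the threshold. This is only an illustration of that stopping rule; `recut_score` is a hypothetical stand-in for a real CFD calculation, and the position numbering (protospacer 0-19, PAM 20-22) is assumed for the example.

```python
from typing import Callable, List, Sequence


def recode_until_threshold(
    protospacer_positions: Sequence[int],
    pam_positions: Sequence[int],
    recut_score: Callable[[List[int]], float],
    cfd_thres: float = 0.03,
    recode_order: str = "PAM_first",
) -> List[int]:
    """Greedily add mutated positions until recut_score(mutated) <= cfd_thres.

    recut_score is a stand-in for a CFD calculation: it receives the positions
    mutated so far and returns the predicted recut potential. If the threshold
    is never reached, every candidate position is returned.
    """
    if recode_order == "PAM_first":
        order = list(pam_positions) + list(protospacer_positions)
    elif recode_order == "protospacer_first":
        order = list(protospacer_positions) + list(pam_positions)
    else:
        raise ValueError(f"unknown recode_order: {recode_order!r}")

    mutated: List[int] = []
    for pos in order:
        if recut_score(mutated) <= cfd_thres:
            break
        mutated.append(pos)
    return mutated


# Toy score: each mutation halves the recut potential.
print(recode_until_threshold(range(20), [20, 21, 22], lambda m: 0.5 ** len(m)))
```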



Modifying source code

1. Calculation of gRNA scoring weights

Source code file:

protoSpaceJAM 
└── protoSpaceJAM                                     
     └── util                                 
         └── utils.py    

  • On-target specificity weight is computed by the function _specificity_weight() from the argument specificity_score.
    Change the lower and upper bounds by changing the values of the following variables:
    _specificity_weight_low (default: 45)
    _specificity_weight_high (default: 65)
    or replace the function with your own version that implements a completely different way to calculate the specificity weight.

  • Cut-to-insert distance Gaussian weight is computed by the function _dist_weight() from the argument hdr_dist (the cut-to-insert distance).
    Modify the Gaussian curve by changing the following:
         Variance: _dist_weight_variance=55
         Gaussian formula: weight = math.exp((-1 * hdr_dist ** 2) / (2 * variance))
    or replace the function with your own version that implements a completely different way to compute the cut-to-insert distance weight.

  • gRNA position weight is computed by the function _position_weight(). Change the penalties by modifying its mapping dictionary:
    mapping_dict = {
        "5UTR": 0.4,
        "3UTR": 1,
        "cds": 1,
        "within_2bp_of_exon_intron_junction": 0.01,
        "within_2bp_of_intron_exon_junction": 0.01,
        "3N4bp_up_of_exon_intron_junction": 0.1,
        "3_to_6bp_down_of_exon_intron_junction": 0.1,
        "3N4bp_up_of_intron_exon_junction": 0.1,
        "3N4bp_down_of_intron_exon_junction": 0.5,
    }
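The first two weights above can be sketched as standalone functions. The distance weight uses the Gaussian formula and variance given on this page; the specificity weight is an ASSUMPTION (a linear ramp from 0 at the lower bound to 1 at the upper bound), since this page lists only the bounds, not the shape of the curve. Check utils.py before relying on either.

```python
import math

# Defaults copied from the bullets above.
_specificity_weight_low = 45
_specificity_weight_high = 65
_dist_weight_variance = 55


def specificity_weight(specificity_score: float) -> float:
    # ASSUMPTION: linear ramp between the bounds, clamped to [0, 1].
    if specificity_score <= _specificity_weight_low:
        return 0.0
    if specificity_score >= _specificity_weight_high:
        return 1.0
    span = _specificity_weight_high - _specificity_weight_low
    return (specificity_score - _specificity_weight_low) / span


def dist_weight(hdr_dist: int) -> float:
    # Gaussian formula as given on this page (centered at a cut-to-insert distance of 0).
    return math.exp((-1 * hdr_dist ** 2) / (2 * _dist_weight_variance))


for d in (0, 5, 10, 20):
    print(d, round(dist_weight(d), 3))
```

A larger variance widens the Gaussian, so gRNAs that cut farther from the insertion site are penalized less.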


2. Recoding of donor DNA

Source code file:

protoSpaceJAM 
└── protoSpaceJAM                                     
     └── util                                 
         └── hdr.py    

  • Silent mutations in a stretch of codons are predicted by the function mutate_silently().
    This function maximizes the number of silent mutations introduced into a stretch of codons while avoiding mutations to or from rare codons.
    Users can change the synonymous codon table and the blacklisted rare codons, or replace the function with their own version that implements a completely different way to introduce silent mutations into a stretch of codons.

  • Single-base mutations (in non-coding regions) are made by the method single_base_mutation() of the HDR_flank class.
    To change how single bases are mutated, users can edit the base mapping dictionary below, which maps each base to its mutated counterpart:
        mapping = {
            "A": "t",
            "a": "t",
            "C": "g",
            "c": "g",
            "G": "c",
            "g": "c",
            "T": "a",
            "t": "a",
        }

        Note that this mapping treats uppercase and lowercase bases alike: both map to the same (lowercase) mutated base.


  • Single-base mutations (in non-coding regions) are introduced by default at one in every 3 bp, to match the average base mutation frequency in codons.
    Users can customize this behavior by changing the value of variable self.mut_every_n (default: 3) in the HDR_flank class.

  • Homopolymers are defined by the regular expression r"([ATat])\1{9,}|([CGcg])\2{5,}" in the HDR_flank class.
    The current definition is 10 or more consecutive As or Ts, or 6 or more consecutive Cs or Gs.
    Users can modify this regular expression to define their own homopolymers.

  • GC content skew across sliding windows is calculated in the HDR_flank class:
    slide_win_GC_content() returns a list of GC contents, one for each sliding window of size win_size.
    max_diff = max(win_GC) - min(win_GC) calculates the maximum GC content difference between any two sliding windows.
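The last three HDR_flank behaviors above (single-base mutation every n bp, homopolymer detection, and sliding-window GC skew) can be sketched as standalone functions. The mapping and regular expression are copied from this page; which base within each n-bp block gets mutated and the 1 bp window step are ASSUMPTIONS, and these functions are illustrations rather than the methods in hdr.py.

```python
import re
from typing import List

# Base mapping and homopolymer pattern copied from the bullets above.
MAPPING = {"A": "t", "a": "t", "C": "g", "c": "g",
           "G": "c", "g": "c", "T": "a", "t": "a"}
HOMOPOLYMER = re.compile(r"([ATat])\1{9,}|([CGcg])\2{5,}")


def mutate_every_n(seq: str, mut_every_n: int = 3) -> str:
    """Mutate one base in every mut_every_n bp of a non-coding stretch.

    ASSUMPTION: the last base of each block is mutated (0-based positions
    n-1, 2n-1, ...); protoSpaceJAM may choose different positions.
    """
    out = list(seq)
    for i in range(mut_every_n - 1, len(seq), mut_every_n):
        out[i] = MAPPING.get(seq[i], seq[i])  # leave non-ACGT symbols unchanged
    return "".join(out)


def has_homopolymer(seq: str) -> bool:
    return HOMOPOLYMER.search(seq) is not None


def slide_win_gc_content(seq: str, win_size: int) -> List[float]:
    """GC fraction of every window of length win_size (ASSUMPTION: 1 bp step)."""
    if not 0 < win_size <= len(seq):
        raise ValueError("win_size must be between 1 and len(seq)")
    return [
        sum(base in "GCgc" for base in seq[i:i + win_size]) / win_size
        for i in range(len(seq) - win_size + 1)
    ]


def gc_max_diff(seq: str, win_size: int) -> float:
    # Largest GC difference between any two windows, as in max_diff above.
    win_GC = slide_win_gc_content(seq, win_size)
    return max(win_GC) - min(win_GC)


print(mutate_every_n("ATGCCA"))
print(has_homopolymer("GGGGGG"), has_homopolymer("AAAAAAAAA"))
print(gc_max_diff("GGAA", 2))
```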