Simple Mutation List Format

AlexanderGress edited this page Nov 8, 2021 · 3 revisions

The simple mutation list format is the default input format for StructMAn. It is a space- or tab-separated text file with three columns:

1. Protein Id column

The first column has to contain one of the following protein IDs identifying the query protein.

  1. Uniprot-AC, for example:
    P04637
    An accession can also be isoform-specific:
    P04637-2
  2. Uniprot-ID, for example:
    P53_HUMAN
  3. Refseq-protein-ID (NP_), for example:
    NP_000537.3
  4. PDB-ID:[Chain-Id] (uses the amino acid sequence given by the backbone of any PDB structure as the input sequence; this changes the format of the second column), for example:
    4HJE:A
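
The four identifier types above can be told apart by their shape. The following is a minimal sketch of such a check, not StructMAn's own detection logic: the function name `classify_protein_id`, the category labels, and the regular expressions are assumptions made for illustration. The UniProt accession pattern follows the one published by UniProt; the others are simple heuristics.

```python
import re

# Heuristic patterns (assumptions, not taken from StructMAn), checked in order.
_ID_PATTERNS = [
    # PDB-ID:[Chain-Id], e.g. 4HJE:A
    ("pdb_chain", re.compile(r"[0-9][A-Za-z0-9]{3}:[A-Za-z0-9]+")),
    # RefSeq protein ID with optional version, e.g. NP_000537.3
    ("refseq_protein", re.compile(r"NP_\d+(\.\d+)?")),
    # UniProt accession with optional isoform suffix, e.g. P04637 or P04637-2
    ("uniprot_ac", re.compile(
        r"([OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})(-\d+)?")),
    # UniProt entry name, e.g. P53_HUMAN
    ("uniprot_id", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
]


def classify_protein_id(identifier):
    """Return a label for the identifier type, or None if nothing matches."""
    for kind, pattern in _ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return kind
    return None


if __name__ == "__main__":
    for example in ("P04637", "P04637-2", "P53_HUMAN", "NP_000537.3", "4HJE:A"):
        print(example, classify_protein_id(example))
```

A check like this is mainly useful for catching typos in an input file before submitting it.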

2. Position column

The position column has to have the following format:
[One letter amino acid code][Integer value][One letter amino acid code]
The first one-letter code is the wild-type amino acid at the given position. StructMAn checks it against the query sequence and replaces it if it is wrong. One can also give an X, in which case the correct amino acid is looked up.
The integer value is the index of the query position in the given query protein sequence.
The second one-letter code is optional and gives the amino acid in the mutated sequence, annotating an amino acid substitution at the position. Leave it out if only the position itself should be annotated, independent of any substitution.
If the first column contains a PDB-ID, an underscore (_) has to be placed between the position and the mutant amino acid. This makes it possible to give an insertion code (column 27 of the PDB format) together with the position.

The whole second column can be left empty, in which case StructMAn annotates all positions of the query protein.
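
To make the two position syntaxes concrete, here is a minimal sketch of a parser for a single-position entry. It is an illustration under stated assumptions, not StructMAn's parser: the names `Position` and `parse_position` and the `pdb_style` flag are hypothetical, both amino acid letters are treated as optional, and multi-residue insertions and deletions are not handled.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    wildtype: Optional[str]        # wild-type amino acid, or None if omitted
    index: int                     # position in the query sequence
    insertion_code: Optional[str]  # only meaningful for PDB-style input
    mutant: Optional[str]          # mutant amino acid, or None if omitted


# Sequence-style input: [letter][integer][letter], both letters optional.
_SEQ_STYLE = re.compile(r"([A-Z])?(\d+)([A-Z])?")
# PDB-style input: a letter directly after the integer is an insertion code,
# and the mutant amino acid follows an underscore.
_PDB_STYLE = re.compile(r"([A-Z])?(\d+)([A-Z])?(?:_([A-Z]))?")


def parse_position(field, pdb_style=False):
    """Parse one position entry; raise ValueError if it does not fit."""
    pattern = _PDB_STYLE if pdb_style else _SEQ_STYLE
    match = pattern.fullmatch(field)
    if match is None:
        raise ValueError(f"unrecognized position entry: {field!r}")
    if pdb_style:
        wildtype, index, insertion_code, mutant = match.groups()
    else:
        wildtype, index, mutant = match.groups()
        insertion_code = None
    return Position(wildtype, int(index), insertion_code, mutant)


if __name__ == "__main__":
    print(parse_position("P200V"))
    print(parse_position("L188A_A", pdb_style=True))
```

The same string can mean different things in the two styles: `L188A` is a substitution to alanine for a sequence-style ID, but position 188 with insertion code A for a PDB-style ID.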

3. Tag column

In this column, one can add arbitrary tags for the position. Any number of tags can be given, separated by commas (,). The results will later be aggregated by all given tags. Tags can also carry values, using the following scheme:
#[tag_name]:[value]
for example:
#effect:0.773
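
As an illustration of the tag syntax, the sketch below splits a tag column into plain tags and value tags. The function name `parse_tags` is hypothetical, and values are kept as strings because the format description does not say whether they must be numeric.

```python
def parse_tags(field):
    """Split a tag column into (plain_tags, value_tags).

    plain_tags is a list of strings; value_tags maps tag names to their
    values, which are left as strings (an assumption of this sketch).
    """
    plain_tags = []
    value_tags = {}
    for tag in field.split(","):
        tag = tag.strip()
        if not tag:
            continue  # ignore empty entries such as a trailing comma
        if tag.startswith("#") and ":" in tag:
            # Value tag of the form #[tag_name]:[value]
            name, value = tag[1:].split(":", 1)
            value_tags[name] = value
        else:
            plain_tags.append(tag)
    return plain_tags, value_tags


if __name__ == "__main__":
    print(parse_tags("cancer,hotspot,#effect:0.773"))
```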

Example:

P04637  annotate_whole_protein
P04637-3  annotate_isoform
Q96HA4 A31 annotate_specific_position
Q96HA4 P50 annotate_another_specific_position
Q96HA4 77 annotate_position_without_giving_amino_acid_type_also_possible
Q96HA4 P200V annotate_SNV
Q96HA4 P200A annotate_another_SNV
Q96HA4 200D also_possible
Q16637 P210_P241del deletion_example
P04439 S337_D338insGGEGVK insertion_example
P04637 R333_R337delinsKGKEK substitution_example
1JOO:A  PDB_style_input_annotates_whole_chain
1DPO:A M180 specific_position_PDB_style
1DPO:A V181_A SNVs_need_an_underscore_in_PDB_style
1DPO:A L188A this_annotates_position_188_with_insertion_code_A
1DPO:A L188A_A this_codes_the_SNV_of_leucine_position_188_with_insertion_code_A_to_alanine
1DPO:A G188_A this_codes_the_SNV_of_glycine_position_188_to_alanine
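
The example lines can be split into their three columns with a few lines of Python. This is a sketch under one assumption drawn from the examples: every single space or tab acts as a separator, so two consecutive separators (as in the whole-protein lines) mark an empty position column. StructMAn's own parser may behave differently, and the function name `split_record` is hypothetical.

```python
import re


def split_record(line):
    """Split one input line into (protein_id, position, tags).

    position is None when the second column is empty; tags is a list.
    """
    # Split on single separators, at most twice, so empty fields survive.
    fields = re.split(r"[ \t]", line.rstrip("\n"), maxsplit=2)
    # Pad missing trailing columns with empty strings.
    fields += [""] * (3 - len(fields))
    protein_id, position, tags = fields
    return protein_id, position or None, tags.split(",") if tags else []


if __name__ == "__main__":
    for line in ("P04637  annotate_whole_protein",
                 "Q96HA4 P200V annotate_SNV",
                 "1DPO:A V181_A SNVs_need_an_underscore_in_PDB_style"):
        print(split_record(line))
```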