
Feature Table

AlexanderGress edited this page Nov 17, 2021 · 7 revisions

The feature table is an additional output file produced by StructMAn. It is a tab-separated table in which each row holds the results computed for one amino acid position of one queried protein. Compared to the classification table, it contains more detailed results and values, which can serve as input features for machine learning methods that focus on amino acids or point mutations. Rows can describe queried amino acid positions as well as queried point mutations. All columns are explained below:

| Name | Description |
| --- | --- |
| Input Protein ID | Protein identifier given in the input file. |
| Primary Protein ID | Protein identifier used internally by StructMAn. |
| Uniprot-Ac | UniProt accession ID of the query protein. |
| WT Amino Acid | One-letter code of the wildtype amino acid of the query. |
| Position | Position of the queried amino acid or mutation in the sequence of the query. |
| Mut Amino Acid | One-letter code of the amino acid in the mutated version of the query. |
| AA change | Combination of WT Amino Acid, Position, and Mut Amino Acid. |
| Tags | Tags given in the input for the query. In supervised machine learning, they can be used to supply the target value. |
| Distance-based classification | Classification based on Euclidean distance calculations. |
| Distance-based simple classification | Simplified version of the distance-based classification. |
| RIN-based classification | Classification based on residue interaction networks (RINs). |
| RIN-based simple classification | Simplified version of the RIN-based classification. |
| Classification confidence | Confidence value for the classification, based on how many structures went into the classification, the overall quality of these structures, and the consistency of the information across structures. |
| Weighted Location | Structural location of the queried position with respect to its solvent-accessible surface area: Surface, Buried, or Core. |
| Weighted Mainchain Location | Structural location of the queried position based solely on main chain atoms. |
| Weighted Sidechain Location | Structural location of the queried position based solely on side chain atoms. |
| RSA | Aggregated relative solvent-accessible area (RSA) of the queried position. |
| Mainchain RSA | Aggregated relative solvent-accessible area of the queried position, based solely on main chain atoms. |
| Sidechain RSA | Aggregated relative solvent-accessible area of the queried position, based solely on side chain atoms. |
| Amount of mapped structures | Number of structures the queried position could be mapped to. |
| Secondary structure assignment | Aggregated secondary structure assignment, obtained by a majority vote over the DSSP secondary structure assignments of all mapped residues. |
| IUPred value | Aggregated disorder score computed by IUPred3. |
| Region structure type | Aggregated structure type: disordered region or globular region. |
| Modres score | Aggregated score for the tendency of the queried position to be post-translationally modified. |
| Phi | Aggregated phi backbone dihedral angle. |
| Psi | Aggregated psi backbone dihedral angle. |
| KD mean | Difference in Kyte-Doolittle (KD) hydropathy score between the wildtype residue and the mutated residue. |
| Volume mean | Difference in van der Waals volume between the wildtype residue and the mutated residue. |
| Chemical distance | Value of the substitution in the chemical distance substitution matrix based on . |
| Blosum62 | Value of the substitution in the BLOSUM62 substitution matrix. |
| Aliphatic change | Boolean denoting whether the substitution changes the aliphatic class. |
| Hydrophobic change | Boolean denoting whether the substitution changes the hydrophobic class. |
| Aromatic change | Boolean denoting whether the substitution changes the aromatic class. |
| Positive charged change | Boolean denoting whether the substitution changes the positively charged class. |
| Polar change | Boolean denoting whether the substitution changes the polar class. |
| Negative charge change | Boolean denoting whether the substitution changes the negatively charged class. |
| Charged change | Boolean denoting whether the substitution changes the charged class. |
| Small change | Boolean denoting whether the substitution changes the small class. |
| Tiny change | Boolean denoting whether the substitution changes the tiny class. |
| Total change | Sum of all class changes of the substitution, i.e. the number of the Boolean class-change columns above that are true. |
| B Factor | Aggregated B-factor value. |
| AbsoluteCentrality | Aggregated network centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of the mapped residue's chain, isolated from any other chains in the structure. |
| LengthNormalizedCentrality | Aggregated length-normalized centrality value of all mapped residues. The centrality values are normalized by the size of the mapped residue's chain. |
| MinMaxNormalizedCentrality | Aggregated min-max-normalized centrality value of all mapped residues. The centrality values are rescaled using the maximal and minimal network centrality values of all residues in the mapped residue's chain. |
| AbsoluteCentralityWithNegative | Same as AbsoluteCentrality, but the residue interaction networks include negative edges. |
| LengthNormalizedCentralityWithNegative | Same as LengthNormalizedCentrality, but the residue interaction networks include negative edges. |
| MinMaxNormalizedCentralityWithNegative | Same as MinMaxNormalizedCentrality, but the residue interaction networks include negative edges. |
| AbsoluteComplexCentrality | Aggregated network centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of all chains in the structure. |
| LengthNormalizedComplexCentrality | Aggregated length-normalized centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of all chains in the structure. |
| MinMaxNormalizedComplexCentrality | Aggregated min-max-normalized centrality value of all mapped residues. The centrality values are calculated from the residue interaction network of all chains in the structure. |
| AbsoluteComplexCentralityWithNegative | Same as AbsoluteComplexCentrality, but the residue interaction networks include negative edges. |
| LengthNormalizedComplexCentralityWithNegative | Same as LengthNormalizedComplexCentrality, but the residue interaction networks include negative edges. |
| MinMaxNormalizedComplexCentralityWithNegative | Same as MinMaxNormalizedComplexCentrality, but the residue interaction networks include negative edges. |
| Intra_SSBOND_Propensity | Propensity of the mapped residue to form a disulfide (cysteine-cysteine) bond with a cysteine from the same chain. |
| Inter_SSBOND_Propensity | Propensity of the mapped residue to form a disulfide (cysteine-cysteine) bond with a cysteine from another chain. |
| Intra_Link_Propensity | Propensity of the mapped residue to form a covalent bond with a residue from the same chain. |
| Inter_Link_Propensity | Propensity of the mapped residue to form a covalent bond with a residue from another chain. |
| CIS_Conformation_Propensity | Propensity of the mapped residue to have a peptide bond in cis conformation to the next residue. |
| CIS_Follower_Propensity | Propensity of the mapped residue to have a peptide bond in cis conformation to the previous residue. |
| Inter Chain Median KD | Aggregated median hydropathy value of all residues of the same chain closer than 10 Å to the mapped residue. |
| Inter Chain Distance Weighted KD | Aggregated distance-weighted hydropathy value of all residues of the same chain closer than 10 Å to the mapped residue. Distance-weighted means that the hydropathy values are weighted according to their distance to the mapped residue. |
| Inter Chain Median RSA | Aggregated median relative solvent-accessible area of all residues of the same chain closer than 10 Å to the mapped residue. |
| Inter Chain Distance Weighted RSA | Aggregated distance-weighted relative solvent-accessible area of all residues of the same chain closer than 10 Å to the mapped residue. Distance-weighted means that the RSA values are weighted according to their distance to the mapped residue. |
| Intra Chain Median KD | Aggregated median hydropathy value of all residues of another chain closer than 10 Å to the mapped residue. |
| Intra Chain Distance Weighted KD | Aggregated distance-weighted hydropathy value of all residues of another chain closer than 10 Å to the mapped residue. |
| Intra Chain Median RSA | Aggregated median relative solvent-accessible area of all residues of another chain closer than 10 Å to the mapped residue. |
| Intra Chain Distance Weighted RSA | Aggregated distance-weighted relative solvent-accessible area of all residues of another chain closer than 10 Å to the mapped residue. |
| Inter Chain Interactions Median | Aggregated median interaction score of nearby residues that belong to the same protein chain. |
| Inter Chain Interactions Distance Weighted | Aggregated distance-weighted interaction score of nearby residues that belong to the same protein chain. |
| Intra Chain Interactions Median | Aggregated median interaction score of nearby residues that belong to another protein chain. |
| Intra Chain Interactions Distance Weighted | Aggregated distance-weighted interaction score of nearby residues that belong to another protein chain. |
| [neighbor, short, long, ligand, ion, metal, Protein, DNA, RNA, Peptide] score | Aggregated sum of interaction scores over all edges of the mapped residue in the residue interaction network to specific interaction partners. Neighbor: the residues directly connected to the mapped residue through the main chain. Short: non-neighboring residues fewer than 6 positions apart in the protein sequence. Long: all other residues of the same chain that are neither neighbor nor short. Ligand: any low-molecular-weight molecule in the structure. Ion: any non-metal ion. Metal: any metal ion. Protein: any residue from another chain in the structure. DNA: any nucleotide from a DNA chain in the structure. RNA: any nucleotide from an RNA chain in the structure. Peptide: any residue from a non-protein peptide in the structure. |
| [neighbor, short, long, ligand, ion, metal, Protein, DNA, RNA, Peptide] degree | Aggregated number of edges of the mapped residue in the residue interaction network to specific interaction partners. |
| [neighbor, short, long, ligand, ion, metal, Protein, DNA, RNA, Peptide] H-bond score | Aggregated sum of H-bond scores over all edges of the mapped residue in the residue interaction network to specific interaction partners. |
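Because the feature table is a plain tab-separated file, it can be read with standard tooling. The sketch below uses Python's `csv` module and assumes that the first line holds the column names listed above and that empty cells mark missing values; the helper names `read_feature_table` and `as_float` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def read_feature_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-separated feature table into one dict per row.

    Assumes the first line is a header with the column names described
    above; every value is kept as a string.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [dict(row) for row in reader]


def as_float(value: Optional[str]) -> Optional[float]:
    """Convert a cell to float, returning None for empty or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hypothetical excerpt with only three columns; real files contain many more.
example = io.StringIO(
    "Input Protein ID\tAA change\tRSA\n"
    "P12345\tA123V\t0.42\n"
    "P12345\tG200D\t\n"
)
rows = read_feature_table(example)
rsa_values = [as_float(row["RSA"]) for row in rows]
print(rsa_values)  # [0.42, None]
```

Keeping all cells as strings and converting only the columns you need avoids guessing types for identifier columns such as Tags.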
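The AA change column combines WT Amino Acid, Position, and Mut Amino Acid. A minimal sketch, assuming the three parts are simply concatenated (for example `A123V`); the exact string StructMAn writes is not specified on this page, and rows that describe a position rather than a point mutation may not follow this form. The function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed layout: wildtype letter, sequence position, mutant letter (e.g. "A123V").
_AA_CHANGE = re.compile(r"([A-Z])(\d+)([A-Z])")


def format_aa_change(wt: str, position: int, mut: str) -> str:
    """Combine WT Amino Acid, Position, and Mut Amino Acid into one string."""
    return f"{wt}{position}{mut}"


def parse_aa_change(change: str) -> Tuple[str, int, str]:
    """Split an AA change string back into (wildtype, position, mutant)."""
    match = _AA_CHANGE.fullmatch(change)
    if match is None:
        raise ValueError(f"unexpected AA change format: {change!r}")
    return match.group(1), int(match.group(2)), match.group(3)


print(format_aa_change("A", 123, "V"))  # A123V
print(parse_aa_change("G200D"))         # ('G', 200, 'D')
```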
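KD mean is described as the difference in Kyte-Doolittle hydropathy between the wildtype and the mutated residue. The sketch below computes such a difference from the standard Kyte-Doolittle scale; the sign convention (mutant minus wildtype) and the function name `kd_difference` are assumptions, not taken from StructMAn.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kd_difference(wt: str, mut: str) -> float:
    """Hydropathy change of a substitution, as mutant minus wildtype (assumed sign)."""
    return KYTE_DOOLITTLE[mut] - KYTE_DOOLITTLE[wt]


print(round(kd_difference("A", "V"), 2))  # 2.4
```

The same pattern works for Volume mean if the hydropathy dictionary is replaced by a table of residue volumes.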
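The class-change columns are Booleans, and Total change is their sum, i.e. the number of classes whose membership differs between wildtype and mutant. A sketch of that logic follows; the class memberships in the hypothetical `EXAMPLE_CLASSES` are illustrative only and are not StructMAn's definitions.

```python
from typing import Dict, Set

# Illustrative class memberships only; StructMAn's exact definitions may differ.
EXAMPLE_CLASSES: Dict[str, Set[str]] = {
    "aromatic": set("FWYH"),
    "tiny": set("AGSC"),
    "charged": set("DEKRH"),
}


def class_changes(wt: str, mut: str, classes: Dict[str, Set[str]]) -> Dict[str, bool]:
    """True for every class that exactly one of the two residues belongs to."""
    return {name: (wt in members) != (mut in members) for name, members in classes.items()}


def total_change(changes: Dict[str, bool]) -> int:
    """Sum of the Boolean class changes, i.e. the number of classes that changed."""
    return sum(changes.values())


changes = class_changes("G", "D", EXAMPLE_CLASSES)
print(changes)                # {'aromatic': False, 'tiny': True, 'charged': True}
print(total_change(changes))  # 2
```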
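Secondary structure assignment is a majority vote over the DSSP codes of all mapped residues. A minimal sketch with `collections.Counter`; how StructMAn breaks ties is not stated, so the first-encountered rule used here is an assumption, as is the function name.

```python
from collections import Counter
from typing import Iterable


def majority_secondary_structure(assignments: Iterable[str]) -> str:
    """Most frequent DSSP code among the mapped residues.

    Ties go to the code encountered first, because Counter.most_common
    orders equal counts by first occurrence (assumed tie rule).
    """
    counts = Counter(assignments)
    if not counts:
        raise ValueError("no secondary structure assignments given")
    return counts.most_common(1)[0][0]


print(majority_secondary_structure(["H", "H", "E", "-", "H"]))  # H
```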
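The neighbor, short, and long interaction partners in the score, degree, and H-bond score columns are defined by sequence separation within the same chain. The rule can be sketched as follows, assuming both arguments are sequence positions on one chain without gaps; the function name `contact_category` is hypothetical.

```python
def contact_category(pos_a: int, pos_b: int) -> str:
    """Classify an intra-chain contact by sequence separation.

    neighbor: directly adjacent through the main chain (separation 1);
    short: non-neighbors fewer than 6 positions apart;
    long: every other residue of the same chain.
    """
    separation = abs(pos_a - pos_b)
    if separation == 0:
        raise ValueError("a residue is not in contact with itself")
    if separation == 1:
        return "neighbor"
    if separation < 6:
        return "short"
    return "long"


print(contact_category(10, 11))  # neighbor
print(contact_category(10, 15))  # short
print(contact_category(10, 16))  # long
```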
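The Median and Distance Weighted columns aggregate values (hydropathy, RSA, or interaction scores) of residues closer than 10 Å to the mapped residue. The weighting function is not documented on this page, so the sketch below uses inverse-distance weighting as a stand-in; treat it as a hypothetical scheme, and the input as illustrative (distance, value) pairs.

```python
import statistics
from typing import List, Tuple

CUTOFF_ANGSTROM = 10.0  # distance cutoff from the column descriptions


def close_neighbors(
    neighbors: List[Tuple[float, float]], cutoff: float = CUTOFF_ANGSTROM
) -> List[Tuple[float, float]]:
    """Keep only (distance, value) pairs strictly closer than the cutoff."""
    return [(d, v) for d, v in neighbors if d < cutoff]


def median_of_neighbors(neighbors: List[Tuple[float, float]]) -> float:
    """Median of the values of all residues within the cutoff."""
    close = close_neighbors(neighbors)
    if not close:
        raise ValueError("no residues within the cutoff")
    return statistics.median(v for _, v in close)


def inverse_distance_weighted(neighbors: List[Tuple[float, float]]) -> float:
    """Weighted mean with weight 1/distance, so closer residues count more (assumed scheme)."""
    close = [(d, v) for d, v in close_neighbors(neighbors) if d > 0]
    if not close:
        raise ValueError("no residues within the cutoff")
    weights = [1.0 / d for d, _ in close]
    return sum(w * v for w, (_, v) in zip(weights, close)) / sum(weights)


# Hypothetical (distance in Å, value) pairs; the residue at 12 Å is ignored.
pairs = [(2.0, 1.0), (4.0, 4.0), (12.0, 100.0)]
print(median_of_neighbors(pairs))        # 2.5
print(inverse_distance_weighted(pairs))  # 2.0
```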
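The LengthNormalized and MinMaxNormalized centrality columns rescale raw centrality values from a residue interaction network. A sketch of both normalizations for one chain, assuming the raw centrality values are already computed and that chain size means the number of residues; the residue keys and function names are hypothetical.

```python
from typing import Dict


def min_max_normalize(centrality: Dict[int, float]) -> Dict[int, float]:
    """Rescale a chain's centrality values to [0, 1] using its minimum and maximum."""
    if not centrality:
        return {}
    lo, hi = min(centrality.values()), max(centrality.values())
    if hi == lo:
        # All residues equally central: map every value to 0.
        return {res: 0.0 for res in centrality}
    return {res: (c - lo) / (hi - lo) for res, c in centrality.items()}


def length_normalize(centrality: Dict[int, float]) -> Dict[int, float]:
    """Divide each value by the number of residues in the chain (assumed definition)."""
    n = len(centrality)
    return {res: c / n for res, c in centrality.items()}


raw = {1: 2.0, 2: 6.0, 3: 10.0}  # hypothetical residue -> centrality
print(min_max_normalize(raw))    # {1: 0.0, 2: 0.5, 3: 1.0}
print(length_normalize(raw)[2])  # 2.0
```

Min-max scaling makes values comparable between chains with different centrality ranges, while length normalization compensates for larger networks producing larger raw values.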