Merge pull request #10 from disgruntled-llama/main

docs about proteins
IISc-Software-iGEM · Sep 13, 2024 · 12cfa8d
2 parents 00745d0 + 68d1ab3
commit 12cfa8d
Showing 4 changed files with 232 additions and 0 deletions.
diff --git a/docs/filetype.md b/docs/filetype.md
@@ -0,0 +1,123 @@
+# A brief introduction to several file types:
+
+## PDB Files
+
+1. ATOM
+    - atomic coordinate record containing the x,y,z orthogonal Angstrom coordinates for atoms in standard residues (amino acids and nucleic acids).
+
+2. HETATM
+    - atomic coordinate record containing the x,y,z orthogonal Angstrom coordinates for atoms in nonstandard residues. Nonstandard residues include inhibitors, cofactors, ions, and solvent. The only functional difference from ATOM records is that HETATM residues are by default not connected to other residues. Note that water residues should be in HETATM records.
+
+3. TER
+    - indicates the end of a chain of residues. For example, a hemoglobin molecule consists of four subunit chains which are not connected. TER indicates the end of a chain and prevents the display of a connection to the next chain.
+
+4. SSBOND
+    - defines disulfide bond linkages between cysteine residues.
+
+5. HELIX
+    - indicates the location and type (right-handed alpha, etc.) of helices. One record per helix.
+
+6. SHEET
+    - indicates the location, sense (anti-parallel, etc.) and registration with respect to the previous strand in the sheet (if any) of each strand in the model. One record per strand.
+
+### Example
+____
+ATOM      1  N   MET A   1      38.292  13.351   7.926  1.00 20.00           N
+
+ATOM      2  CA  MET A   1      37.905  12.048   8.510  1.00 20.00           C
+
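+Here, each line describes one atom. The fixed-width columns give the record type, atom serial number, atom name, residue name, chain identifier, residue number, x, y, z coordinates, occupancy, temperature factor (B-factor), and element symbol.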
+___
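
Because PDB is a fixed-width format, a record like the ones in the example can be read by slicing character positions rather than splitting on spaces. The following is a minimal sketch in Python, assuming the standard ATOM/HETATM column layout; the function name `parse_atom_record` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
def parse_atom_record(line):
    """Slice one ATOM/HETATM line by its fixed PDB column positions."""
    line = line.ljust(80)  # pad short lines so every slice is in range
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "atom_name": line[12:16].strip(), # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21].strip(),     # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
    }


# First record from the example; the element symbol is right-justified
# in columns 77-78, so it is preceded by eleven spaces here.
SAMPLE = (
    "ATOM      1  N   MET A   1      38.292  13.351   7.926  1.00 20.00"
    + " " * 11
    + "N"
)

if __name__ == "__main__":
    atom = parse_atom_record(SAMPLE)
    print(atom["atom_name"], atom["res_name"], atom["x"], atom["y"], atom["z"])
```

Slicing by column keeps the parser correct even when two adjacent fields touch, which can happen with large coordinate values.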
+
+## Advantages of PDB files
+* **Standardised format:** PDB is a widely accepted and standardised file format, making it easy to share and exchange structural data.
+
+* **Large Database:** The Protein Data Bank (PDB) contains over 175,000 experimentally determined 3D structures, providing a wealth of structural information.
+
+* **Visualization Tools:** PDB files are compatible with numerous visualization and analysis tools, making it easy to explore protein structures.
+
+## PQR Files
+
+Format:
+> Field_name Atom_number Atom_name Residue_name Chain_ID Residue_number X Y Z Charge Radius
+
+1. Field_name
+    - A string which specifies the type of PQR entry and should either be ATOM or HETATM in order to be parsed by APBS.
+
+2. Atom_number
+    - An integer which provides the atom index.
+
+3. Atom_name
+    - A string which provides the atom name.
+
+4. Residue_name
+    - A string which provides the residue name.
+
+5. Chain_ID
+    - An optional string which provides the chain ID of the atom. Note that chain IDs are supported in APBS 0.5.0 and later versions.
+
+6. Residue_number
+    - An integer which provides the residue index.
+
+7. X Y Z
+    - 3 floats which provide the atomic coordinates (in Å)
+
+8. Charge
+    - A float which provides the atomic charge (in electrons).
+
+9. Radius
+    - A float which provides the atomic radius (in Å).
+
+> This format can deviate substantially from PDB because fields are separated by whitespace rather than placed in specific column widths and alignments. The deviation can be particularly significant when coordinate values are large. However, in order to maintain compatibility with most molecular graphics programs, the PDB2PQR program and the utilities provided with APBS attempt to preserve the PDB format as much as possible.
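
Since PQR fields are whitespace-delimited, a record can be read with a plain split, with the optional chain ID handled by counting fields. The sketch below is a minimal Python illustration of the field list above; the function name `parse_pqr_line` and the dictionary keys are hypothetical, and it assumes no field contains embedded spaces.

```python
def parse_pqr_line(line):
    """Parse one whitespace-delimited PQR record into a dictionary."""
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")

    if len(fields) == 11:    # chain ID present
        chain_id = fields[4]
        rest = fields[5:]
    elif len(fields) == 10:  # chain ID omitted
        chain_id = None
        rest = fields[4:]
    else:
        raise ValueError("expected 10 or 11 fields, got %d" % len(fields))

    return {
        "record": fields[0],
        "serial": int(fields[1]),
        "atom_name": fields[2],
        "res_name": fields[3],
        "chain_id": chain_id,
        "res_seq": int(rest[0]),
        "x": float(rest[1]),
        "y": float(rest[2]),
        "z": float(rest[3]),
        "charge": float(rest[4]),  # in electrons
        "radius": float(rest[5]),  # in Å
    }


if __name__ == "__main__":
    print(parse_pqr_line("ATOM 1 N MET A 1 38.292 13.351 7.926 -0.3 1.85"))
```

Counting fields is a simple way to cope with the optional chain ID, but it would misread a line in which two numeric fields run together, which is one reason tools try to preserve PDB-style column alignment.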
+
+## XYZ Files
+
+1. The first line of a frame specifies the number of particles (N) in the frame. It is an integer number. No other text is allowed on this line.
+
+2. The second line is a comment line. A comment may be placed here or the line may be left blank. This line is ignored by the program.
+
+3. There are then N lines, each of which describes the coordinates of a single particle. These lines consist of the identity of a particle followed by 3 spatial coordinates. No other text may be included in this line.
+
+4. The identity of a particle is specified by a single letter or number. The coordinates are given as floating point numbers. Each of these elements is separated by either a single space or single tab-space.
+
+5. If there are multiple timesteps, each timestep is appended directly after the last. No quantities (number of particles, particle identities, etc.) need to be conserved between timesteps; each timestep is treated separately. Frames do not have to be labelled or numbered, although this is a good use of the comment line.
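
The five rules above translate fairly directly into a reader that consumes one frame at a time. The following Python sketch is illustrative only: the function name `read_xyz_frames` and the sample coordinates are made up for this example, and skipping blank lines between frames is an extra leniency that the rules do not require.

```python
def read_xyz_frames(text):
    """Return a list of (comment, atoms) tuples, one per frame."""
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        n = int(lines[i].strip())  # rule 1: particle count only
        comment = lines[i + 1] if i + 1 < len(lines) else ""  # rule 2
        atoms = []
        for raw in lines[i + 2 : i + 2 + n]:  # rule 3: N particle lines
            label, x, y, z = raw.split()      # rule 4: label + 3 floats
            atoms.append((label, float(x), float(y), float(z)))
        if len(atoms) != n:
            raise ValueError("frame ended before all particles were read")
        frames.append((comment, atoms))
        i += 2 + n  # rule 5: the next frame follows directly
    return frames


# Two frames with different particle counts; the second comment is blank.
SAMPLE = "\n".join([
    "2",
    "frame 0",
    "H 0.0 0.0 0.0",
    "H 0.0 0.0 0.74",
    "3",
    "",
    "O 0.0 0.0 0.0",
    "H 0.76 0.59 0.0",
    "H -0.76 0.59 0.0",
])

if __name__ == "__main__":
    for comment, atoms in read_xyz_frames(SAMPLE):
        print(repr(comment), len(atoms))
```

Each frame is parsed independently, so nothing in the reader assumes that the particle count or identities stay the same from one timestep to the next.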
+
+## PDB vs PQR
+## PDB (Protein Data Bank) File
+
+* **Purpose:** PDB files are used to store three-dimensional structures of biological molecules, like proteins, nucleic acids, and complex assemblies. These files are a standard format for representing molecular structures in structural biology.
+
+* **Format:** A PDB file contains detailed information about the atoms in a molecule, including their coordinates, element types, and bonds. It also includes metadata such as the molecule's name, authors, and experimental conditions.
+
+* **Usage:** These files are widely used in molecular visualization tools and software for analyzing and simulating biomolecules.
+
+
+
+## PQR File
+* **Purpose:** PQR files are similar to PDB files but include additional information, specifically atomic charge and radius data. The "PQR" name comes from the combination of "PDB", "Q" for charge, and "R" for radius.
+
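+* **Format:** The format closely follows the PDB format, but the occupancy and temperature factor columns are replaced by the partial charge and atomic radius, and fields are separated by whitespace rather than fixed column positions.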
+* **Usage:** PQR files are often used in electrostatics calculations, particularly with software like APBS (Adaptive Poisson-Boltzmann Solver) to determine the electrostatic potential of biomolecules.
+
+### Example
+___
+ATOM      1  N   MET A   1      38.292  13.351   7.926  -0.30   1.85
+
+ATOM      2  CA  MET A   1      37.905  12.048   8.510   0.21   1.70
+
+Here, the last two columns hold the partial charge and atomic radius, taking the place of the occupancy and temperature factor columns of a PDB record.
+___________
+
+
+
+## Key Reasons Why PQR Files Are Useful for Electrostatic Calculations:
+
+### 1. Inclusion of Partial Charges:
+* **Electrostatic Interactions:** The electrostatic potential of a molecule depends on the distribution of charges across its atoms. PQR files include partial charges for each atom, which are necessary to calculate how the molecule interacts with electric fields, solvents, and other molecules. 
+
+* **Electrostatic Potential Maps:** Tools like APBS (Adaptive Poisson-Boltzmann Solver) use these partial charges to compute electrostatic potential maps, which are crucial for understanding molecular interactions, binding sites, and reactivity.
+
+### 2. Atomic Radii:
+* **Solvent Accessibility:** The atomic radii in PQR files are used to model how molecules interact with their environment, particularly with solvents. The radii help determine the solvent-accessible surface area, which influences the molecule's electrostatic properties.
+
+* **Poisson-Boltzmann Equation:** When solving the Poisson-Boltzmann equation (a key equation in electrostatics), the atomic radii are used to define the dielectric boundary between the molecule and its surrounding environment. This boundary is crucial for accurate calculations of the electrostatic potential.
diff --git a/docs/forcefield.md b/docs/forcefield.md
@@ -0,0 +1,79 @@
+# An overview of different force fields:
+
+## AMBER Force Field
+AMBER (Assisted Model Building with Energy Refinement) is the collective name for a suite of programs that run molecular dynamics simulations, and for the family of force fields used with them.  
+It can model many biomolecules, such as proteins, nucleic acids, lipids and carbohydrates.
+
+### What AMBER Does:
+
+The AMBER force field is defined by a potential energy function with the following terms:
+
+- Covalent bond stretching energy.
+- Angle bending energy, which arises from the geometry of the electron orbitals involved in covalent bonding.
+- Torsional (dihedral) strain.
+- Non-bonded van der Waals and electrostatic energies.
+
+### Representation of Atoms
+It employs an all-atom model, that is, every atom, including hydrogen, is represented explicitly.
+
+### Parameter Sets:
+
+To use AMBER, one needs to provide parameters such as force constants, equilibrium bond lengths and angles, and atomic charges. Each parameter set is defined by an OFF or PREP file.
+
+### Force Field:
+
+- Bond stretching and angle bending are modelled as harmonic potentials (1/2 * k (x - x<sub>0</sub>)<sup>2</sup>, where x<sub>0</sub> is the equilibrium value).
+- Dihedral angles (torsional strain) are modelled by a periodic potential whose parameters include the barrier height (amplitude), periodicity and phase.
+- Non-bonded interactions are modelled as electrostatics (Coulomb’s law) and van der Waals forces (Lennard-Jones potential).
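
The three kinds of term listed above can be sketched as plain Python functions. This is a hedged illustration rather than AMBER's own implementation: the function names, the Lennard-Jones parameterisation (well depth and minimum-energy distance) and the units (kcal/mol, Å, elementary charges) are assumptions made for this example, and the Coulomb constant is an approximate value.

```python
import math

# Approximate Coulomb constant in kcal * Å / (mol * e^2); an assumed value
# for illustration, individual packages use slightly different constants.
COULOMB_CONSTANT = 332.06


def harmonic(k, x, x0):
    """Bond stretching or angle bending: 1/2 * k * (x - x0)^2."""
    return 0.5 * k * (x - x0) ** 2


def periodic_dihedral(barrier, periodicity, phi, phase=0.0):
    """Torsion term: (barrier / 2) * (1 + cos(n * phi - phase))."""
    return 0.5 * barrier * (1.0 + math.cos(periodicity * phi - phase))


def lennard_jones(epsilon, r_min, r):
    """Van der Waals term with well depth epsilon at distance r_min."""
    ratio = (r_min / r) ** 6
    return epsilon * (ratio ** 2 - 2.0 * ratio)


def coulomb(q1, q2, r):
    """Electrostatic term between two point charges a distance r apart."""
    return COULOMB_CONSTANT * q1 * q2 / r


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any AMBER parameter set.
    total = (
        harmonic(k=100.0, x=1.1, x0=1.0)
        + periodic_dihedral(barrier=2.0, periodicity=3, phi=math.pi / 3)
        + lennard_jones(epsilon=0.1, r_min=3.4, r=4.0)
        + coulomb(q1=0.4, q2=-0.4, r=4.0)
    )
    print(total)
```

A full force field sums such terms over every bond, angle, dihedral and non-bonded atom pair in the system, with the constants looked up from the chosen parameter set.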
+
+### Recommended Force Fields
+
+Choosing a force field arbitrarily is strongly discouraged. Specific force fields are recommended for specific classes of molecules and ions, as listed below (source: The Amber Project).  
+
+| Molecule/Ion  | Force Field |
+| ------------- | ----------- |
+| Protein       | ff19SB      |
+| DNA           | OL21        |
+| RNA           | OL3         |
+| Carbohydrates | GLYCAM_06j  |
+| Lipids        | lipid21     |
+
+
+
+### References:
+The Amber Project: https://ambermd.org/   
+Journal of Computational Chemistry: https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.540020311  
+University of Oregon: https://www.uoxray.uoregon.edu/local/manuals/biosym/discovery/General/Forcefields/AMBER.html    
+Cornell, W. D., et al. (1995). "A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules." Journal of the American Chemical Society, 117(19), 5179-5197.
+
+## CHARMM - Chemistry at HARvard Macromolecular Mechanics
+
+- Similar to AMBER but provides more parameters.
+- Treats all the atoms separately to find the potential energy.
+- Also provides parameters to include polarized charges (an advantage over AMBER).
+- Target molecules are proteins, nucleic acids and membranes.
+
+### Atomic representation
+- Each atom is a point charge.
+- For practical purposes, hydrogen atoms are combined with the neighbouring atoms they are bonded to, giving an extended-atom representation.
+
+### Empirical Energy Function
+- Computes the potential energy of the system from an empirical energy function.
+- Accounts for bond potential, bond angle potential, torsion angle potential, van der Waals forces, and electrostatic potential.
+
+
+### Generation of data structure - to calculate potential
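+- Before the potential can be evaluated, CHARMM builds internal data structures describing the system, such as its topology (atoms, bonds, angles, dihedrals) and the parameters for each term. See the paper under Further reading for details.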
+
+### Mechanics and Energetics
+- The system tends towards a state of low energy.
+- It is therefore desirable to find a minimum of the energy by adjusting the coordinates of the system.
+- In addition, the atoms follow a trajectory determined by their initial velocities.
+- Because of the computational complexity, finding the global minimum is generally infeasible, and we have to settle for a local minimum.
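
The last point can be seen even on a toy problem. The sketch below runs plain gradient descent on a one-dimensional double-well function; the function, step size and starting points are invented for illustration and have nothing to do with CHARMM's actual energy function or minimisers.

```python
def energy(x):
    """Toy double-well 'energy' with one shallow and one deep minimum."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Analytical derivative of the toy energy."""
    return 4 * x**3 - 6 * x + 1


def minimise(x, step=0.01, n_steps=5000):
    """Plain gradient descent: repeatedly step downhill from x."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


if __name__ == "__main__":
    for start in (2.0, -2.0):
        x_min = minimise(start)
        print(start, x_min, energy(x_min))
```

Starting on the right-hand side of the barrier, the descent settles in the shallower well and never discovers the deeper one on the left, which is the same limitation a minimiser faces on the far larger energy surface of a macromolecule.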
+
+### Further reading
+[CHARMM: A Program for Macromolecular Energy, Minimization and Dynamics Calculations](https://onlinelibrary.wiley.com/doi/epdf/10.1002/jcc.540040211)
+
+
+
+
diff --git a/docs/intro.md b/docs/intro.md
@@ -0,0 +1,15 @@
+## 1. What is Computational Biology?
+
+   Computational Biology can be roughly described as the combination of fields such as mathematics, data science and chemistry with biology, in order to understand and analyse areas of biology such as genetics and neuroscience.
+
+## 2. How did it start?
+
+   The origins of computational biology, not too surprisingly, can be traced back to the origins of computation. British mathematician **Alan Turing** began implementing a model of biological morphogenesis (the development of pattern and form in living organisms) in the 1950s. This decade also witnessed the incorporation of computational methods in phylogenetics.
+
+   By the 1960s, computers could deal with complex structures such as proteins, marking the rise of computational biology as a field. Computers had become an essential part of scientists' work, especially in analysing the **3D structure of proteins**.
+
+   These developments were followed by the use of Machine Learning and Artificial Intelligence in solving complex biological problems. The **Human Genome Project (HGP)** played a pivotal role in the popularisation of the field.
+
+## 3. What does PEP focus on?
+
+   PEP focusses on the application of Computational Biology to analysing the structure of proteins. It is a beginner-friendly tool for anyone interested in joining the exciting field of computational biology, and a step forward from existing Computational Biology tools in terms of user-friendliness.
diff --git a/docs/proteins.md b/docs/proteins.md
@@ -0,0 +1,15 @@
+## 1. What are proteins? 
+
+   Proteins are polymers (one or multiple chains) of amino acid residues. They are an integral part of all living organisms, and play an indispensable role in a variety of biological functions such as DNA/RNA replication, structural support, transport, etc.
+
+## 2. What are amino acids?
+
+   Amino acids are organic compounds containing **amino** as well as **carboxylic acid** groups in their molecular structure. 
+
+## 3. How are the structures of proteins determined?
+
+   The structures of proteins are most commonly determined by **X-Ray Crystallography**, a technique that emerged from sustained efforts in the early 20th century.
+
+## 4. What are the different types of protein structures?
+
+   The simplest structure of a protein is its primary structure, which can be represented as a linear sequence of amino acids. Secondary structures are more complicated and may be represented in the form of helices or sheets. Additionally, proteins can have tertiary and quaternary structures.