Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable GLYCAM06j-1 forcefield conversion #156

Merged
merged 10 commits into from
Sep 1, 2021

Conversation

zhang-ivy
Copy link
Collaborator

@zhang-ivy zhang-ivy commented Feb 27, 2021

Convert leaprc.GLYCAM_06j-1 to an opemm ffxml.

Requires ParmEd/ParmEd#1149

TODO:
- [ ] The protein atoms in GLYCAM_06j-1 are named N, CA, etc. Need to decide whether we should use ff14SB to match the existing protein atom names or enforce use of protein.ff14SB (which will require changing the protein atom names in GLYCAM_06j-1 to protein-N, protein-CA etc).
- [x] Allow override_level parameter in yaml input to convert_amber.py
- [ ] Write test for validating energies (openmm vs tleap system) -- I've started this, but its not working. See below.

Currently, the primary showstopper for the test is that ffxml/GLYCAM_06j-1.xml does not contain residues that are actually part of glycans. The only residues that are present are the modified protein residues to which glycans can bond (e.g. NLN, which is the modified version of ASN).
For example, here are some of the residues that should be present: [‘UYB’, ‘4YB’, ‘VMB’, ‘2MA’, ‘0YB’, ‘0FA’, ‘0LB’]

This problem shows up as an error when I run python convert_amber.py -i glycan.yaml at
https://github.com/zhang-ivy/openmmforcefields/blob/convert-glycans/amber/convert_amber.py#L626

ValueError: No template found for residue 197 (UYB).  The set of atoms is similar to C5, but it is missing 3 atoms.

Summary of the conversion pipeline:

  1. In openmmforcefields/amber/glycam/, convert the glycam prep file (GLYCAM_06j-1.prep) into a .lib file (GLYCAM_06j-1.lib)
  • Make a tleap.in file to do this:
# Read lines from prep file
with open("GLYCAM_06j-1.prep", "r") as f:
    lines_in = f.readlines()

# Make a list of residues
residues = []
# d_enantiomers = []
for line in lines_in:
    if "INT 0" in line:
        residue = line[:3]
        residues.append(residue)

# Remove CA2, as this isn't present as an atom type in ff14SB.xml or GLYCAM06j-1.xml or protein.ff14SB.xml or tip3p_standard.xml
residues.remove('CA2')

# Prep lines to write to tleap in file
lines = ["loadamberprep GLYCAM_06j-1.prep\n"]
for i in range(200, len(residues), 200): # Write 200 residues per line 
    lines.append("saveOff {" + " ".join(residues[i-200:i]) + "} GLYCAM_06j-1.lib\n")
lines.append("quit\n")

# Write lines
with open("make_glycam_lib.in", "w") as f:
    f.writelines(lines)
  • Then run tleap -s -f make_glycam_lib.in > make_glycam_lib.out
  • Output: GLYCAM_06j-1.lib
  1. In openmmforcefields/amber/glycam/, create a modified glycam leaprc file (that replaces loading the .prep with loading the .lib)

    • Output: leaprc.GLYCAM_06j-1
  2. In openmmforcefields/amber/glycam/, create a glycan.yaml and make sure to load the modified leaprc.

  3. From openmmforcefields/amber, run python convert_amber.py -i glycam/glycan.yaml --verbose

    • Note you will see warnings like this that are safe to ignore. ParmEd thinks that there are a lot of identical residues because it doesn’t take into stereochemistry or anomeric forms
/home/zhangi/miniconda3/envs/glycan-ff-2/lib/python3.8/site-packages/ParmEd-3.4.0+21.gffe7aaf-py3.8-linux-x86_64.egg/parmed/openmm/parameters.py:666: UserWarning: Residue <ResidueTemplate YzB: 18 atoms; 18 bonds; head=C1; tail=O2> will be considered by OpenMM to be identical to <ResidueTemplate YZB: 18 atoms; 18 bonds; head=C1; tail=O2>.
warnings.warn(f'Residue {residue} will be considered by OpenMM to be identical to {residue_collision}.')
  1. Use the converted glycam ffxml to parametrize a system:
from simtk import openmm
from simtk.openmm import app

# Load the ffxml
ffxml = ['../ffxml/protein.ff14SB.xml', "../ffxml/GLYCAM_06j-1_modified.xml"]

ff = app.ForceField(*ffxml)

# Load the test pdb
pdb = app.PDBFile("../test.pdb")

# Create system
system_omm = ff.createSystem(pdb.topology)

Summary of changes to existing codebases:

Parmed

Openmmforcefields

OpenMM

@zhang-ivy
Copy link
Collaborator Author

@jchodera @mikemhenry @rafwiewiora : could you help me identify why there are no glycan residues in ffxml/GLYCAM_06j-1.xml?
@peastman and @jchodera : could you help me decide whether we should enforce protein atom names in ffxml/GLYCAM_06j-1.xml to contain the protein- prefix (to be used with protein.ff14SB)? Or should we leave ffxml/GLYCAM_06j-1.xml as is and use ff14SB as the protein ff?

@mikemhenry
Copy link
Collaborator

Not sure if it helps or not but this site seems to have the parameter files:
http://glycam.org/docs/forcefield/all-parameters/
And the missing residues are in this file:
http://glycam.org/docs/forcefield/wp-content/uploads/sites/6/2014/03/GLYCAM_06j-1.prep

@jchodera
Copy link
Member

jchodera commented Mar 1, 2021

We managed to get to the root of the issue, and I will try a full conversion today!

@zhang-ivy
Copy link
Collaborator Author

I was able to address the "primary showstopper" (i.e. getting glycan residues into the ffxml, described in detail above) with the help of John's comment. What I did:

  1. Converted the glycam .prep file into a .lib file (using tleap -s -f amber/glycam/make_glycam_lib_trunc.in > make_glycam_lib_trunc.out): glycam/GLYCAM_06j-1.lib
  2. Created a modified glycam leaprc file (that replaces loading the .prep with loading the .lib): glycam/leaprc.GLYCAM_06j-1_trunc
  3. Updated glycam/glycan.yaml to load the modified leaprc.
  4. From amber/, ran: python convert_amber.py -i glycam/glycan.yaml --verbose

This generated ffxml/GLYCAM_06j-1_trunc.xml. Note that this ffxml is truncated because it only includes the following glycan residues (which are present in RBD:ACE2): ['UYB', '4YB', 'VMB', '2MA', '0YB', '0fA', '0LB']. I initially tried to include all glycan residues in the ffxml, but came across many problems related to OpenMM perceiving different templates as the same, so I figured we could deal with those issues later.

After generating the ffxml, I was getting errors like:
ValueError: No template found for residue 197 (UYB). The set of atoms matches UYB, but the bonds are different.
This was because the ExternalBonds in the ff templates were not correct. In GLYCAM_06j-1_trunc.xml, for the UYB residue, I was seeing:

     <ExternalBond atomName="C1"/>
     <ExternalBond atomName="O4"/>

But there should be a third line for atomName="O6". Since Parmed was not writing out all of the atoms with external bonds correctly, I added a temporary fix to parmed/openmm/parameters.py. This fix involved adding a function called _get_atoms_with_external_bonds() that gets atoms with external bonds.

Here is the function:

def _get_atoms_with_external_bonds(self, residue_name):
        # Determine the atoms with external bonds
        d = {'C': 4, 'N': 3, 'O': 2, 'H': 1} # Key: atom element name, Value: number of bonds it should have
        external_bonds = []
        for atom in self.residues[residue_name].atoms:
            bonds = 0

            # Add an extra bond if the atom is the O in a carbonyl
            if atom.name in ['OD1', 'O', 'O2N', 'OXT']:
                bonds += 1

            for bond in self.residues[residue_name].bonds:
                if bond.atom1 == atom or bond.atom2 == atom:
                    bonds += 1

                # Add extra bonds if the atom is the C in a carbonyl
                if (bond.atom1.name == 'OD1' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'OD1'):
                    bonds += 1
                elif (bond.atom1.name == 'O' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'O'):
                    bonds += 1
                elif (bond.atom1.name == 'O2N' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'O2N'):
                    bonds += 1
                elif (bond.atom1.name == 'OXT' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'OXT'):
                    bonds += 1
            if atom.element_name == 'Og': # If the atom is in a glycan, the element names are not set properly
                element_name = atom.name[0]
            else:
                element_name = atom.element_name
            if d[element_name] != bonds:
                external_bonds.append(atom)

        return external_bonds

And I added the following lines after this line:

external_bonds = self._get_atoms_with_external_bonds(name)
            for atom in external_bonds:
                if atom != residue.head and atom != residue.tail:
                    etree.SubElement(xml_residue, 'ExternalBond', atomName=atom.name)

After implementing this fix, I re-ran python convert_amber.py -i glycam/glycan.yaml --verbose and got an error in the test I wrote in convert_amber.py. The error occurs here and the message is:

parmed.exceptions.ParameterError: Cannot find necessary parameters

My temporary fix: At this line, I replace parm_omm with parm_amber.

When I re-run with this fix, I can compare the omm vs amber energies:

amber energies [('HarmonicBondForce', 786.2137738648838), ('HarmonicAngleForce', 1573.0058453514364), ('PeriodicTorsionForce', 9090.943943094859), ('PeriodicTorsionForce', 13.594734381835927), ('NonbondedForce', -15185.572665049249), ('CMMotionRemover', 0.0)]

omm energies [('HarmonicBondForce', 785.9383122713264), ('HarmonicAngleForce', 1567.5857045366686), ('PeriodicTorsionForce', 9034.076934658806), ('NonbondedForce', -4528.287677760537), ('CMMotionRemover', 0.0)]

Notice there are two PeriodicTorsionForces in the amber energies..? And the NonbondedForce energies differ by a lot..

amber/convert_amber.py Outdated Show resolved Hide resolved
@@ -1247,6 +1258,68 @@ def validate_merged_lipids(ffxml_name, leaprc_name):
os.unlink(f[1])
if verbose: print('Lipids energy validation for %s done!' % ffxml_name)

def validate_glyco_protein(ffxml_name, leaprc_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use this test idea?

ParmEd/ParmEd#1149 (comment)

@jchodera
Copy link
Member

When I re-run with this fix, I can compare the omm vs amber energies:

@zhang-ivy Can you paste the XML file you generated as a gist?

@zhang-ivy
Copy link
Collaborator Author

@jchodera : Here are the system xmls ( amber.xml and omm.xml): https://gist.github.com/zhang-ivy/0d7d192e0f8296d609b8d4d94e4098ce

@jchodera
Copy link
Member

@jchodera : Here are the system xmls ( amber.xml and omm.xml): https://gist.github.com/zhang-ivy/0d7d192e0f8296d609b8d4d94e4098ce

Oh, sorry, I was hoping to see the ffxml!

@zhang-ivy
Copy link
Collaborator Author

zhang-ivy commented Mar 30, 2021

@jchodera : The ffxml I generated is included in this PR: https://github.com/openmm/openmmforcefields/blob/b73b6b0e420a737bb315da79b1ba4a66397caf33/amber/ffxml/GLYCAM_06j-1_trunc.xml
Note that it only includes the glycan residues necessary for my RBD:ACE2 system, as I was running to problems when I tried converting all glycan residues

@zhang-ivy
Copy link
Collaborator Author

@peastman helped me identify that the reason why there is an energy discrepancy between the amber vs openmm-generated test systems is because the prefixes are not correct in GLYCAM_06j-1_trunc.xml.

I think I'll want to modify this function such that it iterates through all atom types in GLYCAM_06j.dat and if the atom type is present in ff14SB, add the protein- prefix, otherwise add the glycan- prefix.

@peastman and @jchodera : Please let me know what other considerations I need to take into account in order to write the appropriate changes necessary to address this prefix issue.

@peastman
Copy link
Member

Let me try to go over all the issues we need to consider. Hopefully I'm not missing any!

Prefixes were a compromise to deal with differences in how OpenMM and Amber think about atom types. OpenMM expects them to be globally unique. If you load two files together, they can't each define a type with the same name. But Amber sometimes has multiple files listing the same type. The solution was to add prefixes like protein-, DNA-, and RNA-. That way if you need to include both protein and DNA in a single system, you don't get conflicts.

Each atom type definition also defines a class, which is the same name but without the prefix:

<Type element="C" name="protein-C" class="C" mass="12.01"/>

When defining parameters, they can be defined either by type="protein-C" or class="C". It appears that in all cases they do the former:

<Bond type1="protein-C" type2="protein-C" length="0.1525" k="259407.99999999994"/>

and

<Bond type1="DNA-C" type2="DNA-C" length="0.1525" k="259407.99999999994"/>

These are basically the same definition, but one using protein types and the other using DNA types. This allows you to load either just the protein force field or just the DNA force field, and it has all parameters it needs. Or you can load both of them together, and you don't have to worry about duplicate or conflicting definitions. In principle we could separate all the common definitions into a different file that would specify them by class rather than type. That would be a substantial change though.

All of this works fine for protein, DNA, and RNA. Those are separate molecules so they can each be parametrized independently. But glycans are bonded to other molecules. A single bond/angle/torsion might include both protein and glycan atom types. We need to figure out how to handle that.

Atom types appear in one other place. ParmEd/ParmEd#1149 added a list of atom types that should have 1-4 interactions scaled differently. That list appears in an embedded script generated by ParmEd. We could make the conversion script modify them the same way it modifies types in other places, but it might be cleaner to modify ParmEd so it can generate types with correct prefixes right from the start.

@peastman
Copy link
Member

While working on this, I ran into what I think is a bug in the glycam force field. GLYCAM_06j.dat contains these lines.

NH-Cg-Cg-Sm   1    0.45          0.0            -3.         SCEE=1.0 SCNB=1.0 Thiol-linkages
              1    0.25          0.0             2.         SCEE=1.0 SCNB=1.0

However, the atom type NH is not defined. This is the only place it is mentioned anywhere in the file. It also is not a standard type used in the protein, DNA, or RNA force fields.

@peastman
Copy link
Member

Based on our meeting today, I created a script to post-process the XML file. It makes the following assumptions about the input file:

  1. No prefixes have been added.
  2. All parameters are specified by type, not class.
  3. The bad torsion noted above has been removed. (Otherwise it throws an exception.)

It creates a modified XML file with the following changes.

  1. All atom type definitions that are not GLYCAM-specific are removed.
  2. All parameter definitions that do not involve at least one GLYCAM-specific type are removed.
  3. For standard (not GLYCAM-specific) atom types, it adds the protein- prefix.

Here's a first draft of it.

import xml.etree.ElementTree as etree
import re
import textwrap

# Define atom types

protein_types = set(["C", "CA", "CB", "CC", "CN", "CR", "CT", "CV", "CW", "C*", "CX", "H", "HC", "H1", "HA", "H4", "H5",
                     "HO", "HS", "HP", "N", "NA", "NB", "N2", "N3", "O", "O2", "OH", "S", "SH", "CO", "2C", "3C", "C8"])
solvent_types = set(["HW", "OW", "Li+", "Na+", "K+", "Rb+", "Cs+", "F-", "Cl-", "Br-", "I-"])
glycam_types = set()
replacements = {}
for type in protein_types:
    replacements[type] = 'protein-'+type

# Process <AtomTypes>

tree = etree.parse('glycam.xml')
root = tree.getroot()
types = root.find('AtomTypes')
for type in types.findall('Type'):
    name = type.get('name')
    if name in protein_types or name in solvent_types:
        types.remove(type)
    else:
        glycam_types.add(name)
        replacements[name] = 'glycam-'+name
        type.set('name', replacements[name])

# Process <Residues>

residues = root.find('Residues')
for residue in residues.findall('Residue'):
    for atom in residue.findall('Atom'):
        atom.set('type', replacements[atom.get('type')])

# Process <HarmonicBondForce>

force = root.find('HarmonicBondForce')
for bond in force.findall('Bond'):
    types = [bond.get('type1'), bond.get('type2')]
    if any(t in glycam_types for t in types):
        bond.set('type1', replacements[types[0]])
        bond.set('type2', replacements[types[1]])
    else:
        force.remove(bond)

# Process <HarmonicAngleForce>

force = root.find('HarmonicAngleForce')
for angle in force.findall('Angle'):
    types = [angle.get('type1'), angle.get('type2'), angle.get('type3')]
    if any(t in glycam_types for t in types):
        angle.set('type1', replacements[types[0]])
        angle.set('type2', replacements[types[1]])
        angle.set('type3', replacements[types[2]])
    else:
        force.remove(angle)

# Process <PeriodicTorsionForce>

force = root.find('PeriodicTorsionForce')
for tag in ['Proper', 'Improper']:
    for torsion in force.findall(tag):
        types = [torsion.get('type1'), torsion.get('type2'), torsion.get('type3'), torsion.get('type4')]
        if any(t in glycam_types for t in types):
            torsion.set('type1', replacements[types[0]])
            torsion.set('type2', replacements[types[1]])
            torsion.set('type3', replacements[types[2]])
            torsion.set('type4', replacements[types[3]])
        else:
            force.remove(torsion)

# Process <NonbondedForce>

force = root.find('NonbondedForce')
for atom in force.findall('Atom'):
    type = atom.get('type')
    if type in glycam_types:
        atom.set('type', replacements[type])
    else:
        force.remove(atom)

# Update the unscaled types in the script

script = root.find('Script')
text = script.text
types = ', '.join('"%s"' % replacements[s] for s in sorted(glycam_types))
types = '\n    '.join(textwrap.wrap(types))
pattern = re.compile('unscaled_types = \[(.*\n)*?.*?\]')
script.text = pattern.sub('unscaled_types = ['+types+']', text)

tree.write('modified.xml')

@zhang-ivy
Copy link
Collaborator Author

zhang-ivy commented Jun 24, 2021

Thanks @peastman ! Excited to try that out.
As discussed in our meeting on Monday, my task was to adapt the tleap.in file used to convert the GLYCAM_06j-1.prep to a GLYCAM_06j-1.lib to contain only one enantiomer of each residue. I've done that, but when I run python convert_amber.py -i glycam/glycan.yaml --verbose, I get an error.

Here is what I did:

  1. Wrote a python script that creates a tleap.in file that takes in GLYCAM_06j-1.prep and generates a .lib file. Grabs all residue names from GLYCAM_06j-1.prep and keeps only one enantiomer for each residue. Output: make_glycam_lib_one_enantiomer.in
    Script:
# Read lines from prep file
with open("leaprc_GLYCAM_06j-1_2014-03-14/GLYCAM_06j-1.prep", "r") as f:
    lines_in = f.readlines()

# Grab residues, keeping only one enantiomer of each
residues = []
for line in lines_in:
    residue = line[:3]
    if "INT 0" in line and residue.upper() not in residues: # Keep only one enantiomer
        residues.append(residue)

# Make list of lines to write to the tleap.in file
lines = ["loadamberprep GLYCAM_06j-1.prep\n"]
for i in range(200, len(residues), 200):
    lines.append("saveOff {" + " ".join(residues[i-200:i]) + "} GLYCAM_06j-1.lib\n")
lines.append("quit\n")

# Write lines
with open("make_glycam_lib_one_enantiomer.in", "w") as f:
    f.writelines(lines)
  1. Converted the glycam .prep file into a .lib file (using tleap -s -f make_glycam_lib_one_enantiomer.in > make_glycam_lib_one_enantiomer.out). Output: GLYCAM_06j-1.lib
  2. Created a modified glycam leaprc file (that replaces loading the .prep with loading the .lib): leaprc.GLYCAM_06j-1_trunc
  3. Updated glycan.yaml to load the modified leaprc.
  4. From amber/, ran: python convert_amber.py -i glycam/glycan.yaml --verbose

Got error:

Converting ['./glycam/leaprc.GLYCAM_06j-1_trunc'] to ffxml...
Preprocessing the leaprc for GLYCAM_06j-1_trunc...
Converting to ffxml ffxml/GLYCAM_06j-1_trunc.xml...
leaprc addAtomTypes {
        { "C"   "C" "sp2" }
        { "Cg"  "C" "sp3" }
        { "Cy"  "C" "sp3" }
        { "Ck"  "C" "sp2" }
        { "CT"  "C" "sp3" }
        { "Cj"  "C" "sp2" }
        { "Cp"  "C" "sp3" }
        { "H"   "H" "sp3" }
        { "H1"  "H" "sp3" }
        { "H2"  "H" "sp3" }
        { "Ha"  "H" "sp3" }
        { "Hp"  "H" "sp3" }
        { "Hc"  "H" "sp3" }
        { "Ho"  "H" "sp3" }
        { "HW"  "H" "sp3" }
	{ "Ng"  "N" "sp2" }
        { "NT"  "N" "sp3" }
        { "N3"  "N" "sp3" }
        { "Oh"  "O" "sp3" }
        { "Os"  "O" "sp3" }
        { "O"   "O" "sp2" }
        { "O2"  "O" "sp2" }
        { "OW"  "O" "sp3" }
        { "Oy"  "O" "sp3" }
        { "S"   "S" "sp3" }
        { "Sm"  "S" "sp3" }
        { "P"   "P" "sp3" }
}
glycam_06 = loadamberparams GLYCAM_06j.dat
loadOff glycam/GLYCAM_06j-1.lib
loadOff GLYCAM_amino_06j_12SB.lib
loadOff GLYCAM_aminoct_06j_12SB.lib
loadOff GLYCAM_aminont_06j_12SB.lib
addPdbResMap {
        { 0 "OLS" "NOLS" } { 1 "OLS" "COLS" }
        { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }
        { 0 "OLP" "NOLP" } { 1 "OLP" "COLP" }
        { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }
        { 0 "NLN" "NNLN" } { 1 "NLN" "CNLN" }
}
HOH = TP3  WAT = TP3
ionsff = loadamberparams frcmod.ionsjc_tip3p
/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py:361: ParameterWarning: Some residue templates are using unavailable AtomTypes
  warnings.warn('Some residue templates are using unavailable AtomTypes',
Traceback (most recent call last):
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 1420, in <module>
    main()
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 113, in main
    convert_yaml(args.input, ffxml_dir='ffxml/')
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 434, in convert_yaml
    ffxml_name = convert_leaprc(files, ffxml_dir=ffxml_dir, ignore=ignore,
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 222, in convert_leaprc
    params.write(ffxml_name, provenance=provenance, write_unused=write_unused, improper_dihedrals_ordering='amber')
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py", line 377, in write
    self._write_omm_residues(root, skip_residues, skip_duplicates,
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py", line 689, in _write_omm_residues
    external_bonds = self._get_atoms_with_external_bonds(name)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py", line 645, in _get_atoms_with_external_bonds
    if d[element_name] != bonds:
KeyError: 'S'

Is this potentially related to the atom prefix issue that Peter pointed out?
This is because the hacky code I wrote to perceive the number of external bonds does not cover S. Fixing this now

@zhang-ivy
Copy link
Collaborator Author

zhang-ivy commented Jun 24, 2021

After fixing the external bonds error above, I realized that the approach we outlined in our meeting (keep only one enantiomer for each glycan residue) will not work.

When I run python convert_amber.py -i glycam/glycan.yaml --verbose after fixing the above error, I am not able to parametrize my test system (glycosylated RBD) in tleap because the glycan residues needed (NLN UYB 4YB VMB 2MA 0YB 0fA 0LB) are not all present in the GLYCAM_06j-1.lib file.

However, if I do not filter out any residues when converting the .prep to a .lib (except for CA2, which must be filtered out because it has atom type C0 which is not in ff14SB.xml, GLYCAM06j-1.xml, proteiin.ff14SB.xml, or tip3p_standard.xml), I have no issues parametrizing my system in tleap, but get the following errror when parametrizing in OpenMM:

  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 637, in assert_energies2
    system_omm = ff.createSystem(parm_amber.topology)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py", line 1177, in createSystem
    templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py", line 1359, in _matchAllResiduesToTemplates
    [template, matches] = self._getResidueTemplateMatches(res, data.bondedToAtom, ignoreExternalBonds=ignoreExternalBonds, ignoreExtraParticles=ignoreExtraParticles)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py", line 977, in _getResidueTemplateMatches
    raise Exception('Multiple non-identical matching templates found for residue %d (%s): %s.' % (res.index+1, res.name, ', '.join(match[0].name for match in allMatches)))
Exception: Multiple non-identical matching templates found for residue 197 (UYB): UVA, UVB, UWA, UWB, UYA, UYB.

It seems that the UYB template actually matches many other residues that do not just differ by upper/lower case as I had initially thought, so my code for keeping only one enantiomer above is not going to work.

For now, I'll proceed with testing Peter's fix on the ffxml generated for the small subset of glycan residues ((NLN UYB 4YB VMB 2MA 0YB 0fA 0LB) in my RBD system. But in order to make a ffxml including all glycan residues, we'll need to think of a more clever approach for how to circumvent the template matching issue.

@peastman and @jchodera : Let me know if you have any thoughts on how to resolve this. If not, maybe we can discuss with the amber dev folks.

@zhang-ivy
Copy link
Collaborator Author

zhang-ivy commented Jun 24, 2021

I've just modified the truncated ffxml (with only a small subset of glycan residues) using Peter's code and tested the ffxml on a test system (glycosylated RBD):

from simtk.openmm import app

# Load the ffxml
ffxml = [ffxml/protein.ff14SB.xml', "ffxml/GLYCAM_06j-1_trunc_modified.xml"]
if isinstance(ffxml, str):
    ff = app.ForceField(ffxml)
else:
    ff = app.ForceField(*ffxml)

# Load the test pdb
pdb = app.PDBFile("../test.pdb")

# Create system
system_omm = ff.createSystem(pdb.topology)

The last line yields this error:

----------
ValueErrorTraceback (most recent call last)
<ipython-input-11-f9fd22c0d06a> in <module>
----> 1 system_omm = ff.createSystem(pdb.topology)

~/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py in createSystem(self, topology, nonbondedMethod, nonbondedCutoff, constraints, rigidWater, removeCMMotion, hydrogenMass, residueTemplates, ignoreExternalBonds, switchDistance, flexibleConstraints, drudeMass, **args)
   1326 
   1327         for force in self._forces:
-> 1328             force.createForce(sys, data, nonbondedMethod, nonbondedCutoff, args)
   1329         if removeCMMotion:
   1330             sys.addForce(mm.CMMotionRemover())

~/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py in createForce(self, sys, data, nonbondedMethod, nonbondedCutoff, args)
   2433         force = mm.NonbondedForce()
   2434         for atom in data.atoms:
-> 2435             values = self.params.getAtomParameters(atom, data)
   2436             force.addParticle(values[0], values[1], values[2])
   2437         force.setNonbondedMethod(methodMap[nonbondedMethod])

~/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py in getAtomParameters(self, atom, data)
    926                 return result
    927             else:
--> 928                 raise ValueError('%s: No parameters defined for atom type %s' % (self.forceName, t))
    929 
    930         def getExtraParameters(self, atom, data):

ValueError: NonbondedForce: No parameters defined for atom type glycam-Cg

Here are the files necessary to reproduce the error (test.pdb and GLYCAM_06j-1_trunc_modified.xml) :
debug_peters_fix.zip

@jchodera
Copy link
Member

When I run python convert_amber.py -i glycam/glycan.yaml --verbose after fixing the above error, I am not able to parametrize my test system (glycosylated RBD) in tleap because the glycan residues needed (NLN UYB 4YB VMB 2MA 0YB 0fA 0LB) are not all present in the GLYCAM_06j-1.lib file.

It's important to remember that tleap matches by residue name, but OpenMM matches by template. It sounds like we'll need to generate a complete translation of the .prep file to .lib to feed into tleap and a pruned version of the .lib file to feed into ParmEd to generate the OpenMM ffxml. (Alternatively, we could modify ForceField to permit multiple matches, or add a mode to ParmEd to prune duplicate templates.)

@jchodera
Copy link
Member

except for CA2, which must be filtered out because it has atom type C0 which is not in ff14SB.xml, GLYCAM06j-1.xml, proteiin.ff14SB.xml, or tip3p_standard.xml

Weird! This sounds like another bug. Let's make sure to add this and the other potential bug in the .prep file to the list of issues to raise with the glycam folks.

@jchodera
Copy link
Member

It seems that the UYB template actually matches many other residues that do not just differ by upper/lower case as I had initially thought, so my code for keeping only one enantiomer above is not going to work.

OK, this suggests we will really need to add a mode to ParmEd to filter out identical residue templates on export to ffxml, or perhaps add this to @peastman's post-processing script as a short-term workaround.

@zhang-ivy
Copy link
Collaborator Author

zhang-ivy commented Jun 24, 2021

Weird! This sounds like another bug.

Actually, it seems like C0 is present as a protein atom type in peter's script. Just not sure why CA2 is present in GLYCAM_06j-1 and not in ff14SB, as CA2 seems to correspond to a Ca2+ ion, meaning it should have no parameters that are specific to the glycam forcefield. Will make a note to bring this up in the chat with the amber folks.

and the other potential bug in the .prep file to the list of issues to raise with the glycam folks.

Which other potential bug are you referring to here?

@zhang-ivy
Copy link
Collaborator Author

zhang-ivy commented Jun 24, 2021

OK, this suggests we will really need to add a mode to ParmEd to filter out identical residue templates on export to ffxml, or perhaps add this to @peastman's post-processing script as a short-term workaround.

Yes, I'm just not sure how we want to do this.
Here is the glycan naming system: http://glycam.org/docs/forcefield/glycam-naming-2/
Not sure I fully understand it, but it seems that there is a set of core monosaccharides (Table 1), where lowercase indicates L- and uppercase indicates D-. The third letter indicates the anomeric configuration (α or β).
But it seems like the naming system also allows encoding of ring form (pyranosyl or furanosyl) and occupied linkage positions (2-, 2,3-, 2,4,6-, etc.) -- not sure how these are encoded. For these two properties, given the same monosaccharide, I'm pretty sure we want there to be different parameters for different ring forms and linkage positions? In which case we would need ParmEd's residue template matcher to be able to distinguish between residues with these diff properties so that it can properly match the right residue template to a given residue?

@peastman
Copy link
Member

Just not sure why CA2 is present in GLYCAM_06j-1 and not in ff14SB, as CA2 seems to correspond to a Ca2+ ion

Ions get defined along with water, since their parameters can be different for different water models. Add it to solvent_types in my script. I only included the ions that were present in your converted XML file, but it sounds like the full force field includes more of them.

@jchodera
Copy link
Member

jchodera commented Jul 5, 2021

Woohoo!

@peastman
Copy link
Member

peastman commented Jul 5, 2021

Sounds good!

@zhang-ivy
Copy link
Collaborator Author

zhang-ivy commented Aug 9, 2021

I've finished cleaning up this PR!

I've updated the first comment in this PR with a summary of the conversion pipeline and changes required by other repos. Note that I'm still waiting on ParmEd/ParmEd#1184 to be merged.

I've also spent the afternoon modifying the validate_glyco_protein test in convert_amber.py to use the glycoprotein files that John suggested here. In order to get the tests working, I modified the files John initially sent and included them in this PR
(in amber/files/glycam/Glycoprotein_shortened*). Here are the changes I made:

  • The .parm7 and .rst7 files were generated by a .leapin file that loads oldff/leaprc.ff10 and oldff/leaprc.GLYCAM_06j_10. In order to generate .parm7 and .rst7 files using updated forcefields that will be used in practice, I updated the .leapin file to load leaprc.ff14SB and leaprc.GLYCAM_06j-1 instead.
  • The .parm7 and .rst7 files include 7 molecules, but I removed the first 2 molecules (by editing the .leapin file) because they involve zwitterionic molecules, which I wasn't sure how to parametrize in OpenMM).

The test now outputs:

Validating the conversion...
Glycosylated protein energy validation for ffxml/GLYCAM_06j-1.xml
HarmonicBondForce 166.05257416728875 166.0525741672887
HarmonicAngleForce 443.64924500308035 443.6478171109116
PeriodicTorsionForce 548.348291428228 548.3482921829175
NonbondedForce 169459044.11620137 169459044.18473148
CMMotionRemover 0.0 0.0

I should note that I've set the energy tolerance to 1e-1 so that it passes.

@jchodera @peastman : Could you please review and merge when you get a chance?

@peastman
Copy link
Member

Excellent, congratulations on getting this done!

@zhang-ivy
Copy link
Collaborator Author

@peastman Thanks so much for your help!!

@jchodera
Copy link
Member

@peastman : We should cut a new release with the glycam functionality once you've had a chance to review and merge this.

@jchodera jchodera added this to the 0.10.0 milestone Aug 28, 2021
@peastman
Copy link
Member

It looks good to me. If all the tests pass, it should be good to merge. (I notice that CI is currently failing due to lack of an OpenEye license key, but that's not related to this.)

@jchodera
Copy link
Member

(I notice that CI is currently failing due to lack of an OpenEye license key, but that's not related to this.)

Will fix!

@jchodera
Copy link
Member

jchodera commented Sep 1, 2021

@peastman : I've updated the OpenEye toolkit license and have triggered all the tests to run again.

@zhang-ivy
Copy link
Collaborator Author

@jchodera : It looks like there's still a problem with the open eye license

@jchodera
Copy link
Member

jchodera commented Sep 1, 2021

Investigating.

@jchodera
Copy link
Member

jchodera commented Sep 1, 2021

The issue here is that secrets are not disclosed on PRs opened from forks.

I'll merge this and we can debug from here.

@jchodera jchodera merged commit 65fb59e into openmm:master Sep 1, 2021
@jchodera
Copy link
Member

jchodera commented Sep 1, 2021

@zhang-ivy I've added you to this repo so you can use branches instead of pushing from forks. That should get around this issue next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants