-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error fix: Specify "standard_names=False" when loading pdb into mdtraj #552
base: main
Are you sure you want to change the base?
Conversation
Could you elaborate on this a little on what's going on here (what triggers it, how often it happens, etc.)? I'm a little apprehensive about changing the default behavior of one of the main PDB loading pathways |
@mattwthompson I can appreciate your apprehension. I haven't encountered any unintended consequences of making the change this PR suggests, but that doesn't mean there are provably none. This issue has only started to occur for me with PDBs describing host-guest systems unique to the "steroids" branch of the TAPROOM dataset (meaning not found in the main branch). @jeff231li mentioned the error given seemed familiar, so he may have encountered it before, but it seems to be an overall rare issue. Here is a little script showing two example PDBs, one that does not have this issue and one that does. The PDB that does have this issue, "hse_x02", can be corrected when import mdtraj
import requests
urls = [
"https://raw.githubusercontent.com/slochower/host-guest-benchmarks/steroids/taproom/systems/bcd/hex/b-hex-p.pdb",
"https://raw.githubusercontent.com/slochower/host-guest-benchmarks/steroids/taproom/systems/hse/x02/b-x02-p.pdb",
]
system_names = ["bcd_hex", "hse_x02"]
def has_CONECT_issue(url, system_name):
raw_pdb = requests.get(url)
with open(f"{system_name}.pdb", "wb") as f:
f.write(raw_pdb.content)
mdtraj_system = mdtraj.load_pdb(f"{system_name}.pdb") # Problem line
# Here the mdtraj trajectories are sliced to only contain the atoms of the host molecule, identified by a list of indices.
# The list of indices determined by pAPRika is 0..146 for bcd and 0..209 for hse
if system_name == "bcd_hex":
mdtraj_system_sliced = mdtraj_system.atom_slice(range(0, 147))
elif system_name == "hse_x02":
mdtraj_system_sliced = mdtraj_system.atom_slice(range(0, 210))
mdtraj_system_sliced.save(f"{system_name}-mdtraj.pdb")
with open(f"{system_name}-mdtraj.pdb", "r") as f:
for line in f.readlines():
if "CONECT" in line:
return False
return True
for i, url in enumerate(urls):
print(
f"System {system_names[i]} has the CONECT issue: {has_CONECT_issue(url, system_names[i])}\n"
) I ran this script with mdtraj 1.9.9 (py311h90fe790_1). Looking at the PDB files produced by running the script will allow you to directly see the difference between raw and mdtraj-processed CONECT records. I think the script highlights the issue sufficiently, but if it's unclear please let me know. |
Here is a list of the major differences between the
My guess to why the default mdtraj loading works for |
Sorry, I'm still a little confused - MDTraj can load those files in just fine, but gets confused when writing the CONECT records of the HSE host? Any of the issues you see in the HSE file seem like data problems to me - why, for example, should one have a TER separating the host and guest and the others not? Does pAPRika do alchemical transformations that muck with the bond graph? I'm iffy on this as-is since these seem like input data issues, but if there was a try-except (try loading it as-is, and if that fails fall back to using this flag) I'd be more amenable to it. |
Description
This change fixes an error where, for certain host PDBs, pAPRika can lose CONECT bond information during the
release_apply_parameters
windows of APR calculations. The error is given here (some line numbers will differ):Todos
protocols/paprika/coordinates.py
, to prevent an error.Note
The file
protocols/paprika/coordinates.py
contains one other instance where an MDTraj trajectory is created usingmdtraj.load_pdb()
, located inside_atom_indices_by_role()
, but I haven't encountered any errors related to that instance.