Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.pdb files in model output should have pLDDT scores stored in b_factor column #18

Open
kamurani opened this issue Apr 19, 2024 · 1 comment

Comments

@kamurani
Copy link

It would be convenient if the output structure files from a modelling run had the supplied pLDDT values (in .npy form) contained within the PDB files. This would be consistent with how AlphaFold PDB files are expected (e.g. when downloading from the AF database).

Happy to work on this and submit a PR.

@andreas-wilm
Copy link

Hi @kamuari,

I cobbled together the code below to add plddt to the bfactor column. This works for roughly 2/3 of my PDBs, but for the rest there is a mismatch between the number of residues and the plddt vector length, that I cannot explain. I've seen up to 10 residues difference and in all cases it's fewer residues in the pdb than the npy vector.

@phbradley would you have a hint as to how we can fix this? Thanks!

import os

from Bio.PDB import PDBParser, PDBIO
import numpy as np

def get_num_residues(structure, model_id=0):
    num_res = 0
    for model in structure:
        if model.id == model_id:
            for chain in model:
                for residue in chain:
                    num_res += 1
    return num_res


def plddt_to_pdb(pdb_in, plddt_npy, pdb_out):
    assert not os.path.exists(pdb_out), pdb_out

    parser = PDBParser()
    structure = parser.get_structure("this_id", pdb_in)
    num_residues = get_num_residues(structure)

    plddt = np.load(plddt_npy)
    assert num_residues == len(plddt), (
            f"PDB has {num_residues} residues, but plddt has {len(plddt)}")

    for model in structure:
        res_idx = 0
        for chain in model:
            for residue in chain:
                for atom in residue:
                    atom.set_bfactor(plddt[res_idx])
                res_idx += 1
        assert model.id == 0

    io = PDBIO()
    io.set_structure(structure)
    io.save(pdb_out)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants