KeyError: 'XcXgXyXtXyXgXd' #59

Open
1412140736 opened this issue Sep 7, 2024 · 1 comment
1412140736 commented Sep 7, 2024

I encountered an error while using this method: KeyError: 'XcXgXyXtXyXgXd'. To investigate, I checked the result returned by the get_struc_seq method. The seq value, which should store the original amino acid sequence, differs from the sequence read directly from the PDB file: it ends with a run of extra X residues.
This is my code:
```python
from Bio import PDB
from utils.foldseek_util import get_struc_seq


def get_chain_sequence(pdb_file, chain_id):
    # Read the amino acid sequence of one chain directly from the PDB file.
    parser = PDB.PDBParser(QUIET=True)
    structure = parser.get_structure('structure', pdb_file)
    model = structure[0]

    chain = model[chain_id]

    ppb = PDB.PPBuilder()
    peptides = ppb.build_peptides(chain)

    sequence_str = ''.join([str(peptide.get_sequence()) for peptide in peptides])

    return sequence_str


pdb_file = '../example/5jqb.pdb'
chain_id = 'A'
sequence = get_chain_sequence(pdb_file, chain_id)

# Sequences reported by foldseek for the same chain.
parsed_seqs = get_struc_seq("../bin/foldseek", '../example/5jqb.pdb', ["A"], plddt_mask=False)["A"]
seq, foldseek_seq, combined_seq = parsed_seqs
print(f"Chain {chain_id} sequence: {sequence}")
print("seq:", seq)
print("combined_seq:", combined_seq)
```

Here is the output of the code execution.
Command: ../bin/foldseek structureto3didescriptor -v 0 --threads 1 --chain-name-mode 1 ../example/5jqb.pdb get_struc_seq_0_1725676715.414907.tsv
stdout:
Chain A sequence: SIPLGVIHNSALQVSDVDKLVCRDKLSSTNQLRSVGLNLEGNGVATDVPSATKRWGFRSGVPPKVVNYEAGEWAENCYNLEIKKPDGSECLPAAPDGIRGFPRCRYVHKVSGTGPCAGDFAFHKEGAFFLYDRLASTVIYRGTTFAEGVVAFLILPQAKKDFFSGYYSTTIRYQATGFGTNETEYLFEVDNLTYVQLESRFTPQFLLQLNETIYTSGKRSNTTGKLIWKVNPEIDTTEWAFWETLSFTVV
seq: SIPLGVIHNSALQVSDVDKLVCRDKLSSTNQLRSVGLNLEGNGVATDVPSATKRWGFRSGVPPKVVNYEAGEWAENCYNLEIKKPDGSECLPAAPDGIRGFPRCRYVHKVSGTGPCAGDFAFHKEGAFFLYDRLASTVIYRGTTFAEGVVAFLILPQAKKDFFSGYYSTTIRYQATGFGTNETEYLFEVDNLTYVQLESRFTPQFLLQLNETIYTSGKRSNTTGKLIWKVNPEIDTTEWAFWETLSFTVVXXXXXXX
combined_seq: SdIaPwLaGwVeIdHdNpSqAdLiQdVtSdDdVpDvKpLdVdCpRvDdKdLdSpSdTcNvQlLkRfSkVeGkLeNfLvElGvNvGqVqAqTqDfVpPvSrAvTlKlRqWkGaFaRaSaGdVdPdPkKdVkVdNfYgEdAyGyEdWaAeEaNeCkYeNaLeEfIeKdKePpDvGrShEgClLfPaAaAdPdDpGpIfRaGaFdPdRhCyRqYeVyHeKyVeSyGeTyGaPpCnApGhDnFfAmFaHgKnEvGqAwFwFwLdYtDhRrLmAtSmTrVtIdYhRhGgThTiFtAgEtGtVhVmAhFmLyIrLhPdQpAdKdKrDhFdFdSdGdYgYhSyTdTyIwRyYkQyAwTyGhFrGrTdNpEdTiEwYiLwFtEdVlDdNpLqTeYtVeQgLdEdSsRqFaTdPpQvFnLsLvQvLvNsEvTcIcYvTvSvGvKvRgSdNpTdTpGhKhLpIyWeKyVeNdPpEpIdDgThTdEsWdArFpWvEpTdLaSaFwTdVaVpXcXgXyXtXyXgXd
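
To make the difference between the three outputs above easier to see, here is a minimal sketch in plain Python. It assumes, based only on the output shown, that combined_seq interleaves one uppercase amino acid letter with one lowercase structure letter. The helper names `extra_suffix` and `split_combined` are hypothetical and are not part of `utils.foldseek_util`; the strings are shortened toy values, not the real 5jqb sequences.

```python
def extra_suffix(reference, candidate):
    """Return the part of candidate that follows reference,
    or None if candidate does not start with reference."""
    if not candidate.startswith(reference):
        return None
    return candidate[len(reference):]


def split_combined(combined_seq):
    """Split a combined sequence into (amino acid, structure letter) pairs.

    Assumption: the string alternates amino acid letter, structure letter.
    """
    if len(combined_seq) % 2 != 0:
        raise ValueError("combined sequence must have an even length")
    return [(combined_seq[i], combined_seq[i + 1])
            for i in range(0, len(combined_seq), 2)]


# Shortened toy values standing in for the real outputs.
reference = "SIPLG"            # sequence read with Biopython
seq = "SIPLGXX"                # sequence reported via foldseek
combined = "SdIaPwLaGwXcXg"    # combined sequence reported via foldseek

suffix = extra_suffix(reference, seq)
pairs = split_combined(combined)
unknown_pairs = [aa + st for aa, st in pairs if aa == "X"]

print("extra residues in seq:", suffix)
print("combined tokens with an unknown amino acid:", unknown_pairs)
```

Running the same two helpers on the real `sequence`, `seq`, and `combined_seq` values would show whether the extra residues are only appended at the end or also appear elsewhere.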

LTEnjoy (Contributor) commented Sep 7, 2024

Hi,

Both the amino acid sequence and the foldseek sequence are obtained from the foldseek binary. Sometimes foldseek parses extra amino acids from a PDB file. Could you check the PDB file for more details?
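
One possible way to check the PDB file is to list every residue recorded for the chain, including HETATM records, and compare that list with the sequence Biopython returns. The sketch below uses only the standard library and the fixed-column layout of PDB ATOM/HETATM records. The helper names `list_chain_residues` and `make_record` are hypothetical, the records are invented toy data (the residue name LIG is made up), and whether HETATM entries actually explain the extra X residues in 5jqb is an assumption still to be verified.

```python
def make_record(record, serial, atom, resname, chain, resseq, icode=""):
    """Build the first 27 columns of a PDB ATOM/HETATM line (toy data only)."""
    return "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}".format(
        record, serial, atom, "", resname, chain, resseq, icode)


def list_chain_residues(pdb_lines, chain_id):
    """Return unique (record, resname, resseq, icode) tuples for one chain,
    in file order, reading only the first model."""
    seen = set()
    residues = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # stop after the first model
        if record not in ("ATOM", "HETATM"):
            continue
        if len(line) < 27 or line[21] != chain_id:
            continue
        # Columns: resName 18-20, chainID 22, resSeq 23-26, iCode 27.
        key = (record, line[17:20].strip(), int(line[22:26]), line[26].strip())
        if key not in seen:
            seen.add(key)
            residues.append(key)
    return residues


# Invented records: two ATOM residues and one HETATM group on chain A,
# plus one residue on chain B that should be ignored.
toy_lines = [
    make_record("ATOM", 1, " N", "SER", "A", 32),
    make_record("ATOM", 2, " CA", "SER", "A", 32),
    make_record("ATOM", 3, " N", "ILE", "A", 33),
    make_record("HETATM", 4, " C1", "LIG", "A", 601),
    make_record("ATOM", 5, " N", "GLY", "B", 1),
]

residues = list_chain_residues(toy_lines, "A")
hetero = [r for r in residues if r[0] == "HETATM"]

print("residues on chain A:", residues)
print("HETATM groups on chain A:", hetero)
```

For a real file, the open file object can be passed directly, for example `list_chain_residues(open('../example/5jqb.pdb'), 'A')`, since the function only iterates over lines.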
