Verify that altloc labels (in PDB) are supported across XCA, loader, & F/E #1512
Comments
@mwinokan do you have the tarball?
@ConorFWild says that the alt conf is currently not identified as a separate ligand instance. The bottom line is that ligand models/observations are missing from the f/e; the question is whether XCA should split the two models or whether we suggest a different SoP for modelling. Conor is anxious that this would require every ligand instance to consider alternative sites/conformations. Conor says it is trivial to split the ligand models, but difficult to account for this across the whole pipeline. Currently, Jasmin is concerned that 2343 for CHIKV_Mac shows the less desirable ligand model. Conor says that the first conformation is used and the altloc letter is ignored, and that only the residue numbers (not the name/altconf) are used to split the ligands. One workaround is to model it as a separate partial-occupancy residue; a script could be made for this that runs before XCA, or this could be a functionality of the collator. @tdudgeon says that if the ligands have different names it will require #1476, but we can assume here that all ligands are named
@tdudgeon has implemented the validation in the collator that raises an error if there are multiple models for the same ligand (same residue number).
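For illustration only, a check of the kind described above could look roughly like the sketch below. This is not the collator's actual code: the function names, the ligand residue name `LIG`, and the use of plain fixed-width PDB parsing are all assumptions made for the example.

```python
from collections import defaultdict


def find_multi_model_ligands(pdb_lines, ligand_names):
    """Map (chain, residue number) -> sorted altloc letters, for ligands with >1 altloc.

    Hypothetical helper: parses fixed-width HETATM records directly.
    """
    altlocs = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()  # PDB columns 18-20
        altloc = line[16].strip()      # PDB column 17, blank when there is no altloc
        if resname in ligand_names and altloc:
            altlocs[(line[21], int(line[22:26]))].add(altloc)
    return {key: sorted(v) for key, v in altlocs.items() if len(v) > 1}


def validate_single_model(pdb_lines, ligand_names):
    """Raise ValueError when any ligand residue number carries several models."""
    clashes = find_multi_model_ligands(pdb_lines, ligand_names)
    if clashes:
        raise ValueError(f"multiple ligand models at the same residue number: {clashes}")
```

Atoms without an altloc letter are ignored by the check, so a ligand with a single, unlabelled model passes.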
@tdudgeon this Python snippet should be all you need for the collator:
@tdudgeon has previously said that the combi-soak data was a prerequisite, as the ligand names would otherwise be unknown (in the metadata). @tdudgeon now thinks that there is no way for alternative conformations to be valid, as there should never be two ligand models within the same residue number. @phraenquex says this may be necessary to correctly tell crystallographic software that there can be multiple residue models in the same place.
@phraenquex says that the altlocs and residue numbers need to be carried through to all the aligned files as they are in the crystallographic files, but observations must be split by the collator. @phraenquex says to aggregate all residues with no alt site together with those labelled "A" as one PDB, and the same for "B", etc., if present in the crystallographic file. This may need some brainstorming between @tdudgeon, @ConorFWild, and @mwinokan to find the most elegant way to do this.
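A minimal sketch of the aggregation described above, assuming plain fixed-width PDB records and a hypothetical function name `split_by_altloc`. Each output model holds the atoms with no altloc plus the atoms carrying one letter, and the altloc column and residue numbers are left untouched.

```python
def split_by_altloc(pdb_lines):
    """Return {altloc letter: [atom lines]} with blank-altloc atoms shared by every model.

    If the structure has no altlocs at all, a single model keyed by "" is returned.
    """
    atoms = [line for line in pdb_lines if line.startswith(("ATOM", "HETATM"))]
    # PDB column 17 (index 16) holds the altloc letter, or a space when absent.
    letters = sorted({line[16] for line in atoms if line[16] != " "})
    if not letters:
        return {"": atoms}
    return {
        letter: [line for line in atoms if line[16] in (" ", letter)]
        for letter in letters
    }
```

Treating the whole structure at once, rather than only a radius around the ligand, is a simplification chosen for the sketch. How to handle altloc letters present on the protein but not on the ligand is not decided here; this sketch simply emits one model per letter found.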
Thanks for your investigation efforts @tdudgeon. At the end of the day we want each conformation of the ligand to be a separate site-observation in the database, and a separate observation in the f/e. Hence it should be split by the collator and aligned separately, right?
If that is indeed the case then there is a lot of work that's needed in XCA to handle this. 2 molecules, same site => 2 observations
@phraenquex confirms that each alt conf of a ligand should be its own observation. @tdudgeon says this will require significant changes to the collator but not the aligner. It is still unclear what changes will be needed for the loader, but more metadata will likely need to be carried through to trace the root ligand in the crystallographic data. Additionally, Daren pointed out that altloc letters might not be categorically correlated across the structure, but since we are working locally @phraenquex says that we should assume that, for example, all the protein A locs go with the ligand A. @phraenquex says to ideally aggregate all protein alt locs near the ligand with the same letter for alignment, but practically speaking it may not be necessary as LNA only looks at the nearby
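To make "each alt conf is its own observation" concrete, here is a rough sketch that enumerates observations keyed by chain, residue number and altloc, so the root ligand in the crystallographic file stays traceable. The `LigandObservation` class, the function name and the ligand name `LIG` are hypothetical, and this is not how XCA or the loader currently represent observations.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LigandObservation:
    chain: str
    resseq: int
    altloc: str  # "" when the ligand has no alternative location


def ligand_observations(pdb_lines, ligand_names):
    """Return one observation per (chain, residue number, altloc) of each ligand."""
    seen = set()
    for line in pdb_lines:
        if line.startswith("HETATM") and line[17:20].strip() in ligand_names:
            seen.add(LigandObservation(line[21], int(line[22:26]), line[16].strip()))
    # Atoms without an altloc in a residue that also has lettered atoms are
    # shared by every conformation, so they do not form an observation of their own.
    lettered = {(o.chain, o.resseq) for o in seen if o.altloc}
    return sorted(o for o in seen if o.altloc or (o.chain, o.resseq) not in lettered)
```

With a ligand at residue 147 (no altloc) and another at residue 201 with altlocs A and B, this yields three observations.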
@tdudgeon has progressed this and says that there may be no changes needed to the collator/aligner.
But @phraenquex says that this may affect the alignment, as in extreme cases the protein may have shifted dramatically, and it is indeed the collator that will need to split the PDBs into separate files to be aligned by the aligner. There may be a need to review with @ConorFWild whether the aligner can accept these split files. @tdudgeon asks if we should correlate all the altlocs across the whole structure together, or if the splitting should occur within some radius of the ligand. @phraenquex says that it is easier to treat them all together, and that it should be implemented this way for now. Altloc codes that are not present for the ligand, e.g. if the ligand only has A,B but the protein has A,B,C,D:
I have initial code that is able to split out the altlocs into separate PDB files, but in doing so I hit a conceptual problem.
Here residues 147 and 201 are the same molecule at 2 different locations, one of which (201) has an altloc.
@phraenquex agrees that this is more elegantly handled in LNA. @tdudgeon please see if you can make sense of how this is handled in LNA and see what kind of changes would be necessary, for either yourself or Conor to implement. @ConorFWild says there are multiple places/ways to implement this.
Conor says that dealing with protein altconfs will need much more serious work in both LNA and XCA. Actions: @tdudgeon please prepare a minimal dataset for Conor and begin modifying
The molecule extraction has been modified to handle altlocs for ligands. The following are now generated for each observation:
As the FE already displays the molfile, it should now display multiple altlocs without the need for any changes. This change is rolled out to the XCA staging environment.
@mwinokan to help with datasets to test this. We also need to confirm whether the generated .sdf file is itself the download source, or whether the downloaded .sdf files are converted from the mol file.
Noticed that centroid residue information is missing from the 2A example. @kaliif says the centroid info is in the API endpoint. @matej-vavrek can you please have a look at this and confirm what is happening with the centroid API call?
I looked at the x0515 data again and it does seem to be displayed correctly in Fragalysis, except that the multiple ligand conformations are not shown, which is probably expected.
The second is probably better as it leaves the possibility of showing/hiding individual conformations in the future. @kaliif may still need to make a small change to the backend to ensure that the SDF in the upload is served up rather than one he generates from the molfile (which no longer needs to be done).
This will need work from @boriskovar-m2ms, as he is familiar with the NGL implementation, after he is done with #1483. @tdudgeon please share in this ticket a screenshot (annotated?) of the NGL UI to help Boris find exactly which settings need changing.
E.g. A71EV2A crystal 0515 has several ligands modelled at residue index 201. In the crystallographic PDB files they have the alternative site codes A and B to differentiate between the two models.