config.json file #95

Open
chenwei-zhang opened this issue Jan 24, 2024 · 3 comments

@chenwei-zhang

Hi, in the config.json file, I don't quite understand the meaning of "crop": 6 in "ca_infer_args", or of "crop_length": 200 and "aggressive_pruning": false in "gnn_infer_args". Could you please give me some hints?
Also, I would like to prune some short chains. How can I set the threshold?

Thank you!

@jamaliki jamaliki self-assigned this Jan 24, 2024
@jamaliki
Collaborator

Hi!

  • The meaning of crop can be found in model_angelo/c_alpha/inference.py. Since we run inference on boxes 64 voxels across, the voxels near the edge of each box lack information about the biological context around them, so you would expect the segmentation results for those voxels to be worse. The crop argument controls how many voxels to ignore on each side of each box during inference.
  • crop_length in the GNN is different. Since most of our users have GPU memory constraints, ModelAngelo only looks at groups of crop_length residues that are close together.
  • aggressive_pruning is an algorithm that prunes residues that are not found in the sequence file.
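
To make the two "crop" ideas above concrete, here is a minimal sketch in plain Python. It is an illustration only, not ModelAngelo's actual code: the function names `kept_ranges` and `nearest_group`, the assumption that the map is padded by `crop` voxels, and the nearest-neighbour selection are all hypothetical choices made for this example.

```python
import math

def kept_ranges(size, box=64, crop=6):
    """Voxel ranges kept along one axis when each box drops `crop` voxels per side.

    Hypothetical tiling: boxes are placed so the kept centres sit back to back,
    assuming the map is padded by `crop` voxels at the start of the axis.
    """
    stride = box - 2 * crop      # voxels actually kept per box on this axis
    ranges = []
    start = -crop                # first box starts inside the assumed padding
    while start + crop < size:
        lo = start + crop                    # first kept voxel of this box
        hi = min(start + box - crop, size)   # one past the last kept voxel
        ranges.append((lo, hi))
        start += stride
    return ranges

def nearest_group(coords, centre_index, crop_length=200):
    """Indices of the `crop_length` residues closest to one chosen residue.

    Hypothetical selection rule: plain Euclidean distance between coordinates.
    """
    centre = coords[centre_index]
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[i], centre))
    return order[:crop_length]

if __name__ == "__main__":
    print(kept_ranges(120))
    residues = [(0, 0, 0), (10, 0, 0), (1, 0, 0), (5, 0, 0)]
    print(nearest_group(residues, 0, crop_length=2))
```

With the defaults from config.json (a 64-voxel box and "crop": 6), each box in this sketch contributes 52 voxels per axis, so a larger crop trades more boxes for fewer edge voxels.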

Could you clarify what you mean about pruning the short chains? Would you like to prune all chains shorter than N residues from the output CIF file? I don't believe there is such an option, but it would be simple for me to write a quick Python script for that if you like.
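
A script of that kind could look roughly like the sketch below. It is not part of ModelAngelo; the function name `prune_short_chains` is hypothetical, and it assumes a single `_atom_site` loop with one atom per line, whitespace-separated fields, no quoted values containing spaces, and `label_asym_id` / `label_seq_id` columns identifying chain and residue. For real files a proper mmCIF parser would be more robust.

```python
from collections import defaultdict

def prune_short_chains(cif_text, min_residues):
    """Drop _atom_site rows of chains with fewer than `min_residues` residues."""
    lines = cif_text.splitlines()
    # Locate the _atom_site column headers (assumed contiguous).
    header_idx = [i for i, line in enumerate(lines)
                  if line.strip().startswith("_atom_site.")]
    if not header_idx:
        return cif_text
    columns = [lines[i].strip().split(".", 1)[1] for i in header_idx]
    chain_col = columns.index("label_asym_id")   # assumed chain column
    resid_col = columns.index("label_seq_id")    # assumed residue column

    # Atom rows run from the last header until the loop ends.
    first_row = header_idx[-1] + 1
    last_row = first_row
    while last_row < len(lines):
        stripped = lines[last_row].strip()
        if not stripped or stripped.startswith(("#", "loop_", "_", "data_")):
            break
        last_row += 1
    rows = lines[first_row:last_row]

    # Count distinct residue numbers per chain.
    residues = defaultdict(set)
    for row in rows:
        fields = row.split()
        residues[fields[chain_col]].add(fields[resid_col])
    keep = {chain for chain, ids in residues.items() if len(ids) >= min_residues}

    kept_rows = [row for row in rows if row.split()[chain_col] in keep]
    result = "\n".join(lines[:first_row] + kept_rows + lines[last_row:])
    return result + ("\n" if cif_text.endswith("\n") else "")
```

Usage would be along the lines of reading output.cif into a string, calling `prune_short_chains(text, N)`, and writing the result to a new file.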

Best,
Kiarash

@chenwei-zhang
Author

chenwei-zhang commented Jan 24, 2024

Hi Kiarash,
Thanks for your rapid reply.

  1. I would like to generate the structures from the ModelAngelo ICLR paper. In the paper you compared pruned and unpruned predictions, and I am wondering how you did this pruning. I found a sentence in the paper saying "chains shorter than 4 residues are pruned and the resulting coordinates are used as the input". Does this correspond to the pruned prediction? Is there any way I could customize the 4-residue cutoff? And if I use the latest version of ModelAngelo without changing any configuration, will it generate the pruned or the unpruned structures?
  2. For the results shown in the paper, did you use the original map (e.g. emd_26126.map) directly as the input for inference, or the post-processed map? If the latter, could you give some details on how you post-processed the map?

Sorry for so many questions, but ModelAngelo is awesome work and I really appreciate it. Thank you in advance for your answers.

Best,
Chenwei

@jamaliki
Collaborator

jamaliki commented Feb 8, 2024

Hi @chenwei-zhang,

  1. Pruned and unpruned refer to the output files: the pruned file is output.cif and the unpruned file is output_raw.cif. There is no clean way to change the cutting threshold, although I can point you to where in the code you would have to make the change if you like. Specifically, the bulk of the pruning code is in the class MatchToSequence. I'm sorry it is not very clean 😞
  2. We always used post-processed files as deposited in the EMDB. For example, for EMD-26126, it would be this file: emdb_26126.map. The only post-processing ModelAngelo will really do is to change the pixel size.
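
To check what pixel size a deposited map starts from, the voxel size can be read from the map header. The sketch below is an illustration and not ModelAngelo code: the function name `mrc_voxel_size` is hypothetical, and it assumes an uncompressed, little-endian MRC/CCP4 file, where the grid sampling (mx, my, mz) sits at byte offset 28 and the cell lengths in Å at byte offset 40.

```python
import struct

def mrc_voxel_size(header):
    """Return the (x, y, z) voxel size in Å from the first 52 header bytes."""
    # Grid sampling along each axis: three 32-bit ints at byte offset 28.
    mx, my, mz = struct.unpack_from("<3i", header, 28)
    # Cell dimensions in Å: three 32-bit floats at byte offset 40.
    xlen, ylen, zlen = struct.unpack_from("<3f", header, 40)
    return (xlen / mx, ylen / my, zlen / mz)

# Example usage (the file path is a placeholder for a decompressed map):
# with open("emd_26126.map", "rb") as handle:
#     print(mrc_voxel_size(handle.read(52)))
```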
