
Mismatch in Arguments for eps_net and Issues with Test Results Reproduction #4

Closed
BL-Lac149597870 opened this issue Sep 21, 2024 · 11 comments

Comments

@BL-Lac149597870

Hello,

I encountered an issue with the function eps_net during the inference process here (L274-L277). It appears that the number of arguments expected by the forward() method of eps_net does not match the number passed in the code.

Upon inspecting the code, I found that the argument R_t_global was not being passed. I tried to fix this by adding the following line, mimicking the training part of the code:

_, R_t_global = global_frame(X_t, mask_gen_pos)

and then passed the variable R_t_global to the eps_net as:

vp_t, vr_t, vd_t, vc_t = self.eps_net(
                    d_t, s_t, X_t, R_t, R_t_global, res_feat, pair_feat, t_tensor, 
                    mask_gen_d, mask_gen_aa, mask_gen_pos, mask_res
                )   # (N, L, 3), (N, L, 3, 3), (N, L, 3)

As a result, the sample function looks like this:

X_t = manifold_to_euclid(r_t, p_t, d_t, X_ctx, mask_gen_pos)

# WARN: fix missed R_t_global
_, R_t_global = global_frame(X_t, mask_gen_pos)
# ===========================

X_t, R_t = X_t[:, :, BBHeavyAtom.CA], construct_3d_basis(X_t[:, :, BBHeavyAtom.CA],
                                                         X_t[:, :, BBHeavyAtom.C],
                                                         X_t[:, :, BBHeavyAtom.N],)

vp_t, vr_t, vd_t, vc_t = self.eps_net(
        d_t, s_t, X_t, R_t, R_t_global, res_feat, pair_feat, t_tensor, 
        mask_gen_d, mask_gen_aa, mask_gen_pos, mask_res
    )   # (N, L, 3), (N, L, 3, 3), (N, L, 3)         
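For context, construct_3d_basis presumably builds a per-residue rotation matrix from the CA, C, and N coordinates. Below is a minimal, generic Gram-Schmidt sketch of that idea, included only for illustration; it is an assumption about the construction and not necessarily the repository's implementation:

```python
import torch
import torch.nn.functional as F

def frame_from_backbone(ca, c, n):
    """Sketch of a per-residue frame from CA, C, N coordinates of shape (..., 3)."""
    e1 = F.normalize(c - ca, dim=-1)                     # CA -> C axis
    u2 = n - ca
    u2 = u2 - (u2 * e1).sum(dim=-1, keepdim=True) * e1   # remove the e1 component
    e2 = F.normalize(u2, dim=-1)
    e3 = torch.cross(e1, e2, dim=-1)                     # right-handed third axis
    return torch.stack([e1, e2, e3], dim=-1)             # (..., 3, 3) rotation
```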

1. Could you please confirm whether the changes I made to the code are correct?


I then used this fix to generate peptides for the test dataset using the pretrained weights from the ppflow_pretrained.pt checkpoint, and evaluated the generated peptides (bb4.pdb format) with evaluation/eval_struct_seq.py. However, the results I obtained did not match the results presented in your paper.

| Model | ΔG (↓) | IMP%-B (↑) | IMP%-S (↑) | Validity (↑) | Novelty (↑) | Diversity |
| --- | --- | --- | --- | --- | --- | --- |
| PPFLOW-BB | -349.59 | 36.02% | 10.34% | 1.00 | 0.84 | 0.76 |
| Reproduction | - | - | 0.04% | 1.00 | 0.97 | 0.64 |

If it helps, corresponding raw files and evaluation meta files can be downloaded here.

2. Could there be any additional minor errors in the code that are preventing the results from being accurately reproduced?

I really appreciate your help in resolving this issue. Thank you for your continued support and dedication to improving this project!

@EDAPINENUT
Owner

Thank you for raising the issue. We are working on resolving it. Due to a recent code reorganization, the model was retrained, and we have noticed many discrepancies compared to the initial model. Additionally, there have been occurrences of NaN during training. We will conduct a thorough review and verify the samples from the initial dataset.

@EDAPINENUT
Owner

Besides, thanks a lot for your interest in our work. Please feel free to send me an e-mail for a detailed discussion (linhaitao@westlake.edu.cn), or add me directly on WeChat.

@EDAPINENUT
Owner

EDAPINENUT commented Sep 23, 2024

I have uploaded my previously generated peptides and evaluated them again (100 samples per structure).
The evaluated results are

IMP%-S: 12.50%
Validity: 1.00
Novelty: 0.99
Diversity: 0.92

You can download codesign_results.tar.gz from Google Drive.

Besides, we use the _bb3.pdb files for evaluation because, for the comparison with DiffPP, the model only generates 3 backbone atoms. You can filter for file names ending with _bb3.pdb and try the evaluation again, as sketched below.
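For reference, a minimal sketch of filtering the generated files down to the _bb3.pdb variants before evaluation (the results directory below is a placeholder, not a path from the repository):

```python
import glob
import os

result_dir = "./results/codesign"  # placeholder path; point this at the unpacked samples

# Keep only the 3-backbone-atom outputs for the DiffPP-comparable evaluation.
bb3_files = sorted(
    f for f in glob.glob(os.path.join(result_dir, "**", "*.pdb"), recursive=True)
    if f.endswith("_bb3.pdb")
)
print(f"Found {len(bb3_files)} _bb3.pdb files")
```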

We will also check the previous checkpoints and upload them again.

@BL-Lac149597870
Author

Thank you very much for providing the original peptide files! Upon careful review, I've encountered several issues:

  • Validity evaluation code bug: The code used to evaluate validity contains a bug; it should use logical_or instead of logical_and here (see the sketch after this list).

  • Peptide validity drops to zero after the correction: After fixing the validity calculation, it appears that almost all of the provided peptides are invalid because the amino acid atoms are too far apart. This leads to anomalies (an artificially higher IMP%-S) during the FoldX energy calculations, since FoldX seems unable to provide reasonable scores for fragmented peptides. I suspect the peptide coordinates may have been altered, given that peptides designed by your program should not normally exhibit invalid structures.

  • Reevaluation of FoldX energy: I reevaluated the FoldX energy using the suggested bb3.pdb files, which are peptides newly sampled with your trained model and sampling code (ppflow-bb mode). I then recalculated the IMP%-S metric and unfortunately found that it still remains at a low 0.04%.
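To illustrate the logical-operator point above, here is a minimal, hypothetical chain-break check; the CA-CA cutoff and the per-bond criterion are assumptions for illustration, not the repository's actual evaluation code. Combining per-bond "broken" flags with logical_and only marks a peptide invalid when every bond is broken, whereas logical_or (equivalently np.any) flags it as soon as any single bond is broken:

```python
import numpy as np

CA_CA_CUTOFF = 4.0  # hypothetical cutoff (in angstroms) for a broken CA-CA "bond"

def is_chain_broken(ca_coords: np.ndarray) -> bool:
    """ca_coords: (L, 3) array of CA positions for one peptide."""
    dists = np.linalg.norm(ca_coords[1:] - ca_coords[:-1], axis=-1)
    broken_bonds = dists > CA_CA_CUTOFF            # (L-1,) per-bond flags
    # A single broken bond already fragments the peptide, so the flags must be
    # combined with logical OR, not logical AND.
    return bool(np.logical_or.reduce(broken_bonds))
```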

Thank you again for your help with this research! I really appreciate your assistance in resolving this issue.

@EDAPINENUT
Owner

We have identified this strange phenomenon and are committed to resolving it. Since our reconstruction algorithm is based on angles, the bond lengths should theoretically remain mostly fixed. However, there are indeed some issues in the uploaded samples. We will resolve this shortly and provide a response soon.
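For context on why an angle-based reconstruction should keep bond lengths fixed, here is a minimal, generic NeRF-style internal-coordinate placement sketch (an illustration of the general technique, not the repository's actual reconstruction code). Each new atom is placed from the previous three using a fixed bond length, a bond angle, and a torsion, so bond lengths cannot drift unless the reconstruction is bypassed or its inputs are corrupted:

```python
import numpy as np

def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom D from reference atoms A, B, C using internal coordinates
    (bond length |CD|, bond angle B-C-D, dihedral A-B-C-D); angles in radians."""
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    # Displacement of D expressed in the local (bc, m, n) frame.
    d_local = bond_length * np.array([
        -np.cos(bond_angle),
        np.sin(bond_angle) * np.cos(torsion),
        np.sin(bond_angle) * np.sin(torsion),
    ])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n
```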


@BL-Lac149597870
Author

Could you please provide any updates on the resolution of these issues or any new findings that have emerged? Your prompt response will be greatly appreciated as it will help us proceed accordingly.

@EDAPINENUT
Owner

Please be patient. We are still working on it and retraining our model. Since the evaluation code contained bugs, the results for both PPFlow and DiffPP will be updated and uploaded to Google Drive once the evaluation is finished.

We will let you know when they are available, and then you can download them for further evaluation. Thanks.

@lalalalala-ai

Dear sir, it has been another month and I want to follow up on the model retraining. I don't think the currently published model weights are usable. When do you expect this to be done?

@EDAPINENUT
Owner

We have identified issues in previous experiments and are working on resolving them. However, due to computational limitations and other manuscript commitments, retraining the model will take some time, and the retrained model also requires the relevant permissions and user licenses from the computational resource providers. We anticipate providing newly generated samples by the end of the year, and the retrained model will be resubmitted or made available externally through the relevant pharmaceutical platform. Thank you for your patience.

@EDAPINENUT
Owner

EDAPINENUT commented Dec 9, 2024

Apologies for the delayed response. After identifying the issue with the logical operator that led to overestimated results in the FoldX tests, we have spent the past few months recalibrating and rerunning the baselines. During this process, we found that the original version of PPFlow indeed exhibited suboptimal performance in FoldX validation. To address this, we have implemented the following optimizations to both the model and the data:

  • We optimized the orientation and translation flow matching using the method described in Sec. 4.1 (Diffusion Conditional Vector Fields) of the Flow Matching paper, and uploaded the new version of the model in ./ppflow (a generic flow-matching sketch follows this list).
  • We performed thorough data cleaning and refinement. This was necessary because many of the original data samples showed extreme instability in their FoldX validation conformations, whereas the PPDBench dataset exhibited much greater stability. This discrepancy made it difficult for generative models trained on the original dataset to achieve robust performance on benchmark datasets.
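For readers unfamiliar with the technique, below is a minimal, generic conditional flow-matching training step. It uses the simpler optimal-transport probability path purely as an illustration (not the diffusion conditional vector field from Sec. 4.1 referenced above), and the model interface is a placeholder:

```python
import torch

def cfm_loss(model, x1, sigma_min=1e-4):
    """One conditional flow-matching step on a data batch x1 of shape (B, ...)."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # per-sample time in [0, 1]
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))
    # Optimal-transport conditional path between noise and data.
    x_t = (1 - (1 - sigma_min) * t_) * x0 + t_ * x1
    # Conditional target vector field for this path.
    u_t = x1 - (1 - sigma_min) * x0
    # Regress the learned vector field onto the target.
    v_t = model(x_t, t)
    return ((v_t - u_t) ** 2).mean()
```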

Given the critical importance of data preprocessing, we are currently in discussions with collaborators regarding the timeline for open-sourcing the new model checkpoints. We plan to upload the trained models to our lab’s platform in the near future.

We provide the peptides as codesign_results.tar.gz on our Google Drive, which contains 20 samples per protein structure for more stable evaluation. Below are the results of the new model and the DiffPP benchmark under the updated data and experimental setup:

| Method | IMP%-S (↑) | Validity (↑) | Novelty (↑) | Diversity |
| --- | --- | --- | --- | --- |
| PPFlow | 4.04% | 1.00 | 0.99 | 0.67 |
| DiffPP | 3.72% | 0.41 | 0.89 | 0.28 |
  • While overall performance has declined across the board, the new PPFlow still outperforms DiffPP to some extent. Notably, after the ADCP-redocking step, the baseline performance has remained relatively unchanged.

Furthermore, we deeply regret our oversight. We have already updated the reported values in the new version of the paper on arXiv (it takes several days for the updated version to go live) and are actively communicating with the conference organizers to submit a corrected manuscript addressing the issue of overestimated baseline metrics.

We have also added a Note to the README file of our repository, acknowledging the issue with the initial version of the paper.

We sincerely appreciate @BL-Lac149597870's valuable feedback in identifying these shortcomings in our codebase and implementation. If there are any further questions or concerns, please don't hesitate to reach out.

Thank you for your support and understanding.
