Number of molecular descriptors obtained with PaDEL differs from the number of molecules in the molecule.smi file #2
Comments
I obtained 1412 rows myself, as can be seen here: https://github.com/wguesdon/beta-lactamase/blob/main/Data_Wrangling_and_EDA.ipynb.
I just came up with the solution for this error. The mistake was that I had kept some molecules with NaN in the canonical SMILES feature in my dataset, so PaDEL only calculated fingerprints for the molecules above the first NaN. Now I will try to calculate the 12 fingerprints for all molecules. I hope I can calculate all of them.
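For anyone who wants to check whether their dataset has the same problem, here is a minimal sketch, assuming the data is in a pandas DataFrame. The column names `canonical_smiles` and `molecule_chembl_id` and the toy rows are assumptions for illustration, not taken from the notebooks linked in this thread. It counts the rows with a missing SMILES and reports how many rows come before the first one, which can be compared with the number of descriptor rows PaDEL returned.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "canonical_smiles": ["CCO", "c1ccccc1", None, "CC(=O)O"],
    "molecule_chembl_id": ["CHEMBL1", "CHEMBL2", "CHEMBL3", "CHEMBL4"],
})

# Boolean mask: True where the SMILES string is missing (NaN/None).
missing = df["canonical_smiles"].isna()
n_missing = int(missing.sum())

# Position of the first missing SMILES, i.e. the number of rows before it.
# None means no SMILES is missing.
first_missing = int(missing.to_numpy().argmax()) if n_missing else None

print(f"{n_missing} of {len(df)} rows have no SMILES")
print(f"rows before the first missing SMILES: {first_missing}")
```

If the second number matches the number of rows in the descriptor output, the missing SMILES is a likely cause.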
Thank you for sharing, it must have been the same issue for me.
You're welcome @wguesdon, this is the good part of these collaborative projects :)
Hello sayalaruano, I have the same problem.
Hello @semsem80, to solve this error you need to delete the molecules with NaN in the canonical_smile feature before writing the molecule.smi file. I hope this is helpful, let me know if it works.
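The fix described above could look roughly like this with pandas. This is only a sketch under assumptions: the column names `canonical_smiles` and `molecule_chembl_id`, the toy rows, and the temporary output path are hypothetical, and the tab-separated, header-free layout is just one common way to write a `.smi` file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "canonical_smiles": ["CCO", "c1ccccc1", None, "CC(=O)O"],
    "molecule_chembl_id": ["CHEMBL1", "CHEMBL2", "CHEMBL3", "CHEMBL4"],
})

# Drop every row whose SMILES is missing, then renumber the index.
clean = df.dropna(subset=["canonical_smiles"]).reset_index(drop=True)

# Write SMILES and identifier, tab-separated, without header or index.
# A temporary directory is used here; in a notebook this would be molecule.smi.
smi_path = Path(tempfile.mkdtemp()) / "molecule.smi"
clean[["canonical_smiles", "molecule_chembl_id"]].to_csv(
    smi_path, sep="\t", index=False, header=False
)

# Sanity check: the file should have exactly one line per remaining molecule.
with open(smi_path) as fh:
    n_lines = sum(1 for line in fh if line.strip())

print(f"kept {len(clean)} of {len(df)} molecules, wrote {n_lines} lines")
```

After running the descriptor calculation on the cleaned file, the number of descriptor rows can be compared with `len(clean)` to confirm that nothing was skipped.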
Hi @sayalaruano, your suggested solution worked, thank you for your help.
Hello professor, I'm doing EDA and calculating molecular descriptors for the beta-lactamase dataset. I replaced duplicated values with their mean, as you suggested, and kept only the molecules that bind to beta-lactamase AmpC, which gives a dataset of 62050 molecules. Then I followed the instructions from the video in the description to calculate molecular descriptors with padelpy, but I obtained descriptors for only 5534 molecules, although my molecule.smi file has 62050 molecules. Do you know if PaDEL has restrictions on the number of molecules for calculating descriptors, or could this error be caused by something in my code? This GitHub repo contains my notebook and all files: https://github.com/sayalaruano/MidtermProject-MLZoomCamp. I added the same comment on the YouTube video of the challenge, just in case. Thanks in advance for your help.