🐛 Bug: Invalid SMILES producing a Null output #941
Comments
Hi @Richiio, I need to understand this better. Can you please provide an example of the input and output you are getting in the correct and incorrect cases? And please explain this better: "The current model implementation encounters errors when provided with incorrect SMILES input. This can lead to unexpected behavior and inaccurate predictions." Which errors does it encounter? What inaccurate predictions? We should not output a prediction if the input is not correct. |
Hi @GemmaTuron, that was a wrong description on my part. Apologies for that |
Hi @Richiio, thanks for the issue. Can you run one good molecule and one bad molecule in the same input file, and attach the output here as a CSV? |
@miquelduranfrigola |
OK, thanks @Richiio! Super useful. Let's discuss this with @GemmaTuron and @DhanshreeA; I'm not sure what the best format would be. I agree that the current solution, in this case, is not the best. This is probably something we should discuss in an online meeting. |
@GemmaTuron what is the current status of this? |
We can add it to the agenda for discussion on Tuesday! |
Also, can we please add a title to this issue to help all of us keep track of it? Thanks! |
I am labeling this low priority: it would be nice to have the model output from the Ersilia CLI in a more meaningful format for invalid SMILES or garbage input, but it is not a breaking issue as of now. We will take this up soon! |
Hi @miquelduranfrigola and @DhanshreeA I thought Ersilia dealt with bad inputs from the start, so what do we want to do here? A more informative return of info for the user? |
Agree. What is more important is that we always return the same number of output rows as input rows. |
I would suggest a null key could be helpful so users can quickly identify if a molecule is not correct |
@miquelduranfrigola agree, we could have a placeholder/dummy key for cases where we could not obtain a key for a molecule, something like |
I would simply call it |
@miquelduranfrigola or @DhanshreeA could you detail a bit how to tackle this? In which section of the Ersilia code should this go? I suggest using |
In my opinion, this logic should be incorporated in the Note that we've found the interesting (a.k.a. annoying) case where the SMILES is valid but it is not possible to produce an InChIKey. I do not recall which SMILES string was that. This is not frequent (first time I see it), but we need to take it into account. |
@miquelduranfrigola apparently this only happens when RDKit is not able to process the SMILES and neither the PubChem nor the Cactus resolver returns anything. |
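The fallback order described in this comment (a local conversion first, then the two remote resolvers, and a null result only when all of them fail) can be sketched as a simple chain. This is a minimal illustration and not Ersilia's actual code: `resolve_key` and the three stand-in resolver functions are hypothetical names, the returned key is a fake placeholder, and real resolvers would call RDKit or a web service instead.

```python
from typing import Callable, Iterable, Optional

# A resolver takes a SMILES string and returns a key, or None if it cannot.
Resolver = Callable[[str], Optional[str]]


def resolve_key(smiles: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Return the first key any resolver produces, or None if all of them fail."""
    for resolver in resolvers:
        try:
            key = resolver(smiles)
        except Exception:
            # A resolver that raises (parse error, network error) counts as a miss.
            key = None
        if key:
            return key
    return None


# Hypothetical stand-ins for the local and remote resolvers.
def local_resolver(smiles: str) -> Optional[str]:
    raise ValueError("cannot parse")  # simulates a local parsing failure


def remote_resolver_a(smiles: str) -> Optional[str]:
    return None  # simulates a service that returns nothing


def remote_resolver_b(smiles: str) -> Optional[str]:
    return "KEY-FOR-CCO" if smiles == "CCO" else None  # fake placeholder key


CHAIN = [local_resolver, remote_resolver_a, remote_resolver_b]

resolved = resolve_key("CCO", CHAIN)             # the last resolver succeeds
unresolved = resolve_key("not-a-smiles", CHAIN)  # every resolver fails
```

Returning `None` only after the whole chain is exhausted keeps the "no key" case rare and explicit, so the caller can decide which placeholder to write.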
OK then this is something relatively easy, then |
Alright, so for conclusion, we will use To test this, we should feed ersilia first with a single unprocessable input, and then a couple of unprocessable inputs within otherwise meaningful data. |
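The behaviour agreed on in this thread (one output row per input row, with a placeholder where an input cannot be processed) and the two test scenarios above can be sketched as follows. This is only an illustration under stated assumptions: `run_aligned`, `toy_is_processable`, and `toy_model` are hypothetical names, and `None` stands in for whichever placeholder value is finally chosen.

```python
from typing import Callable, List, Optional, Sequence


def run_aligned(
    inputs: Sequence[str],
    is_processable: Callable[[str], bool],
    predict_batch: Callable[[List[str]], List[float]],
) -> List[Optional[float]]:
    """Return exactly one result per input; unprocessable inputs get None."""
    # Remember the original positions of the inputs the model can handle.
    keep = [i for i, s in enumerate(inputs) if is_processable(s)]
    results: List[Optional[float]] = [None] * len(inputs)
    if keep:
        preds = predict_batch([inputs[i] for i in keep])
        # Scatter predictions back to the rows they came from.
        for i, pred in zip(keep, preds):
            results[i] = pred
    return results


# Hypothetical stand-ins: a toy validity check and a toy "model".
def toy_is_processable(smiles: str) -> bool:
    return smiles.strip() != "" and smiles != "garbage"


def toy_model(batch: List[str]) -> List[float]:
    return [float(len(s)) for s in batch]


# Scenario 1: a single unprocessable input.
single_bad = run_aligned(["garbage"], toy_is_processable, toy_model)

# Scenario 2: unprocessable inputs mixed into otherwise meaningful data.
mixed = run_aligned(
    ["CCO", "garbage", "c1ccccc1", "garbage"], toy_is_processable, toy_model
)
```

In both scenarios the output has the same length as the input, so rows can still be joined back to the input file by position.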
Agree. Also, let's document this accordingly. Thanks! |
I think this could also make for a good first issue. |
Hello @DhanshreeA, I would like to give this a shot. |
@DhanshreeA haha, I know I've tagged you in so many places, but this looks pretty good too. |
@musasizivictoria go ahead! |
I don't see any updates from @musasizivictoria so I am reopening this issue for other applicants. |
Oh, I am sorry @DhanshreeA, I did not receive a notification for the assignment. Kindly reassign this to me so I can get started. Thanks. |
Okay, @musasizivictoria you can continue working on it but please share updates. Thank you. |
Thanks @DhanshreeA I already opened PR Kindly let me know if there is another way of sharing updates; sorry if I missed it. Do I need to post in Slack? |
No here is fine, thanks @musasizivictoria! |
I am progressing on the implementation, but i have some doubt about how the ersilia/ersilia/io/types/compound.py Line 26 in 43bbe1d
|
@musasizivictoria honestly, I think we can change this part and import the |
Thanks @DhanshreeA here is the testing of the current changes: Output: |
hello @DhanshreeA , |
Is your feature request related to a problem? Please describe.
Yes, it is related to a problem. I recently incorporated a model that worked perfectly well for a valid SMILES input but failed for an invalid one. Here is a link to the incorporated model: https://github.com/ersilia-os/eos1mxi
Describe the problem:
The current model implementation produces a null output when provided with an invalid SMILES string.
Describe the solution you'd like.
I am looking at a possible solution where the Ersilia Model Hub checks the user's input and ensures it is a valid SMILES string before attempting to run predictions with the model. This could be part of the checks run in GitHub Actions, ensuring that only valid SMILES are processed by the model.
Describe alternatives you've considered
While implementing input validation in the GitHub Actions workflow is one solution, alternative approaches include incorporating input validation directly within the model's code, or providing a separate input validation endpoint that users can query before submitting SMILES for predictions (this would be for the web UI).
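As a rough sketch of what such a validation step could look like, the function below builds a per-row report that a user (or an endpoint) could inspect before running a model. It is an assumption-laden illustration, not part of Ersilia: `validate_inputs` and `rdkit_is_valid` are hypothetical names, and the default check assumes RDKit is installed and relies on `Chem.MolFromSmiles` returning `None` for strings it cannot parse.

```python
from typing import Callable, Dict, List, Sequence


def rdkit_is_valid(smiles: str) -> bool:
    """Default check: RDKit parses the string into a molecule with at least one atom."""
    from rdkit import Chem  # imported lazily; requires RDKit to be installed

    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and mol.GetNumAtoms() > 0


def validate_inputs(
    smiles_list: Sequence[object],
    is_valid: Callable[[str], bool] = rdkit_is_valid,
) -> List[Dict[str, object]]:
    """Return one report entry per input row, in the original order."""
    report = []
    for row, smiles in enumerate(smiles_list):
        # Non-string and empty entries are rejected without calling the checker.
        text = smiles.strip() if isinstance(smiles, str) else ""
        report.append(
            {"row": row, "input": smiles, "valid": bool(text) and is_valid(text)}
        )
    return report
```

The checker is passed in as a parameter so the same report logic can be reused with a different validator, or with a stub when RDKit is not available.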