-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding MolT5 to molfeat #40
Conversation
Thanks @dessygil, I will let this sit for a while and merge when the model card is ready. |
env.yml
Outdated
@@ -36,6 +36,7 @@ dependencies: | |||
- pytorch >=1.10.2 | |||
- scikit-learn | |||
- fcd_torch | |||
- sentencepiece |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if possible can you check whether there are hidden dependencies for sentencepiece earlier version that could clash ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the documentation on https://github.com/google/sentencepiece I didn't find anything that said there were any hidden dependencies. Is there anywhere else I can look to double-check, or is this even the right place to look? I've never considered doing this in any of my previous projects, so thank you for suggesting this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with sentencepiece
, so maybe these are naive remarks, but:
Does it make sense to add this dependency if it's only being used by a single model? Is the dependency "stand-alone" (no subdependencies) and thus easy to install? Is it popular enough that we expect other models to use it too? Is that why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use conda-forge
, so having a quick look at the recipe for sentencepiece is enough. Here: https://github.com/conda-forge/sentencepiece-feedstock/blob/main/recipe/meta.yaml
It's always worth checking how any new dependency works or conflict with existing ones. For example a new dependencies might require a version of package with could be incompatible with an existing dependency.
I don't see anything that would be a problem in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted my comment before seeing yours @cwognum.
The way I would approach it is to add the optional dependency on pyproject.toml
for pip
and in the env file here. Then document the requirement to make the model work.
In the conda recipe we don't need to add that dependency, as users will install that themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dessygil , please see my comments
modelCard.yml
Outdated
@@ -0,0 +1,20 @@ | |||
- name: laituan245/molt5-large-smiles2caption |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please provide this information in the body of your PR, not as a file next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference is supposed to be a single string, so please provide link to the paper or a DOI directly, if there isn't one, then the github repo or any other link would be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please provide this information in the body of your PR, not as a file next time.
I can delete this file and then edit the PR. Would this be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks @dessygil !
modelCard.yml
Outdated
- huggingface | ||
- transformers | ||
- text2text | ||
authors: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put all the authors of the original paper: https://arxiv.org/abs/2204.11817
@@ -387,6 +387,7 @@ def _embed(self, inputs, **kwargs): | |||
|
|||
Args: | |||
inputs: smiles or seqs | |||
use_encoder: If model requires encoderfeaturi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this, and put that argument in the init.
@@ -397,7 +398,10 @@ def _embed(self, inputs, **kwargs): | |||
else: | |||
attention_mask = None | |||
with torch.no_grad(): | |||
out_dict = self.featurizer.model(output_hidden_states=True, **inputs) | |||
if hasattr(self.featurizer.model, 'encoder'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the use_encoder
if the model is a huggingface EncoderDecoder
model to check whether encoder
should be used. Otherwise default to what was done before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to delete the use_encoder, but I figured the hasattr() might be better because it allows you to avoid adding another init parameter. Is using init better in this scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine the way you proposed if you delete the use_encoder
, I wanted to make it more general to support most cases of EncoderDecoder
models.
plugin.yaml
Outdated
entry_point_prefix: new | ||
home_url: ~ | ||
molfeat_version: ~ | ||
molfeat-MolT5: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a direct PR in molfeat, so no need to fill this since it's not a plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted this file
Checklist:
I have spoken to Manu about different plugins I would be able to add. I chose to start with MolT5. The PR wasn't discussed in an issue but through the discussion tab and Slack. The rest seems okay to checkoff since I'm just preregistering what I want to add.
news
entry.news/TEMPLATE.rst
tonews/my-feature-or-branch.rst
) and edit it.