Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding MolT5 to molfeat #40

Closed
wants to merge 0 commits into from
Closed

Adding MolT5 to molfeat #40

wants to merge 0 commits into from

Conversation

dessygil
Copy link
Contributor

@dessygil dessygil commented Apr 22, 2023

Checklist:

I have spoken to Manu about different plugins I would be able to add. I chose to start with MolT5. The PR wasn't discussed in an issue but through the discussion tab and Slack. The rest seems okay to checkoff since I'm just preregistering what I want to add.

name: MoltT5
inputs: smiles
type: pretrained
group: huggingface
version: 0
submitter: Desmond Gilmour
description: "MolT5 specifically Smiles2Caption is a pretrained model based on the Colossal Clean Crawl Corpus for textual descriptions, ZINC-15 for SMILES strings, and fine tuned using ChEBI-20."
representation: text
require_3D: false
tags:
    - smiles
    - huggingface
    - transformers
    - text2text
authors:
    - Tuan Manh Lai
    - Carl Edwards
    - Kevin Ros
    - Garret Honke
    - Kyunghyun Cho
    - Heng Ji
reference: 
    - https://arxiv.org/pdf/2204.11817.pdf
  • Was this PR discussed in a issue? It is recommended to first discuss a new feature and let the community know whether you are planning or have started working on it before opening a PR.
  • Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added or an existing one is deleted.
  • Added a news entry.
    • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst) and edit it.

@maclandrol
Copy link
Member

Thanks @dessygil, I will let this sit for a while and merge when the model card is ready.

@maclandrol maclandrol changed the title Preregistering the plugin I would like to add Adding MolT5 to molfeat Apr 27, 2023
@maclandrol maclandrol requested a review from cwognum April 27, 2023 18:44
env.yml Outdated
@@ -36,6 +36,7 @@ dependencies:
- pytorch >=1.10.2
- scikit-learn
- fcd_torch
- sentencepiece
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible can you check whether there are hidden dependencies for sentencepiece earlier version that could clash ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the documentation on https://github.com/google/sentencepiece I didn't find anything that said there were any hidden dependencies. Is there anywhere else I can look to double-check, or is this even the right place to look? I've never considered doing this in any of my previous projects, so thank you for suggesting this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with sentencepiece, so maybe these are naive remarks, but:

Does it make sense to add this dependency if it's only being used by a single model? Is the dependency "stand-alone" (no subdependencies) and thus easy to install? Is it popular enough that we expect other models to use it too? Is that why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use conda-forge, so having a quick look at the recipe for sentencepiece is enough. Here: https://github.com/conda-forge/sentencepiece-feedstock/blob/main/recipe/meta.yaml

It's always worth checking how any new dependency works or conflict with existing ones. For example a new dependencies might require a version of package with could be incompatible with an existing dependency.

I don't see anything that would be a problem in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted my comment before seeing yours @cwognum.

The way I would approach it is to add the optional dependency on pyproject.toml for pip and in the env file here. Then document the requirement to make the model work.

In the conda recipe we don't need to add that dependency, as users will install that themselves.

Copy link
Member

@maclandrol maclandrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dessygil , please see my comments

modelCard.yml Outdated
@@ -0,0 +1,20 @@
- name: laituan245/molt5-large-smiles2caption
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please provide this information in the body of your PR, not as a file next time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference is supposed to be a single string, so please provide link to the paper or a DOI directly, if there isn't one, then the github repo or any other link would be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please provide this information in the body of your PR, not as a file next time.

I can delete this file and then edit the PR. Would this be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks @dessygil !

modelCard.yml Outdated Show resolved Hide resolved
modelCard.yml Outdated
- huggingface
- transformers
- text2text
authors:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put all the authors of the original paper: https://arxiv.org/abs/2204.11817

@@ -387,6 +387,7 @@ def _embed(self, inputs, **kwargs):

Args:
inputs: smiles or seqs
use_encoder: If model requires encoderfeaturi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this, and put that argument in the init.

@@ -397,7 +398,10 @@ def _embed(self, inputs, **kwargs):
else:
attention_mask = None
with torch.no_grad():
out_dict = self.featurizer.model(output_hidden_states=True, **inputs)
if hasattr(self.featurizer.model, 'encoder'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the use_encoder if the model is a huggingface EncoderDecoder model to check whether encoder should be used. Otherwise default to what was done before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to delete the use_encoder, but I figured the hasattr() might be better because it allows you to avoid adding another init parameter. Is using init better in this scenario?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine the way you proposed if you delete the use_encoder, I wanted to make it more general to support most cases of EncoderDecoder models.

plugin.yaml Outdated
entry_point_prefix: new
home_url: ~
molfeat_version: ~
molfeat-MolT5:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a direct PR in molfeat, so no need to fill this since it's not a plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted this file

@maclandrol
Copy link
Member

@dessygil Thanks again, I made some additional changes, so I am opening a new PR. where your work was merged in. See #47

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants