Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically download a default checkpoint if it is not provided #48

Merged
merged 9 commits into from
Dec 15, 2023

Conversation

kmaziarz
Copy link
Contributor

This PR extends the single-step model classes to automatically download the default (trained on USPTO-50K) model checkpoint if no model directory is provided. This makes it even easier to get started with syntheseus as one does not have to manually download and unzip the files. Once downloaded, the checkpoints are cached.

On top of this, I also extend the integration CI to run a new single-step model test that checks that the supported models do run and return reasonable predictions for a simple input. While locally the test works for all 6 pip-installable models, on GitHub the GPU is not available, and setting device="cpu" uncovers that some of the models appear to only work on GPU. Thus, for now the test only covers Chemformer, LocalRetro and MEGAN; the other models will be added in the future.

When running those tests, I found that the links we provided for LocalRetro and RetroKNN were not quite correct (the initial upload missed some of the files, which I later corrected, but the links were still pointing to the first upload rather than the corrected one). Morever, I found that RetroKNN was zipped using a method that python could not deal with, so I re-generated the file and re-uploaded it (again fixing the link).

The changes are separated into reasonable commits for easier reviewing.

@kmaziarz kmaziarz force-pushed the kmaziarz/download-single-step-checkpoints branch from 6ccf7ab to 45b486d Compare December 14, 2023 18:36
Copy link
Collaborator

@AustinT AustinT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but I just skimmed through the code rather than actually running it, so it is very possible that I didn't catch everything.

CHANGELOG.md Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
@kmaziarz kmaziarz merged commit fcf0b44 into main Dec 15, 2023
4 checks passed
@kmaziarz kmaziarz deleted the kmaziarz/download-single-step-checkpoints branch December 15, 2023 13:46
kmaziarz added a commit that referenced this pull request Jan 11, 2024
PR #48 added a new CI pipeline that covers single-step model inference,
but it only tested half of the single-step models as the other ones did
not work on CPU. This PR fixes this, making all models work on CPU,
which mostly required making sure the right device is correctly
propagated into all of the model internals. I then extended the test
which now checks all `pip`-installable models we support.

Additionally, I also merged the two CI pipelines into one, as this
allows for generating a combined coverage report that takes into account
all tests. This does not affect testing speed, as the different test
jobs still run in parallel. Finally, I also noticed that the
`.coveragerc` configuration was not quite right, as the files with 0%
coverage were not being discovered; after a minor change to the
configuration this is now fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants