Releases: brsynth/molecule-signature-paper
Release 3.0.0
3.0.0 (2025-01-31)
Feat
- predict: predict call from CLI now always outputs something
- 2.enumeration_results: notebook to perform the molecule enumeration
- 1.enumeration_create_alphabets: notebook to create alphabets
- notebooks: add notebook
- configure: add interface for simple configuration
- predict: add interface for predicting (without evaluation)
- model: fully batch-vectorized version for beam search
- predict: test for equality of canonical SMILES
- predict: decode beam in parallel
Fix
- predict: convert python objects to strings
- predict: dataclass attribute poorly tested
- predict: remove pickle writing
- evaluate: add output file handling to write results
- evaluate: correctly deal with the max number of rows
- evaluate: include chirality in ECFP
- predict: stop crashing when beam size > vocab size
- predict: allow selection of the accelerator device
- predict: column indexes
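The fix "stop crashing when beam size > vocab size" reflects a common beam-search pitfall: a top-k selection over the vocabulary cannot return more candidates than there are tokens (PyTorch's `torch.topk`, for instance, raises an error when `k` exceeds the size of the selected dimension). The pure-Python sketch below shows one plausible way to handle it for a single decoding step. It is not taken from the project's code, and the helper name `top_k_tokens` is hypothetical.

```python
import heapq
import math


def top_k_tokens(log_probs, beam_size):
    """Select candidate tokens for one beam-search step.

    Hypothetical helper: the requested beam width is clamped to the
    vocabulary size, so asking for more beams than there are tokens
    yields one candidate per token instead of an error.
    """
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    k = min(beam_size, len(log_probs))  # effective beam width
    # nlargest returns (token_id, log_prob) pairs, best first
    return heapq.nlargest(k, enumerate(log_probs), key=lambda pair: pair[1])


# Toy 3-token vocabulary, beam width 5 > vocabulary size
vocab_log_probs = [math.log(p) for p in (0.2, 0.5, 0.3)]
candidates = top_k_tokens(vocab_log_probs, beam_size=5)
print([token_id for token_id, _ in candidates])  # [1, 2, 0]
```

With this approach, downstream code works with the effective width `k` rather than the requested `beam_size`, because fewer hypotheses than requested may exist in the first decoding steps.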
Refactor
- predict: update default arg values
- predict: update result refinements
- configure: set default output path to None
- config: remove unused method
- imports: refine
- add empty data folder
- notebooks: merge cells
- notebooks: rename notebook
- notebooks: rename notebook for fig 2
- utils: move utility functions
- evaluate: remove file
- predict: make col names more explicit
- evaluate: sweep code
- utils: sweep code
- predict: refine outputs
- predict: remove unused args
- utils: additional shared functions
- predict: delegate refinement of results to subsequent code
- predict: allow calls from other script
- predict: better print
- predict: improve imports
- predict: print result to stdout on request
- predict: print default values
Release 2.1.0
2.1.0 (2025-01-07)
Feat
- prepare: outputs additional column signature_morgans
- learning: add transformer code
- dataset: add code to compute model tokens
- dataset: add code for download and prepare datasets
- transformer/train: additional arg for setting source / target max length
- transformer/train: implement gradient accumulation
- transformer/train: define num of data loader workers from args
- transformer/train: make model compilation by Torch optional
- transformer/train: generalize mixed precision scaler usage
- transformer/model: refine the Module's state_dict method
- transformer/train: check for NaNs in loss
- transformer/train: model dir output as arg
- transformer/train: experimentation with mixed precision floats
- transformer/train: use pin_memory=True in data loaders, which is expected to increase GPU performance
- transformer/train: first working version
- transformer: in-development code
- new code to download and make use of the signature code (#10)
Fix
- prepare: remove deprecated import
- get_smiles: remove superfluous Hs
- prepare: sanitize molecule after stereo-isomer enumeration
- prepare: add missing header
- update changelog on version bump
- attempt to trigger GitHub Actions
- use main instead of master as branch name
- dataset: remove unused code
- transformer/train: load_checkpoint
- transformer/train: effective batch indexes
- transformer/train: duplicated loss normalization
- transformer/train: wrong arg name
- transformer/train: take into account remaining batches for the scheduler counts
- transformer/train: propagate gradient for last batches of epoch
- transformer/train: remove multiple calls to unscale_
- transformer/train: use save_checkpoint
- transformer/train: refine save and load methods
- transformer/train: correct seq length arg
- transformer/train: stop sending to preset device
- dataset/utils.py: forward pass logger in recursive calls
- tokenizer: allow additional depictions
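Several of the training fixes above (effective batch indexes, remaining batches in the scheduler counts, propagating the gradient for the last batches of an epoch) touch the same bookkeeping problem in gradient accumulation: when the number of batches per epoch is not a multiple of the accumulation factor, the trailing partial group still needs an optimizer step, and the scheduler's total step count must include it. The pure-Python sketch below illustrates that arithmetic only. The names `accum_steps`, `optimizer_steps_per_epoch`, `step_boundaries` and `loss_divisor` are hypothetical and do not come from the project's training script, and normalizing a partial group by its actual size is one design choice among others.

```python
import math


def optimizer_steps_per_epoch(num_batches, accum_steps):
    """Optimizer steps in one epoch, counting a trailing partial group."""
    return math.ceil(num_batches / accum_steps)


def step_boundaries(num_batches, accum_steps):
    """Yield (batch_index, do_step) for each batch in an epoch.

    do_step is True after every `accum_steps` batches and on the last
    batch, so gradients from a final partial group are not discarded.
    """
    for i in range(num_batches):
        is_group_end = (i + 1) % accum_steps == 0
        is_last = i == num_batches - 1
        yield i, is_group_end or is_last


def loss_divisor(batch_index, num_batches, accum_steps):
    """Number of batches in the accumulation group containing batch_index."""
    group_start = (batch_index // accum_steps) * accum_steps
    return min(accum_steps, num_batches - group_start)


num_batches, accum_steps, epochs = 10, 4, 3
steps = [i for i, do_step in step_boundaries(num_batches, accum_steps) if do_step]
print(steps)  # [3, 7, 9]
print(optimizer_steps_per_epoch(num_batches, accum_steps) * epochs)  # 9
```

With a scheduler such as PyTorch's `OneCycleLR`, the step count passed as `steps_per_epoch` would be the ceiling value above rather than `num_batches // accum_steps`; `OneCycleLR` raises a `ValueError` when it is stepped more times than its configured total.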
Refactor
- remove old code
- .env: ignore local env file
- erase old code
- transformer: sweep code
- dataset: clean deprecated code
- transformer: remove deprecated code
- transformer/train: refine gradient accumulation
- transformer/config: reduce learning rate to prevent NaN / Inf values
- transformer/train: make GPU pinned memory an option
- transformer/train: add few debug messages
- transformer/config: update
- transformer/config: update
- transformer/train: get the number of epochs from config
- transformer/train: better log NaN / Inf value issues
- transformer/config: increase learning rate
- transformer/config: increase learning rate
- transformer/config: reduce learning rate
- transformer/train: update default log level
- transformer/train: better handle device arg
- transformer/config.yaml: update training values
- model: remove unnecessary code
- dataset/utils.py: don't sort config keys
- download: update paths
Perf
- transformer/train: use AdamW optimizer instead of Adam, and OneCycleLR scheduler
Release 1.1.0
1.1.0 (2023-10-30)
Features
- download_metanetx: generate sig alphabet with nbit and neighbors (8b749d6)
- library: update to RevSig1.5 (8de9a0d)
- paper: construct alphabet for sig-nbit (866437d)
- paper: download, add emolecules (093fcfe)
- paper: download, add FP count and extract test_small (9987ee8)
- paper: download, enable formalCharge in sanitize (d4c66e3)
- paper: enable sig-nbit (4f2c125)
- paper: img, add (9469384)
- paper: img, add degeneracy (d2e6730)
- paper: tokenizer, use ECFP4_COUNT (9e31b56)
- tokenize: write SIG-NEIGH-NBIT datasets (f84dc93)
- tokenizer: increase script verbosity (0bfe1ab)
- tokenizer: new arguments to select tokenizer model, depic to treat and pairs to build (19d03e3)
- tokenizer: produce SIG-NEIGH-NBIT datasets (daf0454)
- tokenizer: refactor and enable unigram model type (45400fe)
- tokenizer: use all tokens available and support unigram model (de449e2)
Bug Fixes
- download_metanetx: fix paths (0c44e9b)
- download_metanetx: fix paths (1febe59)
- paper: dataset, ecfp4 duplicate index number according to the count (d7abd1e)
- paper: tokenizer, use the right function (73c9f21)
- signature: use ECFP instead of FCFP (97972fb)
- tokenizer: fix regular expression (7e9bcdc)
- tokenizer: spelling in AROMATIC bond regex (624d678)
- tokenizer: stop omitting bounds in regex (0398217)
- tokenizer: stop splitting SIG bond tokens (0f004cd)
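The entry "ecfp4 duplicate index number according to the count" can be read as turning a count fingerprint into a token sequence in which each on-bit index appears as many times as its count. The sketch below illustrates that reading only; it is an assumption, not the project's code. The input is a `{bit_index: count}` dictionary, such as the one returned by RDKit's `GetNonzeroElements()` on a sparse count fingerprint, and the helper name `count_fp_to_tokens` is hypothetical.

```python
def count_fp_to_tokens(counts):
    """Expand a sparse count fingerprint into a token sequence.

    Hypothetical helper: each on-bit index is repeated as many times as
    its count, with indices emitted in ascending order.
    """
    tokens = []
    for bit in sorted(counts):
        tokens.extend([bit] * counts[bit])
    return tokens


print(count_fp_to_tokens({1024: 1, 17: 3, 305: 2}))
# [17, 17, 17, 305, 305, 1024]
```

Sorting the indices gives a deterministic token order for the same molecule, which matters when the sequence is fed to a tokenizer or a sequence model.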
Code Refactoring
- download_metanetx: print settings (3921c6c)
- download_metanetx: progress bar and more logs (14fe1fe)
- download_metanetx: store file paths in args.dict (a898d9c)
Release 1.0.0
1.0.0 (2023-08-09)
⚠ BREAKING CHANGES
- tokenizer:
Features
- download: introduce default output dir (09765ec)
- library: update with "RevSig1.2" (1608cd7)
- paper: add tokenizer signature (545cfa6)
- retrosig: add utils/cmd.py file (8f398db)
- tokenizer: add sentencepiece tokenizer (3ee9f5c)
- tokenizer: build vocabularies and dataset pairs (f4ae35d)
- tokenizer: only output on-bits in ECFP4 (62f9dca)
Bug Fixes
- download: create output dir if it does not exist (6679734)
- download: fix argparse crash due to percent sign in help (#6) (2db597e)
- download: prevent removing raw mnx file (f3c2d85)
- download: put back right path for rdkit method (#7) (1402a45)
- download: shuffle data only once (f51742f)
- tokenizer: fingerprints name in upper case to match expectation (256021a)
Build Systems
- add tox file (e12beeb)
Code Refactoring
- download: change default value of test and valid datasets (6ceb79a)
- download: disable shuffling before sanitizing (0934919)
- download: point out unexpectedly filtered SMILES (6c99f96)
- download: update output name for the signature alphabet file (a2ef08c)
- sweep imports (0a49399)
- download: simplify args usage (90bf3f5)
- tokenizer: change file pairs extension (576139c)
Styles
- download: rename variables (bb88d89)
- download: sweep imports (de34f44)
- download: sweep imports (c6313dd)
- download: update helps of arguments (82a5afb)
- format files with Black (856f0fc)