Molecule Signature

Signature-based enumeration of molecules from Morgan fingerprints.

Installation

From conda package

Installation using conda is the easiest way to get started. First, install Conda and then install the package from the conda-forge channel.

  1. Install Conda: Download the installer for your operating system from the Conda Installation page. Follow the instructions on the page to install Conda. For example, on Windows, you would download the installer and run it. On macOS and Linux, you might use a command like:

    bash ~/Downloads/Miniconda3-latest-Linux-x86_64.sh

    Follow the prompts on the installer to complete the installation.

  2. Install signature from conda-forge:

    conda install -c conda-forge signature

From source code

One can also install the tool from the source code. This method is useful for development purposes.

  1. Install dependencies:

    conda env create -f environment.yml
  2. Activate the environment and install the package in editable mode:

    conda activate sig
    pip install -e .  # From the root of the repository
  3. Add development dependencies:

    conda activate sig
    conda env update -n sig -f environment-dev.yml

Usage

Build a signature from SMILES

  • From Python

    Below is a simple example showing how to build a signature from a SMILES string. For more examples, refer to the signature-basics notebook.

    from rdkit import Chem
    from molsig.Signature import MoleculeSignature
    
    mol = Chem.MolFromSmiles("CCO")
    mol_sig = MoleculeSignature(mol)
    mol_sig.to_list()
    # [
    #  '80-1410 ## [C;H3;h3;D1;X4]-[C;H2;h2;D2;X4:1]-[O;H1;h1;D1;X2]',
    #  '807-222 ## [C;H3;h3;D1;X4]-[C;H2;h2;D2;X4]-[O;H1;h1;D1;X2:1]',
    #  '1057-294 ## [O;H1;h1;D1;X2]-[C;H2;h2;D2;X4]-[C;H3;h3;D1;X4:1]'
    # ]
  • From the command line

    Getting help:

    molsig signature --help

    Run:

    molsig signature \
        --smiles <SMILES> \
        --output <Output file, tsv>
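Each string returned by to_list() in the ethanol example above has two parts separated by " ## ". The following is a minimal sketch for splitting such strings, assuming the left part is a hyphen-separated list of integers (presumably the Morgan bits attached to the atom) and the right part is a SMARTS-like description of the atom and its neighbors. The AtomSignature class and the parse_atom_signature function are hypothetical helpers written for this illustration; they are not part of molsig.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AtomSignature:
    """Hypothetical container, not a molsig class."""
    bits: Tuple[int, ...]  # integers left of " ## " (assumed to be Morgan bits)
    pattern: str           # SMARTS-like string right of " ## "


def parse_atom_signature(text: str) -> AtomSignature:
    """Split one signature string at the ' ## ' separator."""
    left, sep, right = text.partition(" ## ")
    if not sep:
        raise ValueError(f"missing ' ## ' separator: {text!r}")
    bits = tuple(int(part) for part in left.split("-"))
    return AtomSignature(bits=bits, pattern=right)


# Strings copied from the ethanol ("CCO") example above.
ethanol = [
    "80-1410 ## [C;H3;h3;D1;X4]-[C;H2;h2;D2;X4:1]-[O;H1;h1;D1;X2]",
    "807-222 ## [C;H3;h3;D1;X4]-[C;H2;h2;D2;X4]-[O;H1;h1;D1;X2:1]",
    "1057-294 ## [O;H1;h1;D1;X2]-[C;H2;h2;D2;X4]-[C;H3;h3;D1;X4:1]",
]

parsed = [parse_atom_signature(s) for s in ethanol]
for sig in parsed:
    print(sig.bits, sig.pattern)
```

Keeping the two parts in separate fields makes it easy to group or filter atom signatures by their integer part without re-parsing the strings.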

Build an alphabet from a set of SMILES

  • From Python

    An alphabet is a collection of mappings from Morgan bits to atom signatures, built from the signatures of a set of molecules.

    See the creating-alphabet-basics notebook.

  • From the command line

    Getting help:

    molsig alphabet --help

    Run:

    molsig alphabet \
      --smiles <Input file, txt> \
      --output <Output file, npz>
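The command above takes a text file of SMILES as input. The following is a minimal sketch for preparing such a file and assembling the command from Python. It assumes the input file holds one SMILES per line, which is not confirmed here; check molsig alphabet --help for the expected layout. The file names and the helper functions are hypothetical, and only the flags shown above are used.

```python
import shlex
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_smiles_file(smiles: Iterable[str], path: Path) -> Path:
    """Write one SMILES per line (assumed input layout, not confirmed)."""
    lines = [s.strip() for s in smiles if s.strip()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def alphabet_command(smiles_file: Path, output_file: Path) -> List[str]:
    """Assemble the argument list using only the flags documented above."""
    return [
        "molsig", "alphabet",
        "--smiles", str(smiles_file),
        "--output", str(output_file),
    ]


# Hypothetical file names inside a temporary working directory.
workdir = Path(tempfile.mkdtemp())
smiles_path = write_smiles_file(["CCO", "CCN", "c1ccccc1"], workdir / "smiles.txt")
cmd = alphabet_command(smiles_path, workdir / "alphabet.npz")

# Print the shell-quoted command; to execute it, pass cmd to
# subprocess.run(cmd, check=True) with molsig available on the PATH.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.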

Enumerate molecules from an ECFP fingerprint

  • From Python

    See the enumeration-basics notebook.

  • From the command line

    Getting help:

    molsig enumerate --help

    Run:

    molsig enumerate \
      --smiles <SMILES> \
      --alphabet <Input alphabet file, npz> \
      --output <Output file, tsv>
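The enumeration command can also be driven from a script. The following is a minimal sketch that runs it with subprocess and reads the resulting TSV file back as raw rows. The column layout of the output is not documented here, so no column names are assumed. The file names alphabet.npz and enumerated.tsv and both helper functions are hypothetical; the command only runs if molsig is found on the PATH and the alphabet file exists.

```python
import csv
import shutil
import subprocess
from pathlib import Path
from typing import List


def enumerate_command(smiles: str, alphabet: Path, output: Path) -> List[str]:
    """Assemble the argument list using only the flags documented above."""
    return [
        "molsig", "enumerate",
        "--smiles", smiles,
        "--alphabet", str(alphabet),
        "--output", str(output),
    ]


def read_tsv(path: Path) -> List[List[str]]:
    """Read a tab-separated file as a list of rows, without assuming columns."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t"))


alphabet_file = Path("alphabet.npz")   # hypothetical: produced by molsig alphabet
output_file = Path("enumerated.tsv")   # hypothetical output name
cmd = enumerate_command("CCO", alphabet_file, output_file)

if shutil.which("molsig") is not None and alphabet_file.exists():
    subprocess.run(cmd, check=True)    # raises CalledProcessError on failure
    rows = read_tsv(output_file)
    print(f"read {len(rows)} rows from {output_file}")
else:
    print("molsig or the alphabet file is missing; command not run:")
    print(" ".join(cmd))
```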

Citation

If you use this software, please cite it as follows:

Meyer, P., Duigou, T., Gricourt, G., & Faulon, J.-L. Reverse Engineering Molecules from Fingerprints through Deterministic Enumeration and Generative Models. In preparation.