Title will be here

Filipp Nikitin¹,² · Dylan M. Anstine²,³ · Olexandr Isayev¹,²,⁴*
¹Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
²Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
³Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, USA
⁴Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, PA, USA

*Corresponding author: olexandr@olexandrisayev.com

Overview

Abstract

Key Features

Setup

This setup has been tested on Ubuntu 22.04. Because PyTorch, PyTorch Geometric, and RDKit are widely supported, it should also work on other platforms. Installation typically takes up to 20 minutes.

Prerequisites

Python 3.10+
CUDA-compatible GPU (recommended for training)
Conda or Mamba (recommended)

Environment Setup

# Clone the repository
git clone https://github.com/isayevlab/TSMegaGen.git
cd TSMegaGen

# Create and activate conda environment
conda create -n loqi python=3.10 -y
conda activate loqi

# Install dependencies
pip install -r requirements.txt
pip install -e .
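
After installation, it can help to confirm that the interpreter and core dependencies are in place. The sketch below is one possible check, not part of this repository: `check_dependencies` and `python_version_ok` are hypothetical helpers, and the module names `torch`, `torch_geometric`, and `rdkit` are assumed to be the import names of the PyTorch, PyTorch Geometric, and RDKit packages mentioned in Setup.

```python
import importlib.util
import sys

# Usual import names for the dependencies named in Setup (an assumption;
# they are not read from requirements.txt).
REQUIRED_MODULES = ("torch", "torch_geometric", "rdkit")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {module_name: bool} telling whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def python_version_ok(minimum=(3, 10)):
    """Check the running interpreter against the Python 3.10+ prerequisite."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print(f"Python >= 3.10: {python_version_ok()}")
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated conda environment; any module reported as missing points to a step in the installation that did not complete.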

Data Setup

Training and evaluation require the ChEMBL3D dataset.

Available with this release:

Usage

Make sure the package is installed locally: pip install -e .

Model Training

# Train transition state model from scratch
python scripts/train.py \
    --config-path=./conf/ \
    --config-name ts_extended_data \
    train.gpus=1 \
    train.seed=28 \
    run_name=test_train \
    outdir="../test_runs" \
    data.dataset_root="/path/to/ts_dataset"

# Resume training from checkpoint
python scripts/train.py \
    --config-path=./conf/ \
    --config-name ts_extended_data \
    train.gpus=1 \
    train.seed=28 \
    run_name=test_train \
    outdir="../test_runs" \
    resume="./models/last_converted.ckpt"

# Customize training parameters
python scripts/train.py \
    --config-path=./conf/ \
    --config-name ts_extended_data \
    outdir=./outputs \
    train.gpus=2 \
    train.n_epochs=800 \
    train.seed=42 \
    data.batch_size=150 \
    optimizer.lr=0.0001

Model Inference and Sampling

Transition State Generation

# Generate transition states from atom-mapped SMILES
python scripts/sample_transition_state.py \
    --reactant_smi "[C:1]([c:2]1[n:3][o:4][n:5][n:6]1)([H:7])([H:8])[H:9]" \
    --product_smi "[C:1]1([H:7])([H:8])/[C:2](=[N:3]\\[H:9])[N:6]1[N:5]=[O:4]" \
    --config scripts/conf/ts_extended_data.yaml \
    --ckpt models/last_converted.ckpt \
    --output output.xyz \
    --n_samples 1 \
    --batch_size 32

# Generate transition states from XYZ files
python scripts/sample_transition_state.py \
    --reactant_xyz reactant.xyz \
    --product_xyz product.xyz \
    --config scripts/conf/ts_extended_data.yaml \
    --ckpt models/last_converted.ckpt \
    --output output.xyz \
    --n_samples 5 \
    --batch_size 32

# Generate multiple samples per reaction
python scripts/sample_transition_state.py \
    --reactant_smi "[C:1]([c:2]1[n:3][o:4][n:5][n:6]1)([H:7])([H:8])[H:9]" \
    --product_smi "[C:1]1([H:7])([H:8])/[C:2](=[N:3]\\[H:9])[N:6]1[N:5]=[O:4]" \
    --config scripts/conf/ts_extended_data.yaml \
    --ckpt models/last_converted.ckpt \
    --output ts_samples.xyz \
    --n_samples 10 \
    --batch_size 32
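
For more than a handful of reactions, the command line above can be driven from Python. The following is a minimal sketch under stated assumptions: `build_sampling_command` and `run_batch` are hypothetical helpers, the `ts_000.xyz` output naming is invented for illustration, and only the flags shown in the examples above are used. Passing arguments as a list avoids shell quoting problems with the backslashes and brackets that appear in SMILES.

```python
import shlex
import subprocess


def build_sampling_command(reactant_smi, product_smi, output,
                           n_samples=1, batch_size=32,
                           config="scripts/conf/ts_extended_data.yaml",
                           ckpt="models/last_converted.ckpt"):
    """Assemble the argument list for scripts/sample_transition_state.py."""
    return [
        "python", "scripts/sample_transition_state.py",
        "--reactant_smi", reactant_smi,
        "--product_smi", product_smi,
        "--config", config,
        "--ckpt", ckpt,
        "--output", output,
        "--n_samples", str(n_samples),
        "--batch_size", str(batch_size),
    ]


def run_batch(reactions, n_samples=1, dry_run=True):
    """Build (and optionally run) one command per (reactant, product) pair."""
    commands = []
    for index, (reactant, product) in enumerate(reactions):
        # Hypothetical naming scheme: one output file per reaction.
        cmd = build_sampling_command(reactant, product, f"ts_{index:03d}.xyz",
                                     n_samples=n_samples)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # show the command without executing it
        else:
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default) nothing is executed, so the generated commands can be inspected before committing GPU time.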

Input formats:

SMILES: Atom-mapped SMILES with explicit hydrogens (e.g., [C:1][H:2])
XYZ: Standard XYZ coordinate files (bonds will be inferred using Open Babel)
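
A standard XYZ file starts with the atom count, followed by a comment line and then one `element x y z` line per atom. The sketch below shows how such a file could be parsed and how two structures could be compared by atom count before sampling. `parse_xyz` and `same_atom_count` are hypothetical helpers written for illustration; they are not the loader this repository uses.

```python
def parse_xyz(text):
    """Parse single-frame XYZ text into (elements, coordinates)."""
    lines = text.strip().splitlines()
    if len(lines) < 2:
        raise ValueError("XYZ text needs a count line and a comment line")
    n_atoms = int(lines[0].split()[0])
    atom_lines = lines[2:2 + n_atoms]  # skip the count and comment lines
    if len(atom_lines) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom lines, found {len(atom_lines)}")
    elements, coords = [], []
    for line in atom_lines:
        symbol, x, y, z = line.split()[:4]
        elements.append(symbol)
        coords.append((float(x), float(y), float(z)))
    return elements, coords


def same_atom_count(reactant_text, product_text):
    """True when both XYZ structures contain the same number of atoms."""
    return len(parse_xyz(reactant_text)[0]) == len(parse_xyz(product_text)[0])
```

A typical use is to read `reactant.xyz` and `product.xyz` with `pathlib.Path(...).read_text()` and call `same_atom_count` on the two strings before launching the sampling script.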

Notes:

SMILES must have explicit hydrogens and can use atom mapping to specify atom correspondence
Reactant and product must have the same number of atoms
Output is saved as XYZ file(s) with transition state coordinates
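
The requirements in these notes can be checked before sampling. The sketch below uses only the standard library and a regular expression, so it is a rough pre-check rather than a real SMILES parser: it looks only at bracket atoms that carry a map number and ignores unmapped atoms. `mapped_atoms` and `check_atom_mapping` are hypothetical helpers, not functions from this repository.

```python
import re

# Matches bracket atoms carrying a map number, e.g. [C:1], [c:2], [13C:3], [C@@H:4].
_MAPPED_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]:]*:(\d+)\]")


def mapped_atoms(smiles):
    """Return {map_number: element_symbol} for every mapped bracket atom."""
    atoms = {}
    for symbol, map_number in _MAPPED_ATOM.findall(smiles):
        number = int(map_number)
        if number in atoms:
            raise ValueError(f"map number {number} appears more than once")
        atoms[number] = symbol.capitalize()  # aromatic 'c' is stored as 'C'
    return atoms


def check_atom_mapping(reactant_smi, product_smi):
    """Return a list of problems; an empty list means the mappings agree."""
    reactant, product = mapped_atoms(reactant_smi), mapped_atoms(product_smi)
    problems = []
    if len(reactant) != len(product):
        problems.append(f"atom count differs: {len(reactant)} vs {len(product)}")
    for number in sorted(set(reactant) ^ set(product)):
        problems.append(f"map number {number} is present on one side only")
    for number in sorted(set(reactant) & set(product)):
        if reactant[number] != product[number]:
            problems.append(
                f"map number {number} is {reactant[number]} in the reactant "
                f"but {product[number]} in the product"
            )
    return problems
```

When calling it from Python, pass SMILES containing backslashes as raw strings (`r"..."`) so that the bond-direction markers are preserved.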

Available Configurations

Transition State Configs:

ts_extended_data.yaml - Transition state model configuration
ts1x.yaml - Alternative transition state configuration

Model Configs:

loqi.yaml - LoQI stereochemistry-aware conformer generation model
nextmol.yaml - Alternative configuration for NextMol-style generation

Training Configuration

Any configuration parameter can be overridden on the command line with a dotted key=value argument:

# Example with custom parameters
python scripts/train.py \
    --config-path=./conf/ \
    --config-name ts_extended_data \
    outdir=./my_training \
    run_name=my_experiment \
    train.gpus=4 \
    train.n_epochs=500 \
    data.batch_size=64 \
    data.dataset_root="/path/to/ts_dataset" \
    wandb_params.mode=online
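
The dotted `key=value` arguments address nested entries of the YAML configuration: `train.gpus=4` sets the `gpus` entry inside the `train` section. The sketch below illustrates that mapping with a plain dictionary. It assumes the overrides behave like Hydra-style overrides, it does not reproduce the actual override grammar or validation (for example, it silently creates missing keys), and both `apply_overrides` and the base values are hypothetical.

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        try:
            value = ast.literal_eval(raw_value)  # numbers, booleans, quoted strings
        except (ValueError, SyntaxError):
            value = raw_value  # anything else stays a plain string
        node = result
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # descend, creating sections as needed
        node[leaf] = value
    return result


if __name__ == "__main__":
    # Placeholder base values, not the defaults shipped in ts_extended_data.yaml.
    base = {"train": {"gpus": 1, "n_epochs": 100}, "data": {"batch_size": 32}}
    print(apply_overrides(base, ["train.gpus=4", "train.n_epochs=500",
                                 "data.batch_size=64", "wandb_params.mode=online"]))
```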

Citation

# Citation is coming soon
@article{
}

You may also find our paper and model for low-energy conformer generation useful:

@article{nikitin2025scalable,
  title={Scalable Low-Energy Molecular Conformer Generation with Quantum Mechanical Accuracy},
  author={Nikitin, Filipp and Anstine, Dylan M and Zubatyuk, Roman and Paliwal, Saee Gopal and Isayev, Olexandr},
  year={2025}
}

This work builds upon the Megalodon architecture. If you use the underlying architecture, please also cite:

@article{reidenbach2025applications,
  title={Applications of Modular Co-Design for De Novo 3D Molecule Generation},
  author={Reidenbach, Danny and Nikitin, Filipp and Isayev, Olexandr and Paliwal, Saee},
  journal={arXiv preprint arXiv:2505.18392},
  year={2025}
}

Repository: isayevlab/TSMegaGen