LoQI: Scalable Low-Energy Molecular Conformer Generation with Quantum Mechanical Accuracy

Filipp Nikitin1,2·Dylan M. Anstine2,3·Roman Zubatyuk2,5·Saee Gopal Paliwal5·Olexandr Isayev1,2,4*
1Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
2Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
3Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, USA
4Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
5NVIDIA, Santa Clara, CA, USA

📄 Paper·📖 Citation·⚙️ Setup·🔗 GitHub

*Corresponding author: olexandr@olexandrisayev.com

[Figure: Overview]

[Figure: Macrocycles]

Abstract

Molecular geometry is crucial for biological activity and chemical reactivity; however, computational methods for generating 3D structures are limited by the vast scale of conformational space and the complexities of stereochemistry. Here we present an approach that combines an expansive dataset of molecular conformers with generative diffusion models to address this problem. We introduce ChEMBL3D, which contains over 250 million molecular geometries for 1.8 million drug-like compounds, optimized using AIMNet2 neural network potentials to a near-quantum mechanical accuracy with implicit solvent effects included. This dataset captures complex organic molecules in various protonation states and stereochemical configurations.

We then developed LoQI (Low-energy QM Informed conformer generative model), a stereochemistry-aware diffusion model that learns molecular geometry distributions directly from this data. Through graph augmentation, LoQI accurately generates molecular structures with targeted stereochemistry, representing a significant advance in modeling capabilities over previous generative methods. The model outperforms traditional approaches, achieving up to tenfold improvement in energy accuracy and effective recovery of optimal conformations. Benchmark tests on complex systems, including macrocycles and flexible molecules, as well as validation with crystal structures, show that LoQI can perform low-energy conformer search efficiently.

Note on Implementation: LoQI is built upon the Megalodon architecture, adapting it specifically for stereochemistry-aware conformer generation with the ChEMBL3D dataset.


Key Features

  • ChEMBL3D Dataset: 250+ million AIMNet2-optimized conformers for 1.8M drug-like molecules
  • Stereochemistry-Aware: First all-atom diffusion model with explicit stereochemical encoding
  • Quantum Mechanical Accuracy: Near-DFT accuracy with implicit solvent effects
  • Superior Performance: Up to 10x improvement in energy accuracy over traditional methods
  • Complex Molecule Support: Handles macrocycles, flexible molecules, and challenging stereochemistry

Setup

This setup has been tested on Ubuntu 22.04, but it can be used on other platforms, since PyTorch, PyTorch Geometric, and RDKit are widely supported. Installation usually takes up to 20 minutes.

Prerequisites

  • Python 3.10+
  • CUDA-compatible GPU (recommended for training)
  • Conda or Mamba (recommended)

Environment Setup

# Clone the repository
git clone https://github.com/isayevlab/LoQI.git
cd LoQI

# Create and activate conda environment
conda create -n loqi python=3.10 -y
conda activate loqi

# Install dependencies
pip install -e .
pip install -r requirements.txt

Data Setup

Training and evaluation require the ChEMBL3D dataset.

Available with this release (Download Here):

  • Pre-trained LoQI model checkpoint (loqi.ckpt)
  • Processed ChEMBL3D lowest-energy conformers dataset (chembl3d_stereo)

Coming soon:

  • Full ChEMBL3D dataset (250M+ conformers) will be released in a separate repository
  • Complete dataset processing scripts and pipeline

Usage

If LoQI is not installed locally (pip install -e .), make sure the contents of src are on your PYTHONPATH (e.g., export PYTHONPATH="./src:$PYTHONPATH").
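
As an alternative to exporting PYTHONPATH in the shell, a script can put the src directory on the import path itself. The following is a minimal sketch, assuming it is run with the repository root as the working directory; the helper name `add_src_to_path` is hypothetical and not part of LoQI.

```python
import sys
from pathlib import Path


def add_src_to_path(repo_root="."):
    """Prepend <repo_root>/src to sys.path if it is not already there.

    Hypothetical helper mirroring `export PYTHONPATH="./src:$PYTHONPATH"`.
    Returns the absolute path that was added (or already present).
    """
    src = str((Path(repo_root) / "src").resolve())
    if src not in sys.path:
        sys.path.insert(0, src)
    return src


if __name__ == "__main__":
    print("src directory on import path:", add_src_to_path("."))
```

Installing with pip install -e . remains the simpler option when it is available, since it avoids path handling in every script.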

Model Training

# LoQI conformer generation model
python scripts/train.py --config-name=loqi outdir=./outputs train.gpus=1 data.dataset_root="./chembl3d_data"

# Customize training parameters
python scripts/train.py --config-name=loqi \
    outdir=./outputs \
    train.gpus=2 \
    train.n_epochs=800 \
    train.seed=42 \
    data.batch_size=150 \
    optimizer.lr=0.0001

Model Inference and Sampling

Conformer Generation

# Generate conformers for a single molecule
python scripts/sample_conformers.py \
    --config ./conf/loqi/loqi.yaml \
    --ckpt ./data/loqi.ckpt \
    --input "c1ccccc1" \
    --output ./outputs/benzene_conformers.sdf \
    --n_confs 10 \
    --batch_size 1


# Generate conformers with evaluation
python scripts/sample_conformers.py \
    --config ./conf/loqi/loqi.yaml \
    --ckpt ./data/loqi.ckpt \
    --input "CCO" \
    --output ./outputs/ethanol_conformers.sdf \
    --n_confs 100 \
    --batch_size 10
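
To run the sampling script over several molecules, the command lines above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the examples above; the helper name `build_sample_command`, the default paths, and the output file naming are illustrative assumptions, not part of the LoQI codebase.

```python
import shlex
import sys
from pathlib import Path


def build_sample_command(smiles, output, n_confs=10, batch_size=1,
                         config="./conf/loqi/loqi.yaml",
                         ckpt="./data/loqi.ckpt"):
    """Return the argv list for one scripts/sample_conformers.py call.

    Hypothetical helper: it only mirrors the flags shown in the examples
    above and does not launch anything itself.
    """
    return [
        sys.executable, "scripts/sample_conformers.py",
        "--config", str(config),
        "--ckpt", str(ckpt),
        "--input", smiles,
        "--output", str(output),
        "--n_confs", str(n_confs),
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Molecule names and SMILES taken from the examples above.
    molecules = {"benzene": "c1ccccc1", "ethanol": "CCO"}
    for name, smi in molecules.items():
        out = Path("outputs") / f"{name}_conformers.sdf"
        cmd = build_sample_command(smi, out, n_confs=10, batch_size=1)
        # Print a shell-quoted command instead of executing it. To launch
        # sampling, pass cmd to subprocess.run(cmd, check=True) from the
        # repository root.
        print(shlex.join(cmd))
```

Building the argument list rather than a single string avoids shell-quoting problems with SMILES that contain characters such as parentheses or `#`.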

For reference, on an RTX 3090 GPU, inference for a typical ChEMBL molecule takes approximately 0.1 seconds per conformer when processed within a batch.
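
As a back-of-the-envelope sketch, the reference figure above can be turned into a rough wall-clock estimate. The 0.1 s default is the RTX 3090 number quoted above; the function name and the assumption that time scales linearly with the number of conformers are illustrative, and real timings will vary with molecule size, batch size, and hardware.

```python
def estimate_sampling_seconds(n_molecules, n_confs_per_molecule,
                              seconds_per_conformer=0.1):
    """Rough wall-clock estimate, assuming linear scaling.

    seconds_per_conformer defaults to the RTX 3090 reference figure
    quoted above for a typical ChEMBL molecule in batched inference.
    """
    if n_molecules < 0 or n_confs_per_molecule < 0:
        raise ValueError("counts must be non-negative")
    return n_molecules * n_confs_per_molecule * seconds_per_conformer


if __name__ == "__main__":
    # Example: 1,000 molecules with 100 conformers each.
    seconds = estimate_sampling_seconds(1_000, 100)
    print(f"Estimated time: {seconds / 3600:.1f} GPU-hours")
```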

Note: Make sure you define the correct paths for the dataset and the AIMNet2 model in loqi.yaml. The relative path of the AIMNet2 model is src/megalodon/metrics/aimnet2/cpcm_model/wb97m_cpcms_v2_0.jpt.
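
Before launching a run, it can help to confirm that the expected files are present. The sketch below is a hedged example: the helper name `find_missing_files` is hypothetical, the AIMNet2 path is the relative path quoted in the note above, and the checkpoint path is taken from the sampling examples. It does not parse loqi.yaml, so the paths configured there still need to be checked by hand.

```python
from pathlib import Path

# Relative path quoted in the note above; checkpoint path from the examples.
AIMNET2_MODEL = "src/megalodon/metrics/aimnet2/cpcm_model/wb97m_cpcms_v2_0.jpt"
CHECKPOINT = "data/loqi.ckpt"


def find_missing_files(repo_root, relative_paths):
    """Return the relative paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    missing = find_missing_files(".", [AIMNET2_MODEL, CHECKPOINT])
    if missing:
        print("Missing files (check the paths in loqi.yaml):")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected files found.")
```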

Available Configurations

LoQI Models:

  • loqi.yaml - LoQI stereochemistry-aware conformer generation model
  • nextmol.yaml - Alternative configuration for NextMol-style generation

Training Configuration

You can easily override configuration parameters:

# Example with custom parameters
python scripts/train.py --config-name=loqi \
    outdir=./my_training \
    run_name=my_experiment \
    train.gpus=4 \
    train.n_epochs=500 \
    data.batch_size=64 \
    data.dataset_root="/path/to/chembl3d" \
    wandb_params.mode=online
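
The `key=value` arguments above use dotted keys to address nested configuration entries (the `--config-name` flag suggests a Hydra-style command line, although this README does not say so explicitly). The sketch below illustrates the concept only, assuming a plain nested dictionary as the config. It is not the library's implementation: values stay as strings, and the base config contents are placeholders rather than the real contents of loqi.yaml.

```python
def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' style overrides to a nested dict, in place.

    Conceptual illustration only: values are kept as strings and no
    type conversion or validation is attempted.
    """
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep or not dotted_key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections if they do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


if __name__ == "__main__":
    # Placeholder base values, not the real contents of loqi.yaml.
    base = {"train": {"gpus": "1"}, "data": {"batch_size": "150"}}
    apply_overrides(base, ["train.gpus=4", "data.batch_size=64",
                           "outdir=./my_training"])
    print(base)
```

Here `train.gpus=4` replaces the existing `gpus` entry under `train`, while a top-level key such as `outdir` is set directly on the root of the config.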

Citation

If you use LoQI in your research, please cite our paper:

@article{nikitin2025scalable,
  title={Scalable Low-Energy Molecular Conformer Generation with Quantum Mechanical Accuracy},
  author={Nikitin, Filipp and Anstine, Dylan M and Zubatyuk, Roman and Paliwal, Saee Gopal and Isayev, Olexandr},
  year={2025}
}

This work builds upon the Megalodon architecture. If you use the underlying architecture, please also cite:

@article{reidenbach2025applications,
  title={Applications of Modular Co-Design for De Novo 3D Molecule Generation},
  author={Reidenbach, Danny and Nikitin, Filipp and Isayev, Olexandr and Paliwal, Saee},
  journal={arXiv preprint arXiv:2505.18392},
  year={2025}
}
