
Parallel torsiondrive optimisations #277

Merged · 7 commits · Aug 4, 2023

Conversation

@jthorton (Contributor) commented Jul 31, 2023

Description

This PR is another go at adding parallel optimisations within torsiondrives. These only work on workers launched via the new worker CLI option (#272), as the pool of optimisations has to be spawned from the primary process.

Local testing with XTB and Psi4 on some simple molecules gives good speed-ups, with the optimum number of workers being around 4. Note that for the Psi4 test, each worker used 6 threads in the 4-worker run and 8 threads in the 1- and 2-worker runs, due to resource limits.
[Figures: total torsiondrive wall time vs. number of parallel workers for the XTB and Psi4 tests]

When testing on the HPC I found that the method used to create the pool had to be set to spawn rather than fork, otherwise the optimisations would hang; spawn causes a slight slowdown in performance. Should we expose the method used to create the pool as a setting?
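For illustration, the start-method choice under discussion boils down to something like the following sketch (this is not the code in this PR; run_optimisation is a stand-in for one constrained optimisation):

import multiprocessing


def run_optimisation(grid_point: int) -> int:
    # stand-in for a single constrained geometry optimisation at one grid point
    return grid_point


def main():
    # "spawn" starts fresh interpreters, avoiding the hangs seen with "fork"
    # on the HPC, at the cost of slower worker start-up
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(processes=4) as pool:
        print(pool.map(run_optimisation, range(8)))


if __name__ == '__main__':
    # the __main__ guard is required with "spawn": workers re-import this module
    main()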

Testing script

'''Run torsiondrives using the bespokefit parallel opt interface to compare the timings of the total job with the number of workers'''

import os

import qcengine
from qcelemental.models import Molecule
from qcelemental.models.common_models import Model
from qcelemental.models.procedures import (
    OptimizationSpecification,
    QCInputSpecification,
    TDKeywords,
    TorsionDriveInput,
)

# make sure we can use the harness
from openff.bespokefit.executor.services.qcgenerator.qcengine import TorsionDriveProcedureParallel

test_mols = ['biphenyl.json', 'biaryl1.json', 'biaryl2.json']

def main():
    NCORES = 2
    MEMORY = 100
    N_P_TASKS = 2
    folder_name = f'xtb_{N_P_TASKS}_worker_{NCORES}_cores_spawn'
    os.makedirs(folder_name, exist_ok=True)

    # number of parallel optimisation tasks picked up by the bespokefit worker
    os.environ['BEFLOW_QC_COMPUTE_WORKER_N_TASKS'] = str(N_P_TASKS)

    # general models used for each torsiondrive
    xtb_model = Model(method='gfn2xtb', basis=None)
    #psi4_model = Model(method='B3LYP-D3BJ', basis='DZVP')
    qc_spec = QCInputSpecification(
        driver='gradient',
        keywords={'verbosity': 'muted'},
        model=xtb_model
    )
    opt_spec = OptimizationSpecification(
        procedure='geometric',
        keywords={
            'coordsys': 'dlc',
            'enforce': 0.1,
            'reset': True,
            'qccnv': True,
            'epsilon': 0.0,
            'program': 'xtb'
        }
    )

    for fname in test_mols:
        print(f'Creating task for {fname}')
        mol: Molecule = Molecule.parse_file(fname)
        # read the dihedral indices for the rotatable bond and build the input
        td_keywords = TDKeywords(dihedrals=mol.extras['dihedrals'], grid_spacing=[15])

        task = TorsionDriveInput(
            keywords=td_keywords,
            input_specification=qc_spec,
            initial_molecule=[mol],
            optimization_spec=opt_spec
        )
        print('Done, Running TorsionDrive')
        result = qcengine.compute_procedure(
            input_data=task,
            procedure='TorsionDriveParallel',
            task_config={'ncores': NCORES, 'memory': MEMORY, 'retries': 2},
        )

        print('Done, saving result')
        file_name = fname.split('.')[0]
        with open(os.path.join(folder_name, file_name + '.json'), 'w') as output:
            output.write(result.json())


if __name__ == '__main__':
    main()
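Each input JSON file is assumed here to be a serialised QCElemental Molecule with the dihedral to drive stored under extras['dihedrals'], which is how the script above builds the TDKeywords.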

Todos

Notable points that this PR has either accomplished or will accomplish.

  • TODO 1

Questions

  • should we expose the pool creation method as a setting?

Status

  • Ready to go

@mattwthompson (Member)

should we expose the pool creation method as a setting?

I think so, if only because the default is different on macOS and Linux; it's the sort of thing that would cause me headaches while debugging differences between development and production environments 🙃
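For reference, the platform default can be checked directly; on CPython 3.8+ this prints fork on Linux and spawn on macOS:

import multiprocessing

# the default start method differs by platform, which is the source of the
# development-vs-production behaviour differences mentioned above
print(multiprocessing.get_start_method())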

@jthorton (Contributor, Author)

I think so, if only because the default is different on macOS and Linux,

The alternative would be to hardcode the spawn method so it is consistent across all platforms. If we do expose it, we could potentially open up a can of worms where people get hanging tasks when using fork, which might be hard to debug.

@mattwthompson (Member)

So the options are

  • expose it, defaulting to "spawn"
  • hardcode it, also to "spawn"

I don't have a strong or informed opinion between the two; either should fix the potential issue of behavior differences on different hardware, and also the issue you found with "fork" above. I can certainly see that hardcoding it makes things simpler.
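If it were exposed, one possible sketch is to read it the same way the BEFLOW_* settings are read in the testing script above (the variable name here is hypothetical, not part of this PR):

import multiprocessing
import os

# hypothetical setting name, following the BEFLOW_* pattern used in the
# testing script above; "spawn" stays the safe cross-platform default
start_method = os.environ.get('BEFLOW_QC_COMPUTE_WORKER_POOL_START_METHOD', 'spawn')
pool_context = multiprocessing.get_context(start_method)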

codecov bot commented Jul 31, 2023

Codecov Report

Merging #277 (0121bf2) into main (4c4422f) will decrease coverage by 0.02%.
The diff coverage is 80.00%.

Additional details and impacted files

@jthorton (Contributor, Author)

That's right; it's probably best to hardcode it for now while we check that it works, and we can expose the option later if people request it.

@mattwthompson (Member) left a comment


I'm not sure if you're really soliciting reviews while this project is under your direction - the real check is whether this works in the production runs you do. That said, LGTM. I'm not a huge fan of the kwargs pattern, but it's hard to avoid here. Moving to task_config is nice as well.
