
Parallel torsiondrive optimisations #277

Merged · 7 commits · Aug 4, 2023

Conversation

@jthorton (Contributor) commented Jul 31, 2023

Description

This PR is another go at adding parallel optimisations within torsiondrives. These only work on workers launched via the new worker CLI option (#272), as the pool of optimisations has to be spawned from the primary process.

Local testing with XTB and Psi4 on some simple molecules gives good speed-ups, with the optimum number of workers being around 4. Note that for the Psi4 test, each worker used 6 threads in the 4-worker run and 8 threads in the 1- and 2-worker runs, due to resource limits.
[Figures: total torsiondrive wall time vs. number of parallel workers for the XTB and Psi4 tests]

When testing on the HPC I found that the method used to create the pool had to be set to spawn rather than fork, otherwise the optimisations would hang; spawn causes a slight slowdown in performance. Should we expose the method used to create the pool as a setting?
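For illustration, the start-method choice under discussion boils down to something like the following sketch (this is not the code in this PR; run_optimisation is a stand-in for one constrained optimisation):

import multiprocessing


def run_optimisation(grid_point: int) -> int:
    # stand-in for a single constrained geometry optimisation at one grid point
    return grid_point


def main():
    # "spawn" starts fresh interpreters, avoiding the hangs seen with "fork"
    # on the HPC, at the cost of slower worker start-up
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(processes=4) as pool:
        print(pool.map(run_optimisation, range(8)))


if __name__ == '__main__':
    # the __main__ guard is required with "spawn": workers re-import this module
    main()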

Testing script

'''Run torsiondrives using the bespokefit parallel opt interface to compare the timings of the total job with the number of workers'''

import os

import qcengine
from qcelemental.models import Molecule
from qcelemental.models.common_models import Model
from qcelemental.models.procedures import (
    OptimizationSpecification,
    QCInputSpecification,
    TDKeywords,
    TorsionDriveInput,
)

# make sure we can use the harness
from openff.bespokefit.executor.services.qcgenerator.qcengine import TorsionDriveProcedureParallel

test_mols = ['biphenyl.json', 'biaryl1.json', 'biaryl2.json']

def main():
    NCORES = 2
    MEMORY = 100
    N_P_TASKS = 2
    folder_name = f'xtb_{N_P_TASKS}_worker_{NCORES}_cores_spawn'
    os.makedirs(folder_name, exist_ok=True)

    # number of parallel optimisation tasks picked up by the bespokefit worker
    os.environ['BEFLOW_QC_COMPUTE_WORKER_N_TASKS'] = str(N_P_TASKS)

    # general models used for each torsiondrive
    xtb_model = Model(method='gfn2xtb', basis=None)
    #psi4_model = Model(method='B3LYP-D3BJ', basis='DZVP')
    qc_spec = QCInputSpecification(
        driver='gradient',
        keywords={'verbosity': 'muted'},
        model=xtb_model
    )
    opt_spec = OptimizationSpecification(
        procedure='geometric',
        keywords={
            'coordsys': 'dlc',
            'enforce': 0.1,
            'reset': True,
            'qccnv': True,
            'epsilon': 0.0,
            'program': 'xtb'
        }
    )

    for fname in test_mols:
        print(f'Creating task for {fname}')
        mol: Molecule = Molecule.parse_file(fname)
        # read the dihedral indices for the rotatable bond and build the input
        td_keywords = TDKeywords(dihedrals=mol.extras['dihedrals'], grid_spacing=[15])

        task = TorsionDriveInput(
            keywords=td_keywords,
            input_specification=qc_spec,
            initial_molecule=[mol],
            optimization_spec=opt_spec
        )
        print('Done, Running TorsionDrive')
        result = qcengine.compute_procedure(
            input_data=task,
            procedure='TorsionDriveParallel',
            task_config={'ncores': NCORES, 'memory': MEMORY, 'retries': 2},
        )

        print('Done, saving result')
        file_name = fname.split('.')[0]
        with open(os.path.join(folder_name, file_name + '.json'), 'w') as output:
            output.write(result.json())


if __name__ == '__main__':
    main()
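Each input JSON file is assumed here to be a serialised QCElemental Molecule with the dihedral to drive stored under extras['dihedrals'], which is how the script above builds the TDKeywords.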

Todos

Notable points that this PR has either accomplished or will accomplish.

  • TODO 1

Questions

  • should we expose the pool creation method as a setting?

Status

  • Ready to go

@mattwthompson (Member)

should we expose the pool creation method as a setting?

I think so, if only because the default is different on macOS and Linux; it's the sort of thing that would cause me headaches while debugging differences between development and production environments 🙃
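For reference, the platform default can be checked directly; on CPython 3.8+ this prints fork on Linux and spawn on macOS:

import multiprocessing

# the default start method differs by platform, which is the source of the
# development-vs-production behaviour differences mentioned above
print(multiprocessing.get_start_method())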

@jthorton (Contributor, Author)

I think so, if only because the default is different on macOS and Linux,

The alternative would be to hardcode the spawn method so it is consistent across all platforms. If we do expose it, we could potentially open up a can of worms where people get hanging tasks when using fork, which might be hard to debug.

@mattwthompson (Member)

So the options are

  • expose it, defaulting to "spawn"
  • hardcode it, also to "spawn"

I don't have a strong or informed opinion between the two; either should fix the potential issue of behavior differences on different hardware, and also the issue you found with "fork" above. I can certainly see that hardcoding it makes things simpler.
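If it were exposed, one possible sketch is to read it the same way the BEFLOW_* settings are read in the testing script above (the variable name here is hypothetical, not part of this PR):

import multiprocessing
import os

# hypothetical setting name, following the BEFLOW_* pattern used in the
# testing script above; "spawn" stays the safe cross-platform default
start_method = os.environ.get('BEFLOW_QC_COMPUTE_WORKER_POOL_START_METHOD', 'spawn')
pool_context = multiprocessing.get_context(start_method)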

codecov bot commented Jul 31, 2023

Codecov Report

Merging #277 (0121bf2) into main (4c4422f) will decrease coverage by 0.02%.
The diff coverage is 80.00%.

Additional details and impacted files

@jthorton (Contributor, Author)

That's right; it's probably best to hardcode it for now while we check that it works, and we can expose the option later if people request it.

@mattwthompson (Member) left a comment


I'm not sure if you're really soliciting reviews while this project is under your direction - the real check is whether this works in the production runs you do. That said, LGTM. I'm not a huge fan of the kwargs pattern, but it's hard to avoid here. Moving to task_config is nice as well.
