
Parallel read and preprocess the data #371

Merged
merged 15 commits into master from 359-speed-up-the-readpreprocess-in-abfe-workflow on Oct 6, 2024

Conversation

@xiki-tempula (Collaborator) commented May 29, 2024

Use joblib to parallelise the read and preprocess.
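For readers unfamiliar with the approach, here is a minimal sketch of the idea (not this PR's actual implementation): each input file is handed to a joblib worker, so reading scales with the number of cores. The file names are hypothetical placeholders; alchemlyb.parsing.gmx.extract_u_nk and joblib.Parallel/joblib.delayed are the real APIs involved.

import joblib
from alchemlyb.parsing.gmx import extract_u_nk

# Hypothetical list of per-lambda-window GROMACS output files.
xvg_files = ["lambda_00.xvg", "lambda_01.xvg", "lambda_02.xvg"]

# Read each file in its own worker; n_jobs=-1 uses all available cores.
u_nk_list = joblib.Parallel(n_jobs=-1)(
    joblib.delayed(extract_u_nk)(xvg, T=310) for xvg in xvg_files
)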

@xiki-tempula linked an issue on May 29, 2024 that may be closed by this pull request

codecov bot commented May 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.78%. Comparing base (0093905) to head (c6cb745).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #371   +/-   ##
=======================================
  Coverage   98.78%   98.78%           
=======================================
  Files          28       28           
  Lines        1978     1982    +4     
  Branches      435      436    +1     
=======================================
+ Hits         1954     1958    +4     
  Misses          2        2           
  Partials       22       22           

☔ View full report in Codecov by Sentry.

@xiki-tempula marked this pull request as ready for review on May 29, 2024 20:34
@xiki-tempula force-pushed the 359-speed-up-the-readpreprocess-in-abfe-workflow branch from 0f9ddae to d5fe5dd on June 3, 2024 09:03
@orbeckst (Member) left a comment

Overall this looks like a powerful new feature for the workflow. Do you have some simple performance benchmarks?

My primary concerns are (see comments)

  • Making sure that joblib is explicitly installed.
  • Is the new default n_jobs=-1 safe?
  • add docs

CHANGES (review comment, outdated, resolved)
environment.yml (review comment, resolved)
import pytest
from alchemtest.amber import load_bace_example
from alchemtest.gmx import load_ABFE
from joblib import parallel_config
Review comment (Member):

Maybe just import joblib so that it's clearer below what comes from joblib? In this way, any unqualified functions and classes are from alchemlyb and everything else is clearly external.

suffix="xvg",
T=310,
)
with parallel_config(backend="threading"):
Review comment (Member):

I find

Suggested change:
- with parallel_config(backend="threading"):
+ with joblib.parallel_config(backend="threading"):

clearer when quickly reading the code.
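For context, joblib.parallel_config is a context manager that temporarily overrides joblib's defaults (such as the backend) for any joblib.Parallel call made inside the with-block; this is what the test above uses to force the threading backend. A small self-contained illustration, unrelated to alchemlyb itself:

import joblib

def square(x):
    return x * x

# Any Parallel call inside the block picks up the threading backend
# instead of joblib's default process-based backend.
with joblib.parallel_config(backend="threading"):
    results = joblib.Parallel(n_jobs=2)(
        joblib.delayed(square)(i) for i in range(8)
    )
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]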

src/alchemlyb/workflows/abfe.py (review comment, resolved)
@@ -115,7 +116,7 @@ def __init__(
else:
raise NotImplementedError(f"{software} parser not found.")

- def read(self, read_u_nk=True, read_dHdl=True):
+ def read(self, read_u_nk=True, read_dHdl=True, n_jobs=-1):
Review comment (Member):

Is default -1 really always the best choice? Did you try it on a machine with, say, 16 physical cores plus 16 hyperthreads (or really anything with hyperthreading)?

I am willing to make -1 the default if it does not throw surprises for users. Otherwise the conservative 1 would be better, and users could then explicitly enable parallelism.
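For reference on what the proposed default means, joblib interprets negative n_jobs values relative to the CPU count: -1 uses all detected CPUs, -2 all but one, and so on. A quick way to check what -1 would use on a given machine:

import joblib

# Number of workers that n_jobs=-1 would use on this machine
# (joblib counts logical CPUs, so hyperthreads are included).
print(joblib.cpu_count())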

Review comment (Member):

Add a versionchanged note to the docs.

Review comment (Member):

Add a paragraph about parallelization to the docs: how to enable it, what it does (the work is distributed per input file), and any potential problems.
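To make the request concrete, a hedged usage sketch of what such a paragraph could show (the ABFE class and its constructor arguments are assumed from the file path and test context above; only the n_jobs keyword comes from this PR's diff):

from alchemlyb.workflows import ABFE

# Constructor arguments are illustrative; suffix and T mirror the test above,
# and the directory name is a placeholder.
workflow = ABFE(software="GROMACS", dir="abfe_data", prefix="dhdl", suffix="xvg", T=310)

# n_jobs=1 keeps the previous serial behaviour.
workflow.read(n_jobs=1)

# n_jobs=-1 (the proposed default) distributes the per-file work
# across all available cores.
workflow.read(n_jobs=-1)
workflow.preprocess(skiptime=0, uncorr="dE", threshold=50, n_jobs=-1)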

@@ -201,6 +219,7 @@ def run(
overlap="O_MBAR.pdf",
breakdown=True,
forwrev=None,
+   n_jobs=-1,
Review comment (Member):

see above

  • Is -1 safe as new default?
  • add docs (explanation)
  • add versionchanged

@@ -307,7 +329,7 @@ def update_units(self, units=None):
logger.info(f"Set unit to {units}.")
self.units = units or None

- def preprocess(self, skiptime=0, uncorr="dE", threshold=50):
+ def preprocess(self, skiptime=0, uncorr="dE", threshold=50, n_jobs=-1):
Review comment (Member):

see above

  • Is -1 safe as new default?
  • add docs (explanation)
  • add versionchanged

@orbeckst self-assigned this on Oct 2, 2024
@orbeckst (Member) left a comment

Thanks for addressing all my comments and especially the extensive docs.

@orbeckst merged commit 30ecce0 into master on Oct 6, 2024
7 of 8 checks passed
@orbeckst deleted the 359-speed-up-the-readpreprocess-in-abfe-workflow branch on October 6, 2024 16:21
Successfully merging this pull request may close these issues: Speed up the read/preprocess in ABFE workflow
2 participants