parallelize USF and NUSF algorithms using Dask #1198

Merged
merged 20 commits into CCSI-Toolset:master
Mar 21, 2024

Contributor

@kbuma kbuma commented Mar 4, 2024

Fixes/Addresses:

#1193

Summary/Motivation:

To improve the performance of the USF and NUSF algorithms for SDoE, we parallelize them using Dask.

We verify that the parallelized implementation produces the same results as the original algorithm while running faster (USF: 2.5x across platforms; NUSF: 3.5x on Windows).
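As a rough illustration of the idea (not the FOQUS code), the sketch below runs several independent random starts of a simplified maximin-style search, once serially and once through `dask.delayed`, and returns the best design from each. The function names, the scoring rule, and the use of integer seeds are assumptions made for this example only.

```python
import numpy as np
from dask import compute, delayed


def one_random_start(candidates, n_design, seed):
    """Hypothetical stand-in for one random start: pick n_design candidate
    rows at random and score them by their smallest pairwise distance."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), size=n_design, replace=False)
    design = candidates[idx]
    diffs = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore zero self-distances
    return float(dist.min()), np.sort(idx)


def best_design_serial(candidates, n_design, seeds):
    """Run every random start in a plain loop and keep the best score."""
    results = [one_random_start(candidates, n_design, s) for s in seeds]
    return max(results, key=lambda r: r[0])


def best_design_dask(candidates, n_design, seeds):
    """Same search, but each random start becomes one Dask task."""
    tasks = [delayed(one_random_start)(candidates, n_design, s) for s in seeds]
    results = compute(*tasks, scheduler="threads")
    return max(results, key=lambda r: r[0])


if __name__ == "__main__":
    cands = np.random.default_rng(0).random((200, 3))
    seeds = list(range(16))
    print(best_design_serial(cands, 10, seeds)[0])
    print(best_design_dask(cands, 10, seeds)[0])
```

Because each start draws from its own seeded generator, the serial and Dask paths see the same random sequences and should select the same design, which is the property the tests in this PR are described as checking.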

Changes proposed in this PR:

  • parallelized USF and NUSF algorithms utilizing Dask
  • FOQUS command line option to utilize Dask for SDoE algorithms
  • switch USF and NUSF random number generation from the legacy RandomState generators to Generator
  • tests to verify that the parallelized implementation provides the same results as the original algorithm given the same random number sequence
  • benchmark to measure performance of the parallelized implementation vs the original algorithm
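For readers unfamiliar with the two NumPy random APIs named in the list above, here is a minimal sketch of the difference. The seed value and the `choice` call are illustrative assumptions, not taken from the FOQUS sources.

```python
import numpy as np

SEED = 42  # arbitrary seed chosen for this example

# Legacy API: a RandomState instance (Mersenne Twister bit generator).
legacy = np.random.RandomState(SEED)
legacy_draw = legacy.choice(10, size=3, replace=False)

# Current API: a Generator created by default_rng (PCG64 by default).
rng = np.random.default_rng(SEED)
new_draw = rng.choice(10, size=3, replace=False)

# Re-creating a Generator from the same seed replays the same sequence,
# which is what makes an "identical results" comparison possible.
repeat_draw = np.random.default_rng(SEED).choice(10, size=3, replace=False)

print(legacy_draw, new_draw, repeat_draw)
```

The two APIs are not guaranteed to produce the same numbers for the same seed, so a comparison between two implementations only makes sense when both consume the same kind of generator with the same seed.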

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the copyright and license terms described in the LICENSE.md file at the top level of this directory.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

@kbuma kbuma requested review from sotorrio1 and boverhof March 4, 2024 19:25
@kbuma kbuma self-assigned this Mar 4, 2024
@kbuma kbuma linked an issue Mar 4, 2024 that may be closed by this pull request

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 86.30952%, with 23 lines in your changes missing coverage. Please review.

Project coverage is 38.74%. Comparing base (9aa7d35) to head (8feec24).

Files                                   Patch %   Lines
foqus_lib/framework/sdoe/nusf_dask.py   90.47%    4 Missing and 4 partials ⚠️
foqus_lib/framework/sdoe/sdoe.py        61.11%    4 Missing and 3 partials ⚠️
foqus_lib/foqus.py                      16.66%    4 Missing and 1 partial ⚠️
foqus_lib/framework/sdoe/usf_dask.py    94.00%    1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1198      +/-   ##
==========================================
+ Coverage   38.54%   38.74%   +0.20%     
==========================================
  Files         163      165       +2     
  Lines       36880    37038     +158     
  Branches     6106     6128      +22     
==========================================
+ Hits        14214    14351     +137     
- Misses      21553    21564      +11     
- Partials     1113     1123      +10     


sotorrio1 previously approved these changes Mar 5, 2024
Member

@sotorrio1 sotorrio1 left a comment

Looks great to me! Tests pass, I've run a few examples and I can see the speedup (especially for a larger number of random starts), and the usf_benchmark tool also works great and helps to see the improvements.

@ksbeattie ksbeattie added the Priority:Normal Normal Priority Issue or PR label Mar 5, 2024
@sotorrio1
Member

@kbuma I can confirm that using the flag --sdoe_use_dask turns on the use of Dask for SDoE parallelization. I guess I was confused because of the small change in the printing statements to the console. Thank you!
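A minimal sketch of how a boolean command-line switch like this could be wired up with `argparse` follows. Only the flag name `--sdoe_use_dask` comes from this thread; the parser, the program name, and the `choose_backend` helper are hypothetical and do not reflect the actual code in `foqus_lib/foqus.py`.

```python
import argparse


def build_parser():
    """Hypothetical parser with a single on/off switch for Dask."""
    parser = argparse.ArgumentParser(prog="foqus-sketch")
    parser.add_argument(
        "--sdoe_use_dask",
        action="store_true",  # flag is False unless passed on the command line
        help="run the SDoE USF/NUSF algorithms through Dask",
    )
    return parser


def choose_backend(argv):
    """Return which (hypothetical) implementation the flag selects."""
    args = build_parser().parse_args(argv)
    return "dask" if args.sdoe_use_dask else "serial"


if __name__ == "__main__":
    print(choose_backend([]))
    print(choose_backend(["--sdoe_use_dask"]))
```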

@kbuma kbuma requested a review from henry-gatech March 7, 2024 17:19
@kbuma
Contributor Author

kbuma commented Mar 12, 2024

pinned Dask to <2024.3 until dask/dask#10998 gets resolved

@kbuma kbuma changed the title parallelize USF algorithm using Dask parallelize USF and NUSF algorithms using Dask Mar 18, 2024
@kbuma kbuma linked an issue Mar 18, 2024 that may be closed by this pull request
@lbianchi-lbl lbianchi-lbl self-requested a review March 19, 2024 19:57
@ksbeattie
Member

@sotorrio1 & @henry-gatech could we get a review here for this?

Contributor

@lbianchi-lbl lbianchi-lbl left a comment

@kbuma can you double-check that the way I resolved the Git conflicts makes sense?

I had a question about bokeh and a few minor comments, but none of them is blocking, so feel free to ignore them.

@@ -89,8 +89,10 @@
# Required packages needed in the users env go here (non-versioned strongly preferred).
# requirements.txt should stay empty (other than the "-e .")
install_requires=[
"bokeh!=3.0.*,>=2.4.2",
Contributor

Is bokeh needed for something in particular? I've tried both pip show bokeh and searching for imports in the FOQUS code, and I couldn't find anything.

Contributor Author

bokeh is required to view the Dask dashboard, which provides live monitoring of Dask computations (https://docs.dask.org/en/latest/dashboard.html).

Contributor

@lbianchi-lbl lbianchi-lbl Mar 21, 2024

OK thanks, the dashboard looks really cool! It looks like bokeh doesn't come with very onerous requirements of its own, so I think we can keep that in for the moment, and then revisit if/when it causes problems down the line.

foqus_lib/foqus.py (outdated, resolved)
foqus_lib/framework/sdoe/nusf.py (outdated, resolved)
foqus_lib/framework/sdoe/sdoe.py (resolved)
@lbianchi-lbl lbianchi-lbl self-requested a review March 21, 2024 22:09
@lbianchi-lbl lbianchi-lbl merged commit fbda808 into CCSI-Toolset:master Mar 21, 2024
31 checks passed
@kbuma kbuma deleted the sdoe-dask branch March 22, 2024 15:44
Development

Successfully merging this pull request may close these issues.

SDoE: parallelize NUSF
SDoE: parallelize USF
4 participants