Upstream MVaR improvements #2150

Closed
saitcakmak wants to merge 1 commit

Conversation

saitcakmak
Contributor

Summary:
Non-differentiability of MVaR came up in facebook/Ax#2077, so I'm reviving this diff to avoid similar errors in the future.

This upstreams a few improvements to MVaR, such as `m > 2` support for `get_mvar_set_via_counting` (formerly `get_mvar_cpu`) and approximate differentiability.

The choice between the two implementations is exposed as a kwarg, with docstrings explaining when each option is preferred (see the usage sketch below). These were the findings from some old benchmarking I did (N1799210):

  • In the robust MOBO paper, we simply used `get_mvar_set_vectorized` for acquisition functions (since `n_w = 32` was small) and used `get_mvar_set_via_counting` on CPU for computing the MVaR reported with `n_w = 512`.
  • `m = 2`: The counting-based code is slower, except when the MVaR set is very large and running on CPU. GPU is the faster device except for small `n_w`.
  • `m = 3`: Results are a bit inconsistent here, but overall: the counting-based code is slow for small `n_w`, particularly on GPU. GPU is slower than CPU for small MVaR sets. For larger MVaR sets, the vectorized implementation is faster on GPU until it OOMs; on CPU, the counting-based code is faster for large MVaR sets.
  • `m = 4`: The vectorized code is faster for small MVaR sets; the counting-based code is the best bet for larger sets.

Differential Revision: D35683069
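
For context, here is a minimal usage sketch of the `MVaR` risk measure from `botorch.acquisition.multi_objective.multi_output_risk_measures`. The constructor arguments and sample layout are from the existing public API; the expectation that a plain `backward()` call succeeds is an assumption based on the approximate differentiability described above, and the kwarg selecting between the counting-based and vectorized implementations is deliberately left at its default (see the docstrings for its exact name).

```python
import torch
from botorch.acquisition.multi_objective.multi_output_risk_measures import MVaR

# MVaR over n_w perturbation samples of m = 2 outcomes at risk level alpha.
# The kwarg that chooses between `get_mvar_set_via_counting` and
# `get_mvar_set_vectorized` is left at its default here; its name and the
# trade-offs are documented in the MVaR docstrings.
n_w, m, alpha, q = 32, 2, 0.9, 4
mvar = MVaR(n_w=n_w, alpha=alpha)

# Samples are laid out as `... x (q * n_w) x m`: the n_w perturbed
# evaluations of each of the q candidates are stacked along dim -2.
samples = torch.randn(q * n_w, m, requires_grad=True)

# Reduce each candidate's n_w perturbations to its MVaR set.
mvar_vals = mvar(samples)

# With the approximate differentiability added in this diff, gradients should
# flow back to the samples instead of erroring out as in facebook/Ax#2077.
mvar_vals.sum().backward()
print(samples.grad is not None)
```

Per the benchmarks above, the vectorized implementation is the natural default for the small `n_w` used inside acquisition functions, while the counting-based implementation pays off for large MVaR sets on CPU (e.g., when reporting MVaR with `n_w = 512`).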

@facebook-github-bot added the CLA Signed label on Dec 15, 2023
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D35683069

Reviewed By: sdaulton
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D35683069

@facebook-github-bot
Contributor

This pull request has been merged in cba5637.
