Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize tree-building for neptune/opencl (gpu2 flag). #1422

Merged
merged 2 commits into from
Mar 5, 2021

Conversation

porcuquine
Copy link
Collaborator

This PR supersedes #1420 — since we have decided to keep neptune/opencl behind the gpu2 feature flag for now.

@vmx Can you rebase your CI work on this branch and continue here? I made one modification to the line that actually runs the lifecycle tests. The previous version didn't set the flags, but the new one should correctly specify gpu2.

This commit makes the test actually run.
@porcuquine
Copy link
Collaborator Author

Okay, I moved @vmx's CI work over to this branch and adapted it to run lifecycle tests for both gpu and gpu2 features. These are now passing, and I've also run small passing lifecycle tests locally for both.

I've also verified that both the gpu and gpu2 code paths still perform as expected based on current master and the equivalent (but differently factored) code in #1420.

I think this is ready to go once the rest of CI passes.

let layer_data: &mut Vec<_> = &mut layer_data;

// gather all layer data in parallel.
s.spawn(move |_| {
Copy link
Contributor

@qy3u qy3u Mar 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may want to spawn in the iteration.The current implemention seems doesn't paralleled actually.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that this spawn seems unnecessary — not sure if it was always like that or got introduced in a refactor at some point. The change you suggest doesn't seem to help though (in my benchmark at least): the bottleneck is in the conversion that follows. Since the goal now is to release with minimal changes, and since all options (status quo, actually spawning in loop, just removing the whole scope/spawn) seem to perform equivalently, I think we should just leave as-is for now — to minimize change and risk, given that this has already been tested/benchmarked/etc quite a bit. It can be removed completely in some future cleanup that aims to improve further.

let layer_data: &mut Vec<_> = &mut layer_data;

// gather all layer data in parallel.
s.spawn(move |_| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porcuquine lets remove the unneeded spawn at least from the new code path

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this in testing now with a new issue prepped for the various improvements noted here. If my testing goes ok, I'll merge this and open that issue.

@cryptonemo
Copy link
Collaborator

cryptonemo commented Mar 5, 2021

Would be interesting to see data on larger/actual tests, but here's a GPU comparison of a small test (32KiB seal lifecycle using first gpu and then gpu2).

gpu-vs-gpu2-32kib-lifecycle

@cryptonemo cryptonemo merged commit f937802 into master Mar 5, 2021
@cryptonemo cryptonemo deleted the feat/optimize-tree-building-prime branch March 5, 2021 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants