
Use two streams, one per FT slice. #126

Open
wants to merge 1 commit into
base: master

Conversation

Sopel97
Member

@Sopel97 Sopel97 commented Jun 8, 2021

Since the two slices of the feature transformer output are computed independently, we can try to run them on separate streams. On my GTX750 I notice a slight performance increase with very small FT sizes, and the CUDA profiler shows some overlap between kernels.
[image]

This may increase performance on beefier GPUs like V100, but that remains to be tested.
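
As a rough illustration of the idea (not the code in this PR), here is a minimal CUDA sketch that launches two independent kernels on separate streams. The kernel `slice_forward`, the helper `run_two_stream_forward`, and the scale factors are hypothetical stand-ins for the two FT slices. The only property they share with the real kernels is that each one writes a disjoint part of the output.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for one FT slice: out[i] = in[i] * scale.
__global__ void slice_forward(const float* in, float* out, int n, float scale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * scale;
}

// Runs both slices on separate streams.
// Returns the number of wrong output elements (0 on success), or -1 on a CUDA error.
int run_two_stream_forward(int n) {
    std::vector<float> h_in(n), h_out(2 * n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, 2 * n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();  // input is fully on the device before any kernel starts

    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    // Each slice writes its own half of d_out, so the two launches need no
    // ordering between them and the GPU is free to overlap them.
    slice_forward<<<grid, block, 0, streams[0]>>>(d_in, d_out, n, 2.0f);
    slice_forward<<<grid, block, 0, streams[1]>>>(d_in, d_out + n, n, 3.0f);

    // Wait for both streams before reading the results back.
    cudaStreamSynchronize(streams[0]);
    cudaStreamSynchronize(streams[1]);
    cudaError_t err = cudaGetLastError();

    cudaMemcpy(h_out.data(), d_out, 2 * n * sizeof(float), cudaMemcpyDeviceToHost);

    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i] * 2.0f) ++wrong;
        if (h_out[n + i] != h_in[i] * 3.0f) ++wrong;
    }

    cudaStreamDestroy(streams[0]);
    cudaStreamDestroy(streams[1]);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? wrong : -1;
}
```

Calling `run_two_stream_forward(1 << 16)` from a `main` and building with `nvcc` should be enough to try it. Whether the two kernels actually overlap depends on the GPU and on how much of it a single kernel already occupies, which fits the observation that the gain shows up with very small FT sizes.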

Note that we can in fact run two separate streams for the backward pass too, even though both operate on the same output buffer, because all writes are atomic.
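
To make the atomic-write argument concrete, here is a minimal CUDA sketch (again hypothetical, not the kernels from this PR) in which two streams accumulate into the same buffer with `atomicAdd`. The kernel `slice_backward`, the helper `run_two_stream_backward`, and the `i % m` scatter pattern are assumptions chosen only to force collisions both within and across streams.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical backward stand-in: thread i adds `value` into bucket i % m of a
// buffer that is shared by both streams.
__global__ void slice_backward(float* grad_shared, int n, int m, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&grad_shared[i % m], value);
}

// Accumulates two slices into one buffer from two streams.
// Assumes n is a multiple of m. Returns the number of wrong buckets
// (0 on success), or -1 on a CUDA error.
int run_two_stream_backward(int n, int m) {
    float* d_grad = nullptr;
    if (cudaMalloc(&d_grad, m * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d_grad, 0, m * sizeof(float));
    cudaDeviceSynchronize();  // buffer is zeroed before either stream touches it

    cudaStream_t streams[2];
    cudaStreamCreate(&streams[0]);
    cudaStreamCreate(&streams[1]);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    // Both launches write to the same buckets. The atomic adds make the result
    // independent of how the two kernels interleave.
    slice_backward<<<grid, block, 0, streams[0]>>>(d_grad, n, m, 1.0f);
    slice_backward<<<grid, block, 0, streams[1]>>>(d_grad, n, m, 2.0f);

    cudaStreamSynchronize(streams[0]);
    cudaStreamSynchronize(streams[1]);
    cudaError_t err = cudaGetLastError();

    std::vector<float> h_grad(m);
    cudaMemcpy(h_grad.data(), d_grad, m * sizeof(float), cudaMemcpyDeviceToHost);

    // Every bucket receives n / m contributions of 1.0 and n / m of 2.0.
    // These are small integers, so the float sum is exact in any order.
    const float expected = static_cast<float>(n / m) * 3.0f;
    int wrong = 0;
    for (int j = 0; j < m; ++j) {
        if (h_grad[j] != expected) ++wrong;
    }

    cudaStreamDestroy(streams[0]);
    cudaStreamDestroy(streams[1]);
    cudaFree(d_grad);
    return err == cudaSuccess ? wrong : -1;
}
```

Calling `run_two_stream_backward(1 << 16, 256)` from a `main` should return 0. With general floating-point gradients the accumulation order can still change the rounding of the sum, but that nondeterminism already exists within a single kernel that uses `atomicAdd`, so a second stream does not introduce a new kind of hazard.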

@vondele
Member

vondele commented Jun 8, 2021

That looks good to me; exposing the parallelism can only help.

@Sopel97
Member Author

Sopel97 commented Jun 8, 2021

On V100:

1 thread, 1 worker:
before: 48.29 at 1000, 47.94 at 2000, 47.84 at 3000
after: 48.29 at 1000, 47.93 at 2000, 47.82 at 3000

8 threads, 4 workers:
before: 57.04 at 1000, 57.04 at 2000, 57.16 at 3000
after: 56.37 at 1000, 56.41 at 2000, 56.52 at 3000

So it doesn't help, at least for now, but it also doesn't do harm.
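
For reference, the relative change implied by the figures above can be worked out as follows. The unit of the figures is not stated here, so only ratios are used, and how much run-to-run noise to expect is an open question.

$$
\Delta_{\mathrm{rel}} = \frac{\mathrm{after} - \mathrm{before}}{\mathrm{before}}, \qquad
\frac{56.37 - 57.04}{57.04} \approx -1.17\%, \quad
\frac{56.41 - 57.04}{57.04} \approx -1.10\%, \quad
\frac{56.52 - 57.16}{57.16} \approx -1.12\%
$$

With 1 thread and 1 worker the same formula gives $0\%$, about $-0.02\%$, and about $-0.04\%$ at 1000, 2000, and 3000 respectively.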

@vondele
Member

vondele commented Jun 8, 2021

We kind of know that on V100 and above it is limited by the CPU.

@Sopel97
Member Author

Sopel97 commented Jun 26, 2021

I'd like to see some benchmarks from other people before pushing this.

@Sopel97 Sopel97 added the "help wanted" (Extra attention is needed) label Jun 26, 2021