[Support] Add parallel_for support to run a loop in parallel #6275
cc @tkonolige
I think I'm a little late to this discussion, but what is the reason for having our own thread pool/parallel_for implementation? OpenMP is already an optional dependency, so we could use it when available. I can understand that having our own implementation means we don't have to depend on another library, but it also means we need to maintain it and add features as we need them (though this doesn't seem like much code).
@jcf94 it would be great if we could simplify the parallel_for implementation. For example, std::thread has pretty low launch overhead, so we can likely drop the thread pool and start std::thread workers on each parallel_for round. As long as the function cost is not high, it should be a simpler implementation.
I've tried a simpler implementation using std::thread; have a look and see whether this is better. 😄
Thanks @jcf94, please look into the CI issue. Please also run a benchmark to see whether the new implementation affects performance, since it is only fine for larger functions and we might still need the pool for very fine-grained parallelism.
I ran a simple benchmark on the two implementations. The current one even works better at large loop sizes (since each thread's workload is predefined by the partitioner).
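To make the approach discussed above concrete, here is a minimal sketch of a pool-free parallel loop: it spawns std::thread workers on every call and fixes each worker's index range up front with a static block partition. The name `parallel_for_sketch`, its signature, and the block partitioning scheme are assumptions made for illustration; they are not the code in this PR and may differ from the partitioner it actually uses.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper for illustration only, not the API added by this PR.
// Runs f(i) for every i in [begin, end) on freshly spawned std::thread
// workers; no persistent thread pool is kept between calls.
inline void parallel_for_sketch(int begin, int end,
                                const std::function<void(int)>& f,
                                int num_threads = 0) {
  if (begin >= end) return;
  if (num_threads <= 0) {
    // hardware_concurrency() may return 0 when the value is unknown.
    num_threads =
        static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
  }
  const int total = end - begin;
  num_threads = std::min(num_threads, total);

  // Static partitioning (an assumed scheme): every worker's range is
  // decided before any thread starts, so no work queue is needed.
  const int chunk = (total + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    const int lo = begin + t * chunk;
    const int hi = std::min(end, lo + chunk);
    if (lo >= hi) break;  // fewer non-empty chunks than threads
    workers.emplace_back([lo, hi, &f] {
      for (int i = lo; i < hi; ++i) f(i);
    });
  }
  // Wait for every worker before returning to the caller.
  for (auto& w : workers) w.join();
}
```

A call such as `parallel_for_sketch(0, n, [&](int i) { out[i] = compute(i); });` is only safe when iterations do not touch shared mutable state, which matches the "thread-safe loops" constraint in the PR description. The sketch does not forward exceptions thrown by `f`; an uncaught exception in a worker terminates the program. Because threads are created per call, very short loop bodies may be dominated by launch cost, which is the trade-off raised in the comments above.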
@merrymercy @tqchen I'm wondering whether we need an interface to pass the
For this, multiprocessing may be enough...
Thanks @jcf94 @merrymercy @jroesch @ @tkonolige !
As mentioned in #5962, we would like a runtime implementation of a parallel loop to speed up some thread-safe loops.
The API takes some reference from https://docs.microsoft.com/en-us/cpp/parallel/concrt/parallel-algorithms?view=vs-2019#parallel_for. I also tried to add an implementation of parallel_for_each, but found that the const-reference style of tvm::Array does not seem very compatible with this API.
cc @tqchen @merrymercy @FrozenGene @junrushao1994