This repository has been archived by the owner on Aug 11, 2020. It is now read-only.
The removal of OpenMP from tensor_cpu_inl.h caused a massive performance regression for us on Windows (MSVC 2013), Mac (Clang), and Linux (gcc): f225763
Locally, we've reverted this commit and seen a very positive result (a 20%+ improvement in training time), so it would be very helpful to have some sort of option or flag that enables OpenMP parallelization for this function without our having to maintain an internal fork.
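To make the request concrete, here is a rough sketch of what a compile-time opt-in could look like. It is not the actual mshadow code: the macro `EXAMPLE_USE_OPENMP` and the function `MapRows` are hypothetical names used only for illustration, and the loop is a simplified stand-in for an element-wise CPU mapping routine.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opt-in switch (an assumption, not an existing mshadow macro).
// Defaults to off so current behaviour is unchanged unless a user enables it.
#ifndef EXAMPLE_USE_OPENMP
#define EXAMPLE_USE_OPENMP 0
#endif

// Hypothetical stand-in for an element-wise CPU mapping loop:
// writes src * scale into dst for a rows x cols row-major buffer.
inline void MapRows(std::vector<float>& dst, const std::vector<float>& src,
                    int rows, int cols, float scale) {
  // A signed loop index keeps the pragma valid under OpenMP 2.0 compilers.
#if EXAMPLE_USE_OPENMP
  #pragma omp parallel for
#endif
  for (int y = 0; y < rows; ++y) {
    for (int x = 0; x < cols; ++x) {
      const std::size_t i = static_cast<std::size_t>(y) * cols + x;
      dst[i] = src[i] * scale;  // each row is independent, so rows can run in parallel
    }
  }
}
```

With something along these lines, a build that wants the parallel path would define the macro to 1 and turn on the compiler's OpenMP support (for example `-fopenmp` on gcc or `/openmp` on MSVC), while everyone else keeps the serial loop.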
This code base has been donated to the Apache MXNet project per #373, and this repo is deprecated. Future development and issue tracking should continue in Apache MXNet.