Improve data prefetching efficiency #1535
cc: @mtamburrano @sguada
@futurely it would be good to separate prefetching and data transformation using a thread pool and a shared buffer, so a PR along those lines would be welcome.
A concurrent buffer introduces another dependency, Intel TBB. Is that acceptable?
@futurely take a look at https://gist.github.com/sguada/1e1d474a25f4ddcc7ba8 for a draft of a concurrent buffer.
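For concreteness, here is a minimal sketch of such a bounded blocking queue built on C++11 primitives (the `BlockingQueue` name and its capacity are illustrative, not part of Caffe or the gist). Producers block when the buffer is full, which naturally bounds the memory spent on prefetched batches.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Minimal bounded blocking queue: producers block when full,
// consumers block when empty.
template <typename T>
class BlockingQueue {
 public:
  explicit BlockingQueue(size_t capacity) : capacity_(capacity) {}

  void Push(T item) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
    queue_.push_back(std::move(item));
    not_empty_.notify_one();
  }

  T Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !queue_.empty(); });
    T item = std::move(queue_.front());
    queue_.pop_front();
    not_full_.notify_one();
    return item;
  }

 private:
  const size_t capacity_;
  std::deque<T> queue_;
  std::mutex mutex_;
  std::condition_variable not_full_, not_empty_;
};
```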
Some time ago, I tried to implement a similar one in a project but ultimately found that it is hard to make a concurrent data structure both correct and efficient. Thorough unit tests of the blocking queue need to be added to build confidence before it is used in production. TBB's concurrent containers are largely lock-free, so multiple threads can access them simultaneously; in this use case, though, blocking may not be a performance bottleneck.
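If the TBB dependency were accepted, its bounded queue already provides this blocking behavior out of the box. A minimal usage sketch follows; the `Batch` type is a hypothetical placeholder for the prefetched data.

```cpp
#include <tbb/concurrent_queue.h>

struct Batch { /* data and label blobs */ };  // hypothetical placeholder

int main() {
  tbb::concurrent_bounded_queue<Batch*> buffer;
  buffer.set_capacity(4);   // bounds the memory held by prefetched batches

  buffer.push(new Batch);   // producer side: blocks when at capacity

  Batch* batch = nullptr;
  buffer.pop(batch);        // consumer side: blocks until an item arrives
  delete batch;
  return 0;
}
```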
The current implementation prefetches only a single batch, and only after the previous batch has been consumed by computation. The initial motivation for this design was probably to avoid thread synchronization, but as a result computation and data IO do not overlap maximally. This becomes a serious bottleneck when multiple devices train on the data simultaneously (#1148).
IO efficiency can be increased by continuously prefetching multiple batches, without waiting for the computation thread, and storing them in a thread-safe buffer of limited capacity.
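As a sketch of that design, a single prefetch thread could keep a bounded buffer full while the computation thread drains it. This reuses the `BlockingQueue` sketch above; `Batch` and `LoadNextBatch` are hypothetical stand-ins for the data layer's blobs and reading code, not Caffe API.

```cpp
#include <thread>

// Assumes the BlockingQueue sketch above.
struct Batch { /* data and label blobs */ };

Batch LoadNextBatch() {
  // Placeholder: disk IO plus data transformation would happen here.
  return Batch{};
}

void PrefetchLoop(BlockingQueue<Batch>* buffer) {
  for (;;) {
    buffer->Push(LoadNextBatch());  // blocks only when the buffer is full
  }
}

int main() {
  BlockingQueue<Batch> buffer(4);  // capacity > 1 lets IO run ahead of compute
  std::thread prefetcher(PrefetchLoop, &buffer);
  for (;;) {
    Batch batch = buffer.Pop();    // blocks only when the buffer is empty
    // ... run forward/backward on batch ...
  }
}
```

With a capacity greater than one, the prefetcher keeps reading while a batch is being computed on, so IO stalls only when the buffer is genuinely full or empty.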