
Improve data prefetching efficiency #1535

Closed

futurely opened this issue Dec 5, 2014 · 6 comments

Comments

@futurely

futurely commented Dec 5, 2014

The current implementation prefetches only a single batch, and only after the previous batch has been consumed by computation. The initial motivation for this design was probably to avoid thread synchronization, but it means computation and data IO do not overlap maximally. This becomes a serious bottleneck when multiple devices train simultaneously (#1148).

IO efficiency can be improved by continuously prefetching multiple batches, without waiting for the computation thread, and storing them in a thread-safe buffer of limited capacity.
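Such a bounded, thread-safe buffer can be sketched with standard C++ primitives. This is a hypothetical illustration of the idea, not Caffe's code; the class and member names are invented:

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical bounded blocking buffer: the prefetch thread push()es batches
// continuously and blocks only when the buffer is full; the compute thread
// pop()s and blocks only when the buffer is empty.
template <typename T>
class BoundedBuffer {
 public:
  explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

  void push(T value) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
    queue_.push_back(std::move(value));
    not_empty_.notify_one();
  }

  T pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !queue_.empty(); });
    T value = std::move(queue_.front());
    queue_.pop_front();
    not_full_.notify_one();
    return value;
  }

 private:
  const std::size_t capacity_;
  std::deque<T> queue_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
};
```

The capacity bound is what keeps memory in check: the prefetcher can run ahead of computation, but only by at most `capacity` batches.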

@bhack
Contributor

bhack commented Dec 5, 2014

cc: @mtamburrano @sguada

@sguada
Contributor

sguada commented Dec 9, 2014

@futurely it would be good to separate prefetching and data transformation, using a thread pool and a shared buffer, so a PR along those lines would be welcome.

@futurely
Author

A concurrent buffer introduces another dependency, Intel TBB. Is that acceptable?

@bhack bhack mentioned this issue Dec 29, 2014
@sguada
Contributor

sguada commented Dec 29, 2014

@futurely take a look at https://gist.github.com/sguada/1e1d474a25f4ddcc7ba8 for a draft of a concurrent buffer.

@futurely
Author

Some time ago I tried to write a similar one for a project, but eventually found that it is hard to make a concurrent data structure both correct and efficient. Thorough unit tests of the blocking queue should be added to build confidence before it is used in production. TBB's concurrent containers are generally lock-free, so multiple threads can access them simultaneously. In this use case, though, blocking may not be a performance bottleneck.

@shelhamer
Member

Solved by #2366 #2367 #2368 #2383 #2386, soon to be merged as part of #2114. The persistent prefetch thread and data reader avoid overhead and can fetch multiple batches.
