
Data queues, prefetching and multi-source #1773

Closed · wants to merge 1 commit

Conversation

cypof (Member) commented Jan 21, 2015

I split the work on data_layer from #1148. It was initially written to provide enough bandwidth to feed multiple GPUs and to fix performance issues caused by thread creation/destruction on each batch. Over time a few other things got in. In particular, at Flickr we are experimenting with different class ratios by reading from multiple sources: e.g. each dataset can be set up to contain one class, and the probability of each source then defines the class ratios at runtime. Features:

  • Reading from multiple sources, in case one network location or disk cannot feed the solvers. Each source can hold only a shard, in which case probabilities need to be balanced by shard size, or a copy of the same dataset read from a random offset. The latter might change SGD behavior a bit, since some examples can be seen multiple times before the second epoch, but over time coverage should be the same.
  • Probabilities on sources, e.g. to change the ratio of positive/negative when doing binary classification.
  • One loading thread per database, even if multiple solvers are running. This accommodates single-threaded DBs like LevelDB and ensures sequential access, which is usually faster. In almost all cases one thread is enough for loading, as it does nothing else. There is still a transform thread for each solver, as today.
  • No thread creation/deletion per batch. It is inefficient and causes problems with components that rely on thread-local caching. We also had problems with memory pinning and virtual memory. Cf. @thatguymike.
  • Prefetch asynchronously to each GPU on a separate CUDA stream, so that the batch is already on the GPU when the solver needs it.
  • Prefetch a configurable number of batches in host memory to smooth out bandwidth glitches; in particular, if data is loaded over a network it might make sense to configure a large prefetch queue. A rough sketch of the queue and source sampling follows this list.
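Below is a minimal sketch of two of the pieces described above, a bounded host-memory prefetch queue fed by a single loading thread and probability-weighted source selection, written against plain C++11. It is not the actual implementation in this PR; the names (`BatchQueue`, `Source`, `pick_source`) and the example source names are hypothetical placeholders, and the per-GPU CUDA-stream copy is omitted.

```cpp
// Sketch only: bounded prefetch queue + weighted source sampling.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <random>
#include <string>
#include <thread>
#include <vector>

// Bounded blocking queue holding prefetched batches in host memory.
template <typename T>
class BatchQueue {
 public:
  explicit BatchQueue(size_t capacity) : capacity_(capacity) {}
  void push(T item) {
    std::unique_lock<std::mutex> lock(mu_);
    not_full_.wait(lock, [&] { return q_.size() < capacity_; });
    q_.push(std::move(item));
    not_empty_.notify_one();
  }
  T pop() {
    std::unique_lock<std::mutex> lock(mu_);
    not_empty_.wait(lock, [&] { return !q_.empty(); });
    T item = std::move(q_.front());
    q_.pop();
    not_full_.notify_one();
    return item;
  }
 private:
  size_t capacity_;
  std::queue<T> q_;
  std::mutex mu_;
  std::condition_variable not_empty_, not_full_;
};

// A data source with a relative sampling probability (e.g. one class per source).
struct Source {
  std::string name;
  double probability;  // relative weight; need not sum to 1
};

// Pick a source index according to the configured probabilities.
size_t pick_source(const std::vector<Source>& sources, std::mt19937& rng) {
  std::vector<double> weights;
  for (const auto& s : sources) weights.push_back(s.probability);
  std::discrete_distribution<size_t> dist(weights.begin(), weights.end());
  return dist(rng);
}

int main() {
  std::vector<Source> sources = {{"positives_db", 0.5}, {"negatives_db", 0.5}};
  BatchQueue<std::string> prefetch(4);  // e.g. keep 4 batches prefetched in host memory

  // One persistent loading thread: draws a source per batch and reads from it
  // sequentially; never created/destroyed per batch.
  std::thread loader([&] {
    std::mt19937 rng(0);
    for (int i = 0; i < 8; ++i) {
      size_t s = pick_source(sources, rng);
      prefetch.push(sources[s].name + " batch " + std::to_string(i));
    }
  });

  // Consumer, standing in for a solver's transform thread.
  for (int i = 0; i < 8; ++i) std::cout << prefetch.pop() << "\n";
  loader.join();
  return 0;
}
```

The queue is bounded so the loader blocks once the configured number of batches has been prefetched, rather than growing host memory without limit; the capacity corresponds to the configurable prefetch depth mentioned in the last bullet.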

cypof mentioned this pull request Jan 21, 2015
shelhamer (Member)

@cypof thanks for all the data pipeline improvements. Just a heads-up: this'll likely need a rebase after #1748.

cypof (Member, Author) commented Jan 22, 2015

Deleted my branch by mistake; copied the PR to #1775.
