
Mini-batch support #1

Closed
nakosung opened this issue Jan 27, 2015 · 4 comments

Comments

@nakosung

As you mentioned in the official Caffe repo, your implementation doesn't support mini-batches. What is your plan for extending it? To support N-step truncated BPTT with a mini-batch of M sequences, would introducing M sequential data layers be good enough?

@junhyukoh
Owner

I'm sorry that I don't have enough time to work on this project right now, but I will come back to it and finish the TODO list as soon as possible.
I'm not sure I understand your idea correctly. Do you mean one forward/backward pass with an N*M-sized data blob, or M forward/backward passes, each with an N-sized data blob?
In the former case, we have to consider the memory limit. For example, a mini-batch of 20 sequences of length 100 is roughly equivalent, in terms of memory usage, to a mini-batch of 2000 examples in a feedforward network (a rough numerical sketch of this comparison follows below).
In the latter case, we have to compute the sum of gradients over several forward/backward passes. The main problem is that layers do not preserve their gradients (diffs) after a backward pass, because in Caffe one backward pass always involves one weight update.
I hope we will find a clever way to deal with this issue.
Thank you for sharing your idea!
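
To make the memory comparison in the former case concrete, here is a rough back-of-the-envelope sketch in numpy. It is not code from this repository, and the hidden size, dtype, and function names are made-up assumptions; it only shows that unrolling an RNN over N steps with a mini-batch of M sequences stores about as many activations as a feedforward layer does for a batch of N*M examples.

```python
import numpy as np

def unrolled_rnn_activation_bytes(n_steps, batch, hidden, dtype=np.float32):
    # Activations kept alive for truncated BPTT through one recurrent layer:
    # one hidden vector per time step and per sequence in the mini-batch.
    return n_steps * batch * hidden * np.dtype(dtype).itemsize

def feedforward_activation_bytes(batch, hidden, dtype=np.float32):
    # Activations kept for one feedforward layer and one mini-batch.
    return batch * hidden * np.dtype(dtype).itemsize

N, M, H = 100, 20, 256  # 100-step truncated BPTT, 20 sequences, hidden size 256
print(unrolled_rnn_activation_bytes(N, M, H))   # 100 * 20 * 256 * 4 bytes
print(feedforward_activation_bytes(N * M, H))   # same number: batch of 2000
```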

@nakosung
Author

Thank you for your detailed explanation. My question was about the former approach, as you said. Some research seems to use a mini-batch size of 4 (quite small) for RNN updates; if 4 is an acceptable mini-batch size, the memory requirement doesn't seem to be such a big problem. For the latter approach, wouldn't M forward/backward passes followed by a single weight update be sufficient? That is: 1) accumulate the gradient for each weight over the grouped passes, and 2) apply one update at the end, achieving a large effective mini-batch with a 'physically' small batch (a sketch of this idea follows below).
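
Here is a minimal numpy sketch of this accumulation idea (not code from this repository, and not Caffe's API): run M forward/backward passes on small "physical" batches, sum the weight gradients, and apply a single update, so the effective mini-batch is M times the physical one. The toy linear model and all names are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # toy weights: 3 inputs -> 4 outputs
lr, M = 0.1, 5                   # learning rate, number of physical small batches

grad_sum = np.zeros_like(W)
for _ in range(M):
    x = rng.normal(size=(2, 3))  # one small "physical" batch of 2 examples
    y = rng.normal(size=(2, 4))  # targets for the toy regression loss
    pred = x @ W.T               # forward pass through the linear layer
    err = pred - y               # dLoss/dPred for the loss 0.5 * ||pred - y||^2
    grad_sum += err.T @ x        # backward pass: accumulate dLoss/dW

W -= lr * grad_sum / M           # one weight update with the averaged gradient
```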

@junhyukoh
Owner

The former approach seems to be a simple and good option.
Thank you for your comment!

@junhyukoh
Owner

This implementation supports mini-batch updates now.
