Could Keras handle a large dataset, for instance more than 50GB? #107
Keras can work with datasets that don't fit in memory, through the use of batch training. There are two ways to make this work:

```python
# let's say you have a BatchGenerator that yields a large batch of samples
# at a time (but still small enough for the GPU memory);
# these are chunks of ~10k pictures
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_batch, Y_batch in BatchGenerator():
        model.fit(X_batch, Y_batch, batch_size=32, nb_epoch=1)

# Alternatively, let's say you have a MiniBatchGenerator that yields
# 32-64 samples at a time:
for e in range(nb_epoch):
    print("epoch %d" % e)
    for X_batch, Y_batch in MiniBatchGenerator():
        model.train(X_batch, Y_batch)
```
For people finding this for reference: in current Keras, the per-batch training call above should be `model.train_on_batch(X_batch, Y_batch)`, since `model.train` was renamed to `train_on_batch`.
Note that nowadays you can use the `fit_generator` method to stream batches from a generator directly, instead of writing the outer training loop yourself.
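A minimal sketch of that call (hedged: `my_generator` and the step count are placeholders, and the argument names follow Keras 2; Keras 1 used `samples_per_epoch`/`nb_epoch` instead):

```python
# my_generator() must yield (X_batch, Y_batch) tuples indefinitely.
model.fit_generator(my_generator(),
                    steps_per_epoch=1000,  # batches per epoch (placeholder)
                    epochs=10)
```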
Thanks @fchollet. Is it necessary to load all the training data into memory and then generate indefinite batches with a custom generator? I am training on 27k images (without loading them all into memory), which I did with the method above, but how can this be achieved with fit_generator()? It supports callbacks and can also handle a validation dataset in the same call, neither of which is possible with train_on_batch() or fit() alone.
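One way to do it, as a rough sketch rather than the canonical recipe: keep only file paths and labels in memory and let the generator read images off disk per batch. `load_image`, the path/label lists, and the checkpoint filename are all placeholders; the argument names follow the Keras 2 `fit_generator` signature.

```python
import numpy as np
from keras.callbacks import ModelCheckpoint

# load_image is a placeholder for your own file-decoding function;
# only the file paths and labels stay in memory, not the pixels.
def image_generator(paths, labels, batch_size=32):
    while True:  # fit_generator expects an endless generator
        for start in range(0, len(paths), batch_size):
            X = np.array([load_image(p) for p in paths[start:start + batch_size]])
            Y = np.array(labels[start:start + batch_size])
            yield X, Y

# Callbacks and a validation generator both go in the same call:
model.fit_generator(image_generator(train_paths, train_labels),
                    steps_per_epoch=len(train_paths) // 32,
                    epochs=10,
                    validation_data=image_generator(val_paths, val_labels),
                    validation_steps=len(val_paths) // 32,
                    callbacks=[ModelCheckpoint('weights.h5')])
```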
Does training for nb_epoch on a batch and then moving on to the next batch significantly decrease the quality of the model, and can this method be done in parallel? It seems like redundantly reading from disk can become a bottleneck.
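Training many epochs on one chunk before moving on can bias the model toward the most recent chunk, which is why the example above fits each chunk for a single epoch per outer pass. On the parallelism question, this is exactly the disk bottleneck that later Keras versions address with `keras.utils.Sequence`, which lets batches be prefetched in background workers while the GPU trains. A hedged sketch (the class name, `load_image`, and the path/label lists are illustrative):

```python
import numpy as np
from keras.utils import Sequence

# A Sequence is an indexable dataset: Keras can then load batches in
# background worker processes while the GPU trains on the current one.
class DiskBatchSequence(Sequence):
    def __init__(self, image_paths, labels, batch_size=32):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.image_paths) / float(self.batch_size)))

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # load_image is a placeholder for your own decoding function
        X = np.array([load_image(p) for p in self.image_paths[lo:hi]])
        Y = np.array(self.labels[lo:hi])
        return X, Y

# workers > 1 overlaps disk reads with GPU compute
model.fit_generator(DiskBatchSequence(train_paths, train_labels),
                    epochs=10, workers=4, use_multiprocessing=True)
```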
It is really great to see such an elegant design. I am wondering whether it is possible for Keras to use a NoSQL database such as LMDB as its data source, then load data and do computation in parallel?
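This works with an ordinary generator. A rough sketch using the `lmdb` Python bindings, assuming each record was stored as a pickled `(x, y)` pair under a zero-padded numeric key (the key scheme and serialization are assumptions, not a Keras convention):

```python
import pickle

import lmdb
import numpy as np

# Assumed storage scheme: sample i lives under key "%08d" % i as a
# pickled (x, y) pair. Adapt the keys/serialization to your database.
def lmdb_generator(db_path, num_samples, batch_size=32):
    env = lmdb.open(db_path, readonly=True, lock=False)
    while True:  # loop forever, as fit_generator expects
        with env.begin() as txn:
            for start in range(0, num_samples, batch_size):
                stop = min(start + batch_size, num_samples)
                keys = [("%08d" % i).encode("ascii") for i in range(start, stop)]
                samples = [pickle.loads(txn.get(k)) for k in keys]
                X = np.array([x for x, _ in samples])
                Y = np.array([y for _, y in samples])
                yield X, Y
```

The reads here are still serial; pairing this with the worker-based prefetching shown above is what actually overlaps database I/O with computation.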