How to speed up or use full memory? #264
Hi, did you consider reducing the number of input samples? For example, you can randomly sample 5% of your data. In addition, your samples don't have to be fixed: for each epoch you can draw a new sample. In some sense, it's just a sophisticated way of reducing the number of epochs. A smaller sample should approximate your dataset, and the obtained map should generalise to the larger set |
Thanks for that. May I know how to set a new sample for each epoch? Are you referring to sofm.train(df, epochs=200), where df is split into small batches? If I split df, then I have to build multiple models, right? I have separate test data, so prediction is hard across multiple models |
You need to do it yourself.

Solution 1

```python
data_sample = randomly_sample(data)

for _ in range(200):
    sofm.train(data_sample, epochs=1)
```

Solution 2

```python
for _ in range(200):
    data_sample = randomly_sample(data)
    sofm.train(data_sample, epochs=1)
```
|
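The `randomly_sample` helper above is pseudocode, not a NeuPy function. A minimal sketch of it, assuming the data lives in a 2D NumPy array and using the 5% fraction suggested earlier:

```python
import numpy as np

def randomly_sample(data, fraction=0.05, seed=None):
    """Return a random subset of rows from a 2D array, without replacement."""
    rng = np.random.RandomState(seed)
    n_rows = max(1, int(len(data) * fraction))
    indices = rng.choice(len(data), size=n_rows, replace=False)
    return data[indices]
```

Passing a `seed` makes the sample reproducible, which matters later in this thread when training is resumed.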
Oh yes, thanks for that, will try |
I have the same issue when training with a large array of data.
The above is my sample code, and it's taking a very long time to finish the training loop, which is killing performance. Can someone help with how to optimize and speed up the algorithm in this scenario? |
Hi @sujithgangaraju, do you have to run it for 100 epochs? Can you use some sort of convergence criterion in order to avoid training for that many epochs? For example:

```python
epsilon = 0.1  # you should pick the right value

for epoch in range(100):
    sofm.train(data, epochs=1)
    training_errors = sofm.errors.train

    if epoch >= 1 and abs(training_errors[-1] - training_errors[-2]) < epsilon:
        break  # stop training
```

Also, is it possible to reduce the number of cluster sizes that you need to test? In your example there is very little difference between 387 and 388, so the difference in silhouette scores should be random. In addition, instead of doing a grid search you can do a more intelligent hyperparameter search, for example TPE (a Python library that supports TPE: https://github.com/hyperopt/hyperopt). Check this article in order to learn more about TPE: http://neupy.com/2016/12/17/hyperparameter_optimization_for_neural_networks.html

In addition, you can distribute training across multiple machines. Each machine can be trained using a unique set of cluster sizes.

I hope it helps |
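As a lighter-weight alternative to TPE (this is plain random search, not TPE itself), one can evaluate only a random subset of cluster sizes instead of the full grid. A hypothetical sketch, where `evaluate` stands in for training a SOFM and computing a silhouette score:

```python
import random

def random_search(evaluate, min_clusters, max_clusters, n_trials=20, seed=0):
    """Evaluate a random subset of cluster sizes instead of scanning all of them."""
    rng = random.Random(seed)
    sizes = range(min_clusters, max_clusters + 1)
    candidates = rng.sample(sizes, k=min(n_trials, len(sizes)))
    # score each candidate and keep the best-scoring cluster size
    scores = {n: evaluate(n) for n in candidates}
    return max(scores, key=scores.get)
```

Since nearby cluster sizes (like 387 vs 388) score almost identically, skipping most of the grid loses very little.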
Yes, I have to run through 100 epochs. The cluster range is computed from the min and max cluster counts, so all sequential sizes are looped over. In my scenario, for a single size (387), sofm.train takes almost an hour to finish. Can we speed it up with the same cluster range and 100 epochs? I tried the code snippet above over the 100-epoch range, but I don't see a difference in performance. |
When you have a large number of clusters, adding or removing one or two clusters shouldn't show any effect. I think in that case the increase in cluster sizes should follow an exponential function. Any difference between 387 and 388 won't be significant unless you have a very large dataset (for example, > 1M rows), and even then you would need to run your experiments many times. And even if the difference is statistically significant, I doubt it will have any practical significance.
Also, as I said before, you can run training on different machines using different subsets of cluster ranges. In this way you parallelise your training. It might be important to shuffle your clusters before distributing them across machines. For example, if the first machine processes clusters with sizes [2, 3, 4] and the other one sizes [5, 6, 7], then clearly the second machine will spend more time on training, only because each of its cluster sizes is larger compared to the cluster sizes on the first machine.
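The same idea can be sketched on a single machine with multiple workers: shuffle the cluster sizes first so that no worker gets only the slow, large sizes. The `evaluate` callback here is a placeholder for SOFM training, not real NeuPy code:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def parallel_search(evaluate, cluster_sizes, n_workers=4, seed=0):
    """Evaluate cluster sizes in parallel and return the best-scoring one."""
    sizes = list(cluster_sizes)
    # shuffle so small and large sizes are spread evenly across workers
    random.Random(seed).shuffle(sizes)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(evaluate, sizes))
    return max(zip(scores, sizes))[1]
```

ThreadPoolExecutor keeps the sketch simple; for CPU-bound SOFM training you would switch to ProcessPoolExecutor (which requires the worker function to be importable, not a lambda) to actually use all cores.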
What convergence curve do you get when you plot error differences after each epoch? Also, you can do batch training, but you need to make sure that your batches are reproducible during the training. Maybe you can do something like this:

```python
for epoch in range(100):
    data_sample = randomly_sample(data, random_seed=epoch)
    sofm.train(data_sample, epochs=1)
```

Other than that, I don't think there is anything you can do to speed up the algorithm without rewriting the code or extending the main logic with some heuristics |
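One way to make the per-epoch batches reproducible is to seed the sampler with the epoch number, so that resuming training at epoch k re-draws exactly the rows that epoch k saw before. A sketch, assuming the data is a 2D NumPy array (`epoch_batch` is a hypothetical helper, not part of NeuPy):

```python
import numpy as np

def epoch_batch(data, epoch, fraction=0.05):
    """Draw the batch for a given epoch; the same epoch always yields the same rows."""
    rng = np.random.RandomState(epoch)  # seed with the epoch number
    n_rows = max(1, int(len(data) * fraction))
    return data[rng.choice(len(data), size=n_rows, replace=False)]
```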
Thanks for the inputs, I will let you know if any help I need. |
Hi @itdxer
Is it a good decision to pickle the model after 100 epochs, load it back, and continue training? Will that cause overfitting? I understand that Keras's model.save stores the information needed for restarting training; can a SOFM be handled like that? On sampling, I have 2 questions.
I am saving the SOFM using pickle.dump. Last one: is there a chance of passing validation data and getting a validation error on each epoch? |
That shouldn't happen. Is that because of SOFM? Does it fail when you're using some other model, for example, k-means from scikit-learn?
Yes, but you need to make sure that the datasets you're applying during training are reproducible, for example, if you use the same random seed for each epoch's sample. Hopefully my previous explanation answers the second question as well. Validation could be done in the following way (assuming `sofm.weight` holds the cluster centers as columns):

```python
test_cluster_indices = sofm.predict(x_test).argmax(axis=1)
cluster_centers = sofm.weight[:, test_cluster_indices].T
test_error = np.abs(x_test - cluster_centers).mean()
```

Note that training and test errors won't be comparable in that case, since the training error is accumulated during training, so it's a sort of running error estimate over the epoch (weights are updated N times, where N is the number of samples). |
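Framework aside, the same validation error can be computed from any matrix of cluster centers. A self-contained sketch with NumPy (a nearest-center quantization error, not a NeuPy API):

```python
import numpy as np

def validation_error(x_test, centers):
    """Mean absolute distance from each sample to its nearest cluster center.

    x_test:  (n_samples, n_features)
    centers: (n_clusters, n_features)
    """
    # pairwise L1 distances between every sample and every center
    distances = np.abs(x_test[:, None, :] - centers[None, :, :]).sum(axis=2)
    nearest = distances.argmin(axis=1)
    return np.abs(x_test - centers[nearest]).mean()
```

Because the test set and the centers are both fixed at evaluation time, this number is comparable from epoch to epoch, unlike the running training error.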
Thanks @itdxer, and thanks for the validation suggestions. Regarding reloading the SOFM, I got it. |
It depends on the number of epochs, since the model stored in each file has been trained for a different number of epochs. The first model went from the 1st to the 200th epoch, which means you have to resume from epoch 201; but that won't be true for the second model, since it was trained until epoch 400, which means that after a restart you have to resume from epoch 401 |
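One way to avoid losing track of that count is to pickle the number of completed epochs alongside the model. A sketch with hypothetical helper names (any picklable model object works, including a SOFM instance):

```python
import pickle

def save_checkpoint(path, model, epochs_done):
    """Store the model together with how many epochs it has already seen."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "epochs_done": epochs_done}, f)

def load_checkpoint(path):
    """Return (model, epochs_done) so training can resume at the right epoch."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    return checkpoint["model"], checkpoint["epochs_done"]
```

On resume, the loop (and any epoch-seeded sampling) starts at `epochs_done` instead of 0.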
Hi @itdxer, after 4 batches the error keeps increasing, so I'm not happy with the model. I need something to increase speed, either by using all cores or something; is that possible by having an n_jobs parameter in the train function to use all cores? I don't have GPU capacity. Any suggestions for speeding up? |
It looks like each increase and decrease happens over a fixed number of learning cycles, and that width is around 100 cycles (which is probably the same as 100 epochs in your case). If you're using different batches per each 100 epochs, then these errors are not comparable, and an increase or decrease in your graph doesn't mean it's getting better or worse. You can see that the error jumps only when you change your training batch, but when you run training with a fixed batch, your error is increasing. To make the error curve more reliable, you need to fix a test or validation set and evaluate your error on that fixed set after each iteration; otherwise it's hard to interpret the error curve.
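Fixing the validation set once, before training starts, can be sketched like this (a hypothetical helper, assuming the data is a NumPy array):

```python
import numpy as np

def split_validation(data, val_fraction=0.1, seed=0):
    """Hold out a fixed validation set once, before training starts."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(data))
    n_val = int(len(data) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return data[train_idx], data[val_idx]
```

Training then uses only the first split (batched however you like), while the error is always measured on the second split, so the curve stays comparable across batches.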
There is no way of doing it now; maybe there will be an option in the future |
Actually, the batch run is from the recommended solution, and the data is unsupervised |
Hi, I am running

```python
algorithms.SOFM(n_inputs=79, n_outputs=2, learning_radius=1, step=0.1,
                shuffle_data=True, weight='sample_from_data', verbose=True)
```

The data array size is 4.3 GB, running on a Windows 10 machine with 32 GB RAM.
Set to 200 epochs, with each epoch taking 25 minutes, this may take 3.3 days to complete.
It's not using full memory though; while running, RAM usage is just 14-15 GB, with 16 GB still sitting free.
May I know how to speed up or use full memory or cores, please?
Thanks