ML on small datasets #3027
-
Yeah, 1000 samples isn't enough to do any DNN stuff. It would just overfit like mad. Or work a bit; I guess it depends on how much you care about it not working well. Sometimes people don't care that much. It depends on your application. I would not use a DNN there though :)

To do this kind of thing you really need to manually construct a feature space that makes sense for the problem. That is obviously highly problem dependent, but it is in general the way to go for this kind of thing. People love ML because it advertises itself as "you don't have to think about how to make things work, they just will!", but that's usually not the case. Most actual deployed ML I have seen is of the kind you are talking about: some domain where training examples are rare/expensive and it really needs to work very well. And usually execute very fast too (i.e. not on an expensive GPU).

Anyway, if you can write up 5 (or so) 90% accurate solutions to the problem, that gives you 5 really good features. You can then stick those into an SVM and, so long as those 5 aren't highly duplicative of each other, get a learned function that is way more than 90% accurate. So maybe http://dlib.net/ml.html#auto_train_rbf_classifier would fit the bill. Although RBF kernels are for when you, at some level, still don't know good ways to get a 90% solution with just manually written code.

What's really nice is if there are subsets of the samples where you just know how to compute numbers that separate them. For example, if you were trying to make a classifier to tell you if someone weighs more than 160lbs, then one of your features could be their height, since we know that, all other things being equal, greater height means more weight. So you can constrain the classifier to output a bigger value as height gets bigger. That kind of thing is an extremely strong regularizer. It basically will not overfit. The http://dlib.net/dlib/svm/svm_c_linear_trainer_abstract.h.html#svm_c_linear_trainer has set_learns_nonnegative_weights() just for this reason. That is, use a linear model and constrain it so that it can't learn to flip the sign around on your carefully thought out features. And then make features where you know that bigger values, all other things being equal, indicate being in class +1 and not class -1.

You can also still have something non-linear in your features. For instance, a piecewise linear function of a scalar is trivial to represent with a function linear in the parameters (the parameters are just the slopes of the different pieces of the function, which you can sign constrain so it will only learn a monotonic piecewise linear function). And if you want to learn a model that is the multiplication of two piecewise linear functions, that's easy too: you just take all the pairwise products of the two sets of piecewise linear basis features and use those as your features. This kind of thing can give extremely accurate results. All provided there is some way to make meaningful features for the problem, though.
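To make that concrete, here is a minimal sketch of the sign-constrained setup. The hinge-basis expansion, the knot locations, and the synthetic labels are just illustrative assumptions; set_learns_nonnegative_weights() and svm_c_linear_trainer are the real dlib APIs mentioned above:

```cpp
#include <dlib/svm.h>
#include <dlib/rand.h>
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    using sample_type = dlib::matrix<double,0,1>;
    using kernel_type = dlib::linear_kernel<sample_type>;

    // Expand a scalar x into hinge basis functions max(0, x - knot).
    // A nonnegative-weighted sum of these is a monotonically
    // non-decreasing piecewise linear function of x.
    const std::vector<double> knots = {-2, -1, 0, 1, 2};
    auto expand = [&](double x)
    {
        sample_type f(knots.size());
        for (long i = 0; i < f.size(); ++i)
            f(i) = std::max(0.0, x - knots[i]);
        return f;
    };

    // Toy data: bigger x should mean class +1 (e.g. taller => heavier).
    std::vector<sample_type> samples;
    std::vector<double> labels;
    dlib::rand rnd;
    for (int i = 0; i < 100; ++i)
    {
        const double x = rnd.get_random_gaussian();
        samples.push_back(expand(x));
        labels.push_back(x + 0.3*rnd.get_random_gaussian() > 0 ? +1.0 : -1.0);
    }

    dlib::svm_c_linear_trainer<kernel_type> trainer;
    trainer.set_c(10);
    // Constrain all learned weights to be >= 0 so the model can't flip
    // the sign of the carefully constructed monotone features.
    trainer.set_learns_nonnegative_weights(true);

    const auto df = trainer.train(samples, labels);
    std::cout << "f(-1) = " << df(expand(-1)) << "\n";  // should be negative
    std::cout << "f(+1) = " << df(expand(+1)) << "\n";  // should be positive
}
```

Since every hinge feature can only receive a nonnegative slope, the learned function is monotone in x by construction, which is exactly the strong regularization effect described above.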
-
@davisking On the same subject, do you have any tips/thoughts on feature extraction models? My interest is in exploring traditional machine learning alternatives to metric learning. I've looked at dnn_metric_learning_on_images_ex, dnn_metric_learning_ex, and PyTorch alternatives, which look great, but I was wondering if there are ML algorithms that could do similar things using potentially much smaller datasets. Linear Discriminant Analysis piqued my interest, which I'll definitely give a go, but I was wondering if things like empirical_kernel_map_ex could also provide some benefit. Any advice is greatly appreciated.
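For reference, this is roughly the kind of thing I mean by using empirical_kernel_map. The kernel choice, gamma value, and using every sample as a basis vector are just placeholder assumptions:

```cpp
#include <dlib/svm.h>
#include <dlib/rand.h>
#include <vector>

int main()
{
    using sample_type = dlib::matrix<double,0,1>;
    using kernel_type = dlib::radial_basis_kernel<sample_type>;

    // Stand-in for a small dataset of feature vectors.
    dlib::rand rnd;
    std::vector<sample_type> samples;
    for (int i = 0; i < 50; ++i)
    {
        sample_type s(3);
        s = rnd.get_random_gaussian(), rnd.get_random_gaussian(), rnd.get_random_gaussian();
        samples.push_back(s);
    }

    // Project each sample into the span of the basis under the chosen kernel.
    dlib::empirical_kernel_map<kernel_type> ekm;
    ekm.load(kernel_type(0.1), samples);  // here every sample is a basis vector

    std::vector<sample_type> projected;
    for (const auto& s : samples)
        projected.push_back(ekm.project(s));

    // The projected samples live in a space where plain linear methods
    // (linear SVM, LDA, etc.) behave like kernel methods in the original space.
}
```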
-
Does anybody have any tips on training ML algorithms on small datasets, like fewer than 1000 samples total?
I'm about to do a piece of work where I have to train a classifier on a custom sensor spewing time-series data. There is no existing dataset; most of the work will be in data acquisition and labeling, whether that be manual labour, synthetic data generation, or some kind of apparatus that automatically acquires and labels.
It's tempting to just use a neural net but I want to avoid having to endlessly capture data. I've fallen into this trap before.
So, do people have any tips? E.g.:
When using a NN classifier, any tips on getting it to work with very little data?
Or, how about using some traditional machine learning algorithms, like the ones provided in dlib (hence why I'm asking here)? If so, how do you go about it?
It would be cool if, say, I could get an OK-performing model with a very small dataset, like ~100 samples, and iteratively improve from there. When using a neural net, straight off the bat you have to acquire a largish and highly varied dataset, which is a nuisance and can very easily lead to poorly performing models if not enough care has gone into the data. For instance, I imagine the small-data workflow looking something like the sketch below.
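This is purely illustrative (the toy features, labels, and C value are made up), but dlib's cross_validate_trainer seems like the right tool for checking a model when there isn't enough data for a fixed held-out test set:

```cpp
#include <dlib/svm.h>
#include <dlib/rand.h>
#include <iostream>
#include <vector>

int main()
{
    using sample_type = dlib::matrix<double,0,1>;
    using kernel_type = dlib::linear_kernel<sample_type>;

    // ~100 hand-labeled samples (toy stand-ins here).
    std::vector<sample_type> samples;
    std::vector<double> labels;
    dlib::rand rnd;
    for (int i = 0; i < 100; ++i)
    {
        sample_type s(2);
        s = rnd.get_random_gaussian(), rnd.get_random_gaussian();
        samples.push_back(s);
        labels.push_back(s(0) + s(1) > 0 ? +1.0 : -1.0);
    }

    dlib::svm_c_linear_trainer<kernel_type> trainer;
    trainer.set_c(1);

    // 5-fold cross-validation: prints the fraction of +1 and -1 examples
    // classified correctly, so every sample gets used for both training
    // and evaluation.
    std::cout << dlib::cross_validate_trainer(trainer, samples, labels, 5);
}
```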
Thank you.