ML on small datasets #3027
-
Yeah, 1000 samples isn't enough to do any DNN stuff. It would just overfit like mad. Or work a bit; I guess it depends on how much you care about it not working well. Sometimes people don't care that much. It depends on your application. I would not use a DNN there though :)

To do this kind of thing you really need to manually construct a feature space that makes sense for the problem. That is obviously highly problem dependent, but it is in general the way to go for this kind of thing. People love ML because it advertises itself as "you don't have to think about how to make things work, they just will!", but that's usually not the case. Most actual deployed ML I have seen is of the kind you are talking about: some domain where training examples are rare/expensive and it really needs to work very well. And usually execute very fast too (i.e. not on an expensive GPU).

Anyway, if you can write up 5 (or so) 90% accurate solutions to the problem, that gives you 5 really good features. You can then stick those into an SVM and, so long as those 5 aren't highly duplicative of each other, get a learned function that is way more than 90% accurate. So maybe http://dlib.net/ml.html#auto_train_rbf_classifier would fit the bill. Although RBF kernels are for when you, at some level, still don't know good ways to get a 90% solution with just manually written code.

What's really nice is if there are subsets of the samples where you just know how to compute numbers that separate them. For example, if you were trying to make a classifier to tell you if someone weighs more than 160lbs, then one of your features could be their height, since we know that, all other things being equal, greater height means more weight. So you can constrain the classifier to output a bigger value as height gets bigger. That kind of thing is an extremely strong regularizer. It basically will not overfit. The http://dlib.net/dlib/svm/svm_c_linear_trainer_abstract.h.html#svm_c_linear_trainer has set_learns_nonnegative_weights() just for this reason. That is, use a linear model and constrain it so that it can't learn to flip the sign around on your carefully thought out features. And then make features where you know that bigger values, all other things being equal, indicate being in class +1 and not class -1.

You can also still have something non-linear in your features. For instance, a piecewise linear function of a scalar is trivial to represent with a function linear in the parameters (the parameters are just the slopes of the different pieces of the function, which you can sign constrain so it will only learn a monotonic piecewise linear function). And if you want to learn a model that is the multiplication of two piecewise linear functions, that's easy too: you just take all the pairwise products of the two sets of piecewise linear basis features and use those as your features. This kind of thing can give extremely accurate results. All provided there is some way to make meaningful features for the problem, though.
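To make that concrete, here is a minimal sketch of the sign-constrained setup. The hinge-basis expansion, the knot locations, and the synthetic labels are just illustrative assumptions; set_learns_nonnegative_weights() and svm_c_linear_trainer are the real dlib APIs mentioned above:

```cpp
#include <dlib/svm.h>
#include <dlib/rand.h>
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    using sample_type = dlib::matrix<double,0,1>;
    using kernel_type = dlib::linear_kernel<sample_type>;

    // Expand a scalar x into hinge basis functions max(0, x - knot).
    // A nonnegative-weighted sum of these is a monotonically
    // non-decreasing piecewise linear function of x.
    const std::vector<double> knots = {-2, -1, 0, 1, 2};
    auto expand = [&](double x)
    {
        sample_type f(knots.size());
        for (long i = 0; i < f.size(); ++i)
            f(i) = std::max(0.0, x - knots[i]);
        return f;
    };

    // Toy data: bigger x should mean class +1 (e.g. taller => heavier).
    std::vector<sample_type> samples;
    std::vector<double> labels;
    dlib::rand rnd;
    for (int i = 0; i < 100; ++i)
    {
        const double x = rnd.get_random_gaussian();
        samples.push_back(expand(x));
        labels.push_back(x + 0.3*rnd.get_random_gaussian() > 0 ? +1.0 : -1.0);
    }

    dlib::svm_c_linear_trainer<kernel_type> trainer;
    trainer.set_c(10);
    // Constrain all learned weights to be >= 0 so the model can't flip
    // the sign of the carefully constructed monotone features.
    trainer.set_learns_nonnegative_weights(true);

    const auto df = trainer.train(samples, labels);
    std::cout << "f(-1) = " << df(expand(-1)) << "\n";  // should be negative
    std::cout << "f(+1) = " << df(expand(+1)) << "\n";  // should be positive
}
```

Since every hinge feature can only receive a nonnegative slope, the learned function is monotone in x by construction, which is exactly the strong regularization effect described above.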
-
@davisking On the same subject, do you have any tips/thoughts on feature extraction models? My interest is in exploring traditional machine learning alternatives to metric learning. I've looked at dnn_metric_learning_on_images_ex, dnn_metric_learning_ex, and PyTorch alternatives, which look great, but I was wondering if there are ML algorithms that could do similar things using potentially much smaller datasets. Linear Discriminant Analysis piqued my interest, which I'll definitely give a go, but I was wondering if things like empirical_kernel_map_ex could also provide some benefit. Any advice is greatly appreciated.
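For reference, this is roughly the kind of thing I mean by using empirical_kernel_map. The kernel choice, gamma value, and using every sample as a basis vector are just placeholder assumptions:

```cpp
#include <dlib/svm.h>
#include <dlib/rand.h>
#include <vector>

int main()
{
    using sample_type = dlib::matrix<double,0,1>;
    using kernel_type = dlib::radial_basis_kernel<sample_type>;

    // Stand-in for a small dataset of feature vectors.
    dlib::rand rnd;
    std::vector<sample_type> samples;
    for (int i = 0; i < 50; ++i)
    {
        sample_type s(3);
        s = rnd.get_random_gaussian(), rnd.get_random_gaussian(), rnd.get_random_gaussian();
        samples.push_back(s);
    }

    // Project each sample into the span of the basis under the chosen kernel.
    dlib::empirical_kernel_map<kernel_type> ekm;
    ekm.load(kernel_type(0.1), samples);  // here every sample is a basis vector

    std::vector<sample_type> projected;
    for (const auto& s : samples)
        projected.push_back(ekm.project(s));

    // The projected samples live in a space where plain linear methods
    // (linear SVM, LDA, etc.) behave like kernel methods in the original space.
}
```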
-
Does anybody have any tips on training ML algorithms on small datasets, like fewer than 1000 samples total?
I'm about to do a piece of work where I have to train a classifier on a custom sensor spewing time-series data. There is no existing dataset; most of the work will be in data acquisition and labeling, whether that be manual labour, synthetic data generation, or some kind of apparatus that automatically acquires and labels.
It's tempting to just use a neural net but I want to avoid having to endlessly capture data. I've fallen into this trap before.
So, do people have any tips? E.g.:
When using a NN classifier, any tips on getting it to work with very little data?
Or, how about using some traditional machine learning algorithms, like the ones provided in dlib (hence why I'm asking here)? If so, how do you go about it?
It would be cool if, say, I could get an OK-performing model with a very small dataset, like ~100 samples, and iteratively improve from there. When using a neural net, straight off the bat you have to acquire a largish and highly varied dataset, which is a nuisance and can very easily lead to poorly performing models if not enough care has gone into the data. For instance, I imagine the small-data workflow looking something like the sketch below.
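This is purely illustrative (the toy features, labels, and C value are made up), but dlib's cross_validate_trainer seems like the right tool for checking a model when there isn't enough data for a fixed held-out test set:

```cpp
#include <dlib/svm.h>
#include <dlib/rand.h>
#include <iostream>
#include <vector>

int main()
{
    using sample_type = dlib::matrix<double,0,1>;
    using kernel_type = dlib::linear_kernel<sample_type>;

    // ~100 hand-labeled samples (toy stand-ins here).
    std::vector<sample_type> samples;
    std::vector<double> labels;
    dlib::rand rnd;
    for (int i = 0; i < 100; ++i)
    {
        sample_type s(2);
        s = rnd.get_random_gaussian(), rnd.get_random_gaussian();
        samples.push_back(s);
        labels.push_back(s(0) + s(1) > 0 ? +1.0 : -1.0);
    }

    dlib::svm_c_linear_trainer<kernel_type> trainer;
    trainer.set_c(1);

    // 5-fold cross-validation: prints the fraction of +1 and -1 examples
    // classified correctly, so every sample gets used for both training
    // and evaluation.
    std::cout << dlib::cross_validate_trainer(trainer, samples, labels, 5);
}
```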
Thank you.