
How to apply SparseRandomProjector to large Image dataset? #40

Open
PeterKim1 opened this issue Mar 30, 2022 · 0 comments
Hello.

I want to apply this model to a large image dataset (over 10,000 images), but I run into a RAM issue.

https://github.com/hcw-00/PatchCore_anomaly_detection/blob/main/sampling_methods/kcenter_greedy.py#L95

self.features = model.transform(self.X)

I think this line loads the embeddings for the entire dataset into RAM and applies the SparseRandomProjector to all of them at once, which puts a lot of pressure on memory. (I'm just a novice, so this may be wrong.)

Does anyone know how to solve this problem?
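One possible way around the memory pressure is to fit the projector once with a fixed `n_components`, then apply `transform` in chunks, so only one chunk of embeddings needs to be in RAM at a time. A minimal sketch, with hypothetical shapes (10,000 embeddings of dimension 1024, a chosen target of 128; in practice each chunk would be loaded from disk rather than sliced from one big array):

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

# Hypothetical data: 10,000 embeddings of dimension 1024.
rng = np.random.RandomState(0)
X = rng.rand(10_000, 1024).astype(np.float32)

# Fix n_components up front so the projection matrix is fitted once
# and shared across all chunks (fit only needs the feature dimension).
projector = SparseRandomProjection(n_components=128, random_state=0)
projector.fit(X[:100])

# Transform in chunks of 1,000 rows; the projection is a linear map,
# so transforming chunk by chunk gives the same result as one big call.
chunks = []
for start in range(0, X.shape[0], 1_000):
    chunks.append(projector.transform(X[start:start + 1_000]))
reduced = np.vstack(chunks)
print(reduced.shape)  # (10000, 128)
```

Because the projection is just a (sparse) matrix multiply, the chunked output is identical to transforming everything at once; only the peak memory differs.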

One idea I have is to split the data in half and apply the SparseRandomProjector to each half separately, but I think that might cause problems, because SparseRandomProjector determines the dimensionality of the embeddings based on the Johnson-Lindenstrauss lemma.
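The concern seems valid for a second reason as well: each call to `fit` draws its own random projection matrix, so two projectors fitted on the two halves would map them into incompatible coordinate systems. A small sketch with made-up data:

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

# Made-up data split into two halves.
rng = np.random.RandomState(0)
X = rng.rand(200, 512).astype(np.float32)
first_half, second_half = X[:100], X[100:]

# Fitting a separate projector on each half draws a different random
# matrix, so the resulting embeddings are not directly comparable.
p1 = SparseRandomProjection(n_components=64, random_state=1).fit(first_half)
p2 = SparseRandomProjection(n_components=64, random_state=2).fit(second_half)
same = np.allclose(p1.components_.toarray(), p2.components_.toarray())
print(same)  # False
```

Fitting once and reusing the same projector (or at least the same `n_components` and `random_state`) for both halves avoids this.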

According to the sklearn documentation (https://scikit-learn.org/stable/modules/generated/sklearn.random_projection.SparseRandomProjection.html), n_components can be adjusted automatically based on the number of samples in the dataset.
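That automatic adjustment is exactly why splitting the data could change the output dimensionality: sklearn exposes the underlying bound as `johnson_lindenstrauss_min_dim`, and it grows with the number of samples. A quick check (eps=0.1 is an assumed distortion tolerance, the sklearn default):

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# The minimum safe n_components under the JL lemma depends on n_samples,
# so n_components='auto' would pick a different target dimensionality
# for half the dataset than for the full dataset.
dim_full = johnson_lindenstrauss_min_dim(n_samples=10_000, eps=0.1)
dim_half = johnson_lindenstrauss_min_dim(n_samples=5_000, eps=0.1)
print(dim_full, dim_half)
```

So with `n_components='auto'`, each half would be projected to a smaller (and mutually inconsistent) dimensionality than the full set; fixing `n_components` explicitly sidesteps this.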
