Add Pauthenet 2017 style preprocessing for multiple features. #27

sdat2 · 2020-07-20T10:33:06Z

Currently each feature (say salt and temperature) is fitted with a separate set of PCs

e.g.

    import numpy as np
    from pyxpcm.models import pcm

    # ds is an xarray.Dataset holding THETA and SALT on a vertical axis Z
    max_depth = 2000
    z = np.arange(0., -max_depth, -10.)
    features_pcm = {'THETA': z, 'SALT': z}
    features = {'THETA': 'THETA', 'SALT': 'SALT'}
    m = pcm(K=5, features=features_pcm, maxvar=2)
    m.fit(ds, features=features, dim='Z')
    m.predict(ds, features=features, dim='Z', inplace=True)
    m.predict_proba(ds, features=features, dim='Z', inplace=True)

will have fitted two PCs to SALT and two PCs to THETA.

In Pauthenet et al. (2017), he first transforms SALT and THETA to a spline basis, scales them, and then concatenates these elements into one long vector on which he performs PCA, resulting in three thermohaline PCs.

https://doi.org/10.1175/JPO-D-16-0083.1

We wouldn't need to worry about transforming on to a spline basis (it doesn't seem to make much difference), but we could add a keyword argument like

    m = pcm(K=5, features=features_pcm, maxvar=2, join=True)

so that there is a concatenation before the principal component step.
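
To make the difference concrete, here is a minimal NumPy-only sketch on synthetic data. It is not pyxpcm code: `standardise` and `pca_scores` are hypothetical helpers, the random arrays merely stand in for THETA and SALT profiles, and `join` remains only a proposed keyword.

```python
import numpy as np

def standardise(X):
    """Centre each depth level (column) and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_scores(X, n_components):
    """Project the rows of X onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
n_profiles, n_levels = 500, 200   # 200 levels: 0 to -2000 m every 10 m
theta = rng.normal(size=(n_profiles, n_levels))   # synthetic stand-in for THETA
salt = rng.normal(size=(n_profiles, n_levels))    # synthetic stand-in for SALT

# Separate reduction: one PCA per feature, reduced arrays stacked afterwards
separate = np.hstack([pca_scores(standardise(theta), 2),
                      pca_scores(standardise(salt), 2)])

# Joined reduction: scale each feature, concatenate along levels, one PCA
joined = pca_scores(np.hstack([standardise(theta), standardise(salt)]), 3)

print(separate.shape, joined.shape)
```

With separate reduction each retained component describes a single variable, whereas each joined component mixes temperature and salinity levels, which is what makes the PCs "thermohaline".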


gmaze · 2020-08-17T09:24:22Z

hi @sdat2
It would be a plus to have several dimensionality reduction options.
From the Pauthenet 2017 approach I see 2 things for pyxpcm:

1. the spline basis projection is also a reduction step, this could be added to pyxpcm
2. the join step for PCA could in fact be generalized to any reduction method, basically this could be an option determining if the reduction is to be applied to each feature separately or to their concatenation.

The 2nd point is somewhat tricky internally I think, and we have to keep performance in mind: computing the PCA for the concatenated field will take much longer than computing it for each feature separately.
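
A rough sketch of what such an option could look like, independent of pyxpcm internals. Everything here is hypothetical: `reduce_features`, its `join` flag and the toy `SVDReducer` are illustrative names, and any object exposing a `fit_transform` method could be substituted as the reducer.

```python
import numpy as np

class SVDReducer:
    """Toy PCA-like reducer exposing fit_transform (illustrative only)."""
    def __init__(self, n_components):
        self.n_components = n_components

    def fit_transform(self, X):
        Xc = X - X.mean(axis=0)
        U, s, _ = np.linalg.svd(Xc, full_matrices=False)
        return U[:, :self.n_components] * s[:self.n_components]

def reduce_features(arrays, make_reducer, join=False):
    """Reduce a dict of (n_samples, n_levels) arrays.

    join=False: one fresh reducer per feature, outputs stacked side by side.
    join=True: features concatenated along levels, then a single reducer.
    """
    if join:
        X = np.hstack([arrays[name] for name in arrays])
        return make_reducer().fit_transform(X)
    return np.hstack([make_reducer().fit_transform(arrays[name])
                      for name in arrays])

rng = np.random.default_rng(1)
arrays = {'THETA': rng.normal(size=(100, 50)),
          'SALT': rng.normal(size=(100, 50))}

per_feature = reduce_features(arrays, lambda: SVDReducer(2))
joint = reduce_features(arrays, lambda: SVDReducer(2), join=True)
print(per_feature.shape, joint.shape)
```

On the performance side, the joined branch decomposes a matrix with as many columns as all features combined, and a dense SVD scales worse than linearly with the number of columns, so the cost concern is expected to hold for this kind of reducer.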

Would you like to work on this?

sdat2 · 2020-08-17T09:53:26Z

hi @gmaze

I think Etienne thought that the spline basis projection step doesn't make much difference to the output of the PCA (assuming the inputs are clean), so we can probably not implement that step for now.

A join step does seem quite hard to implement given the class structure of pcm. I am mid-way through doing that so I would be happy to carry on working on this.

gmaze · 2020-08-17T13:21:05Z

> I think Etienne thought that the spline basis projection step doesn't make much difference to the output of the PCA (assuming the inputs are clean), so we can probably not implement that step for now.

ok !

gmaze · 2020-08-17T13:25:21Z

> A join step does seem quite hard to implement given the class structure of pcm. I am mid-way through doing that so I would be happy to carry on working on this.

It would require taking the reduction step out of preprocessing_this.

gmaze added the question Further information is requested label Aug 17, 2020
