Add Pauthenet 2017 style preprocessing for multiple features. #27

Open
sdat2 opened this issue Jul 20, 2020 · 4 comments
Labels
question Further information is requested

Comments

sdat2 commented Jul 20, 2020

Currently each feature (say salt and temperature) is fitted with a separate set of PCs

e.g.

    import numpy as np
    from pyxpcm.models import pcm

    # ds is assumed to be an xarray Dataset holding THETA and SALT on a vertical axis Z
    max_depth = 2000
    z = np.arange(0., -max_depth, -10.)
    features_pcm = {'THETA': z, 'SALT': z}
    features = {'THETA': 'THETA', 'SALT': 'SALT'}
    m = pcm(K=5, features=features_pcm, maxvar=2)
    m.fit(ds, features=features, dim='Z')
    m.predict(ds, features=features, dim='Z', inplace=True)
    m.predict_proba(ds, features=features, dim='Z', inplace=True)

will have fitted two PCs to SALT and two PCs to THETA.

Pauthenet et al. 2017 first transform SALT and THETA to a spline basis, scale them, and then concatenate the resulting elements into one long vector on which PCA is performed, resulting in three thermohaline PCs.

https://doi.org/10.1175/JPO-D-16-0083.1

We wouldn't need to worry about transforming onto a spline basis (it doesn't seem to make much difference), but we could add a keyword argument like

    m = pcm(K=5, features=features_pcm, maxvar=2, join=True)

so that there is a concatenation before the principal component step.
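
To illustrate the difference, here is a minimal NumPy sketch on synthetic data. It does not use pyxpcm: the helpers `standardize` and `pca_scores` are hypothetical names, PCA is done with a plain SVD, and random arrays stand in for THETA and SALT.

```python
import numpy as np

def standardize(x):
    """Centre each depth level and scale it to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def pca_scores(x, n_components):
    """Project the rows of x onto its leading principal components (via SVD)."""
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
n_profiles, n_levels = 500, 200  # 200 levels, as in z = np.arange(0., -2000., -10.)
theta = rng.normal(size=(n_profiles, n_levels))  # synthetic stand-in for THETA
salt = rng.normal(size=(n_profiles, n_levels))   # synthetic stand-in for SALT

# One PCA per feature; the scores are stacked here only for comparison.
separate = np.hstack([pca_scores(standardize(f), 2) for f in (theta, salt)])

# Proposed join: scale each feature, concatenate along the depth axis, then one PCA.
joined = pca_scores(np.hstack([standardize(theta), standardize(salt)]), 3)

print(separate.shape)  # (500, 4)
print(joined.shape)    # (500, 3)
```

With the join, each PC mixes temperature and salinity levels, which is what makes it a thermohaline mode rather than a per-variable one.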

gmaze added the question label Aug 17, 2020
Member

gmaze commented Aug 17, 2020

hi @sdat2
It would be a plus to support several dimensionality reduction strategies.
From the Pauthenet 2017 approach I see 2 things for pyxpcm:

  • the spline basis projection is also a reduction step; this could be added to pyxpcm
  • the join step for PCA could in fact be generalized to any reduction method: basically, this could be an option determining whether the reduction is applied to each feature separately or to their concatenation.

The 2nd point is somewhat tricky internally, I think, and we have to keep performance in mind: computing the PCA on the concatenated field will take much longer than on each feature separately.
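
A rough sketch of that option, independent of the reduction method, could look like the following. This is not pyxpcm code: `reduce_features` and `svd_reducer` are hypothetical names, the inputs are assumed to be already scaled, and an SVD-based PCA stands in for whichever reducer would be used.

```python
import numpy as np

def svd_reducer(n_components):
    """Return a reducer keeping the leading principal components (PCA via SVD)."""
    def reduce(x):
        x = x - x.mean(axis=0)
        u, s, _ = np.linalg.svd(x, full_matrices=False)
        return u[:, :n_components] * s[:n_components]
    return reduce

def reduce_features(features, reducer, join=False):
    """Apply `reducer` to each feature (join=False) or to their concatenation (join=True).

    `features` maps feature names to already-scaled (n_samples, n_levels) arrays.
    """
    arrays = [features[name] for name in features]
    if join:
        return reducer(np.hstack(arrays))
    return np.hstack([reducer(a) for a in arrays])

rng = np.random.default_rng(1)
scaled = {"THETA": rng.normal(size=(300, 50)), "SALT": rng.normal(size=(300, 50))}

per_feature = reduce_features(scaled, svd_reducer(2))              # shape (300, 4)
concatenated = reduce_features(scaled, svd_reducer(2), join=True)  # shape (300, 2)
```

In the joined case the reducer sees a matrix whose width is the sum of all feature lengths, which is where the extra cost comes from.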

Would you like to work on this?

Author

sdat2 commented Aug 17, 2020

hi @gmaze

I think Etienne thought that the spline basis projection step doesn't make much difference to the output of the PCA (assuming the inputs are clean), so we can probably not implement that step for now.

A join step does seem quite hard to implement given the class structure of pcm. I am mid-way through doing that so I would be happy to carry on working on this.

Member

gmaze commented Aug 17, 2020

> I think Etienne thought that the spline basis projection step doesn't make much difference to the output of the PCA (assuming the inputs are clean), so we can probably not implement that step for now.

OK!

Member

gmaze commented Aug 17, 2020

> A join step does seem quite hard to implement given the class structure of pcm. I am mid-way through doing that so I would be happy to carry on working on this.

It would require taking the reduction step out of preprocessing_this.
