Add method for outputting PCA fields #26

Open
sdat2 opened this issue Jul 20, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

sdat2 commented Jul 20, 2020

It would be interesting to be able to see the PCA fields after preprocessing, to see what space the clusters are actually fitting to.

If the PCA fields were attached to the dataset, it should be possible to add the conditional logic to prevent another preprocessing run on the same dataset for predict_proba().
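The conditional logic could look roughly like the sketch below. It is only an illustration: `get_reduced`, `compute_reduced` and the variable name `PCA_VALUES` are hypothetical and not part of pyxpcm, and a plain `dict` stands in for the dataset (an `xarray.Dataset` supports the same `in` test and item assignment).

```python
def get_reduced(store, compute_reduced, key="PCA_VALUES"):
    """Return the reduced data, computing it only if it is not already stored.

    store           -- mapping-like container (a dict here; hypothetically the dataset)
    compute_reduced -- zero-argument callable running the expensive preprocessing
    key             -- hypothetical variable name under which the result is kept
    """
    if key not in store:
        # First call: run the preprocessing and attach the result
        store[key] = compute_reduced()
    # Later calls reuse the attached result and skip the preprocessing
    return store[key]
```

Calling it twice on the same container runs the preprocessing once; the second call returns the stored values.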

gmaze added the question label on Aug 17, 2020

gmaze (Member) commented Aug 17, 2020

Hi @sdat2
It is possible to see the PCA fields with m.plot.reducer().
Once the PCM is fitted on a dataset, the PCA reducer is no longer refitted when a prediction is made.
Are you suggesting that we should add the reduced data to the dataset?

sdat2 (Author) commented Aug 17, 2020

Hi @gmaze

Ah cool, I had not spotted that feature. I guess adding it to the dataset isn't necessary as such, and would probably just complicate the repository. Here is my implementation for outputting the PCA values:

```python
import numpy as np
import xarray as xr


# Written as a method: relies on self._context, self.preprocessing and self.unravel
def add_pca_to_xarray(self, ds, features=None,
                      dim=None, action='fit',
                      mask=None, inplace=False):
    """
    Preprocess the fields, fit the PCA, and output the PCA
    coefficients as an :class:`xarray.DataArray`.

    :param ds: :class:`xarray.Dataset` to process
    :param features: dictionary
    :param dim: string for dimension along which the model is fitted (e.g. Z)
    :param action: string to be forwarded to preprocessing function
    :param mask: mask over dataset
    :param inplace: whether to add the DataArray to the existing dataset,
           or just to return the DataArray on its own.

    """
    with self._context('fit', self._context_args):
        X, sampling_dims = self.preprocessing(ds, features=features, dim=dim,
                                              action=action, mask=mask)
        pca_values = X.values
        n_features = str(X.coords['n_features'].values)

    with self._context('add_pca.xarray', self._context_args):
        # Unravel each reduced component back onto the sampling dimensions
        P = list()
        for k in range(np.shape(pca_values)[1]):
            x = self.unravel(ds, sampling_dims, pca_values[:, k])
            P.append(x)

        da = xr.concat(P, dim='pca').rename('PCA_VALUES')
        da.attrs['long_name'] = 'PCA Values'
        da.attrs['n_features'] = n_features

    # Add the PCA values to the dataset:
    if inplace:
        return ds.pyxpcm.add(da)
    else:
        return da
```
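For readers without the library at hand, the same idea (stack the profiles, reduce them with PCA, then unravel each component back onto the grid) can be sketched with NumPy alone. This is a standalone illustration, not the pyxpcm code: `pca_fields` is a hypothetical name, the input is assumed to be a `(nz, ny, nx)` array, the PCA is done with a plain SVD, and the profiles are only centred, not standardised.

```python
import numpy as np


def pca_fields(data, n_components):
    """Project vertical profiles onto their leading principal components.

    data         -- array of shape (nz, ny, nx); profiles run along axis 0
    n_components -- number of PCA components to keep
    Returns an array of shape (n_components, ny, nx), NaN where a profile
    contained non-finite values.
    """
    nz, ny, nx = data.shape
    # Ravel the horizontal grid: one row per profile, one column per level
    X = data.reshape(nz, ny * nx).T
    # Keep only profiles that are fully defined (the "mask")
    valid = np.all(np.isfinite(X), axis=1)
    Xc = X[valid] - X[valid].mean(axis=0)
    # PCA via SVD: rows of Vt are the principal axes, sorted by variance
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T          # (n_valid, n_components)
    # Unravel: put each component back onto the (ny, nx) grid
    out = np.full((n_components, ny * nx), np.nan)
    out[:, valid] = scores.T
    return out.reshape(n_components, ny, nx)


# Example with synthetic data: 5 levels on a 4 x 3 grid, one masked profile
rng = np.random.default_rng(0)
demo = rng.normal(size=(5, 4, 3))
demo[:, 0, 0] = np.nan
fields = pca_fields(demo, n_components=2)
```

Each `fields[k]` is then a 2D map of the k-th PCA coefficient, which is the kind of field the method above concatenates along the `pca` dimension.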

gmaze (Member) commented Aug 17, 2020

Nice! Just open a PR and ask for a review, and I'll check it out!

gmaze added the enhancement label and removed the question label on Aug 17, 2020

sdat2 (Author) commented Aug 17, 2020

Thanks! I've just realised that I've been working on master, but I'll create a PR once I've sorted that out.

sdat2 (Author) commented Aug 17, 2020

#28 PR created.
