Support/document Confidence Interval Ellipse #3715

Closed
z4668640 opened this issue Dec 19, 2024 · 9 comments · Fixed by #3747

@z4668640

What is your suggestion?

Hello, can Altair draw confidence interval ellipses for the different groups of points in a PCA scatter plot?
The following PCA scatter plot was drawn with ggplot2 in R.
Image
Here is the PCA scatter plot I currently get with Altair, along with the code that produces it.
Image

chart = alt.Chart(data).mark_circle(size=60).encode(
    x=alt.X(pc1_column, title='Principal Component 1'),
    y=alt.Y(pc2_column, title='Principal Component 2'),
    color=alt.Color('Environment1', title='Environment'), 
#    tooltip=['Index', pc1_column, pc2_column]
)
chart.display()

Is there any solution at the moment? thx.

Have you considered any alternative solutions?

Image
That was one of my attempts, but it obviously didn't work.
I don't have any good ideas right now.

kmeans = KMeans(n_clusters=3, random_state=0).fit(data[[pc1_column, pc2_column]])
data['Cluster'] = kmeans.labels_

chart = alt.Chart(data).mark_circle(size=60).encode(
    x=alt.X(pc1_column, title='Principal Component 1'),
    y=alt.Y(pc2_column, title='Environment'),
    color=alt.Color('Environment1', title='Environment'),
    tooltip=[pc1_column, pc2_column, 'Environment1']
)

def plot_confidence_ellipses_altair(df, chart):
    for cluster in df['Cluster'].unique():
        cluster_data = df[df['Cluster'] == cluster]
        mean_x = cluster_data[pc1_column].mean()
        mean_y = cluster_data[pc2_column].mean()
        cov_matrix = np.cov(cluster_data[pc1_column], cluster_data[pc2_column])
        eigvals, eigvecs = np.linalg.eigh(cov_matrix)
        order = eigvals.argsort()[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        theta = np.degrees(np.arctan2(*eigvecs[:, 0][::-1]))
        width, height = 2 * np.sqrt(chi2.ppf(0.95, 2)) * np.sqrt(eigvals)

        ellipse_df = pd.DataFrame({
            'x': mean_x + width / 2 * np.cos(np.linspace(0, 2 * np.pi, 100)) * np.cos(np.radians(theta)) -
                 height / 2 * np.sin(np.linspace(0, 2 * np.pi, 100)) * np.sin(np.radians(theta)),
            'y': mean_y + width / 2 * np.sin(np.linspace(0, 2 * np.pi, 100)) * np.cos(np.radians(theta)) +
                 height / 2 * np.cos(np.linspace(0, 2 * np.pi, 100)) * np.sin(np.radians(theta))
        })

        chart += alt.Chart(ellipse_df).mark_line(color='red', opacity=0.5, strokeWidth=2).encode(
            x=alt.X('x:Q', title='Principal Component 1'),
            y=alt.Y('y:Q', title='Environment')
        )

    return chart

chart = plot_confidence_ellipses_altair(data, chart)
chart.display()
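
For reference, two things in the attempt above appear to work against it. The `y` expression pairs `sin(t)` with the major axis and `cos(t)` with the minor axis, whereas the usual rotated-ellipse parametrization is `x = cx + a*cos(t)*cos(theta) - b*sin(t)*sin(theta)` and `y = cy + a*cos(t)*sin(theta) + b*sin(t)*cos(theta)`. In addition, Altair line marks connect points in order of the x field by default, so a closed curve usually needs an explicit `order` encoding. The following is a minimal sketch of the point computation only, assuming the same chi-square scaling as the attempt; the function name `confidence_ellipse` is hypothetical and this is not the code later merged for this issue.

```python
import numpy as np


def confidence_ellipse(x, y, level=0.95, segments=100):
    """Return an array of shape (segments, 2) with points on the ellipse."""
    xy = np.column_stack([x, y])
    center = xy.mean(axis=0)
    cov = np.cov(xy, rowvar=False)

    # Eigenvectors give the axis directions of the ellipse,
    # eigenvalues give the variance along each of those axes.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # The chi-square quantile with 2 degrees of freedom has a closed form;
    # it equals scipy.stats.chi2.ppf(level, 2).
    scale = -2.0 * np.log(1.0 - level)
    radii = np.sqrt(scale * eigvals)

    # Unit circle; the last point repeats the first so the path closes.
    t = np.linspace(0.0, 2.0 * np.pi, segments)
    circle = np.column_stack([np.cos(t), np.sin(t)])

    # Stretch by the radii, rotate by the eigenvectors, shift to the mean.
    return center + (circle * radii) @ eigvecs.T
```

Multiplying by the eigenvector matrix performs the rotation directly, so no angle has to be extracted with `arctan2`. Adding a running index column to the resulting points and encoding it as `order` keeps the line mark from re-sorting them by x.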
@dangotbanned
Member

@z4668640 could you please summarize - ideally in a few lines - what the feature request is here?

@z4668640
Author

z4668640 commented Jan 4, 2025

@z4668640 could you please summarize - ideally in a few lines - what the feature request is here?

What I mean is that I want to use Altair to get a confidence interval similar to the one in the first PCA scatter plot above.
The rest is that I tried to solve the matter myself, but failed.

@dangotbanned
Member

dangotbanned commented Jan 4, 2025

@z4668640 could you please summarize - ideally in a few lines - what the feature request is here?

What I mean is that I want to use Altair to get a confidence interval similar to the one in the first PCA scatter plot above. The rest is that I tried to solve the matter myself, but failed.

Thanks @z4668640 that is helpful

Could you add a code block between this text and the ggplot2 output (in #3715 (comment)), so we can see what API(s) were used to get that result please?

The following is a PCA scatter plot drawn in R language ggplot2.

Using R is fine, but if you wanted to go the extra mile - maybe try rewriting it using https://github.com/has2k1/plotnine

@dangotbanned changed the title from "About the method of drawing confidence intervals" to "Support/document Confidence Interval Ellipse" on Jan 4, 2025
@dangotbanned
Member

I believe plotnine.stat_ellipse would be an example of implementing this with numpy, scipy
Source code

I also found an old closed PR (#514 by @essicolo) that would have added an example for this.
The blocker at the time is no longer an issue as (#3202 by @joelostblom) added scipy as a docs dependency.

I'm working on cleaning up that example right now.
Is something like this what you'd be hoping to see @z4668640?

Images: Filled, No Fill

@z4668640
Author

z4668640 commented Jan 4, 2025

This looks like what I want. Thank you very much for your help. These diagrams look beautiful and I would appreciate a tutorial on how to use them. @dangotbanned

@z4668640
Author

z4668640 commented Jan 4, 2025

Could you let me know when this tutorial is ready? I would appreciate it.

@dangotbanned
Member

Could you let me know when this tutorial is ready? I would appreciate it.

Will do @z4668640

dangotbanned added a commit that referenced this issue Jan 4, 2025
Happy with the end result, but not comfortable merging so much complexity I don't understand yet

#3715
@dangotbanned added this to the 5.6.0 milestone on Jan 12, 2025
mattijn added a commit that referenced this issue Jan 18, 2025
* Create deviation_ellipses.py

example showing bivariate deviation ellipses of petal length and width of three iris species

* docs: Initial rewrite of (#514)

Happy with the end result, but not comfortable merging so much complexity I don't understand yet

#3715

* ci(typing): Adds `scipy-stubs` to `altair[doc]`

`scipy` is only used for one example in the user guide, but this will be the second
https://docs.scipy.org/doc/scipy/release/1.15.0-notes.html#other-changes

* fix: Only install `scipy-stubs` on `>=3.10`

* chore(typing): Ignore incorrect `pandas` stubs

* ci(typing): ignore `scipy` on `3.9`

https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-library-stubs-or-py-typed-marker

https://github.com/vega/altair/actions/runs/12612565960/job/35149436953?pr=3747

* docs: Add missing category

* fix: Add missing support for `from __future__ import annotations`

Fixes https://github.com/vega/altair/actions/runs/12612637008/job/35149593128?pr=3747#step:6:25

* test: skip example when `scipy` not installed

Temporary fix for https://github.com/vega/altair/actions/runs/12612997919/job/35150338097?pr=3747

* docs: reduce segments `100` -> `50`

Observed no visible reduction in quality.
Slightly visible at `<=40`

* docs: Clean up `numpy`, `scipy` docs/comments

* refactor: Simplify `numpy` transforms

* docs: add tooltip, increase size

* fix: Remove incorrect range stop

Previously returned `segments+1` rows, but this isn't specified in `ggplot2`
https://github.com/tidyverse/ggplot2/blob/efc53cc000e7d86e3db22e1f43089d366fe24f2e/R/stat-ellipse.R#L122

* refactor: Remove special casing `__future__` import

I forgot that the only requirement was that the import is the **first statement**.

Partially reverts (7cd2a77)

* docs: Remove unused `method` code

Also resolves #3747 (comment)

* docs: rename to 'Confidence Interval Ellipses'

* docs: add references to description

* docs: Adds methods syntax version

Includes comment removal suggestion in (#3747 (comment))

* refactor: Rewrite `pd_ellipse`

- Fixed a type ignore (caused by incomplete stubs)
- Renamed variables
- Replaced the implicit `"index"` column with one named `"order"`

#3747 (comment)

* ci(uv): sync `scipy-stubs`

dc7639d
a296b82

* refactor(typing): Try removing `from __future__ import annotations`

#3747 (comment), #3747 (comment)

* refactor: rename `np_ellipse` -> `confidence_region_2d`

#3747 (comment)

* refactor: rename `pd_ellipse` -> `grouped_confidence_regions`

#3747 (comment)

* docs: change category to `"case studies"`

#3747 (comment)

* styling

---------

Co-authored-by: Serge-Étienne Parent <essicolo@tuta.io>
Co-authored-by: Mattijn van Hoek <mattijn@gmail.com>
@dangotbanned
Member

Could you let me know when this tutorial is ready? I would appreciate it.

@z4668640 Thank you for raising this issue, we'll have this example in the docs in the next minor release (5.6.0) of altair.

If you are feeling eager, the example code has been merged and can be found at deviation_ellipses.py.

Note

The code may change before release, but there are currently no plans to do so

image

@z4668640
Author

@dangotbanned Thank you very much for your help. I can't wait to try this. Hats off to you.
