
Across all models, are certain cell painting features more explanatory than others? #64

Closed
gwaybio opened this issue Sep 13, 2019 · 5 comments
Labels: Experiments (tracking experimental questions, results, or analysis)

Comments


gwaybio commented Sep 13, 2019

Exploring model coefficients across all models, what does this distribution look like?

@gwaybio gwaybio added the Experiments Tracking experimental questions, results, or analysis label Sep 13, 2019

gwaybio commented Sep 18, 2019

This paper might be an important resource: Zahedi et al. 2018.

I haven't done the analysis listed above, but anecdotally, I have seen a bunch of mito features pop up with high weights.


gwaybio commented Sep 20, 2019

[Figure: model_coefficient_summary]

Comparing coefficients across all 70 cell health models using real and shuffled data. The model coefficient sum is much higher in the real data models. And, on average, it looks like the Mito channel is the highest compared to all other labeled channels.
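The channel comparison above could be computed along these lines. This is a minimal sketch that assumes CellProfiler-style feature names where the channel is the final underscore-delimited token; the data frame below is toy data standing in for the actual fitted model coefficients:

```python
import pandas as pd

# Hypothetical coefficient table: one row per (model, feature) pair.
# The real analysis would load coefficients from the fitted models instead.
coef_df = pd.DataFrame({
    "feature": [
        "Cells_Intensity_MeanIntensity_Mito",
        "Cells_Intensity_MeanIntensity_DNA",
        "Nuclei_Texture_Contrast_Mito",
        "Cells_Intensity_MeanIntensity_RNA",
    ],
    "coefficient": [0.8, 0.1, 0.5, -0.2],
})

# CellProfiler feature names end in the channel; split it out and sum
# absolute coefficient weight per channel across all models.
coef_df["channel"] = coef_df["feature"].str.split("_").str[-1]
channel_sum = coef_df["coefficient"].abs().groupby(coef_df["channel"]).sum()
print(channel_sum.sort_values(ascending=False))
```

Summing absolute values (rather than raw coefficients) keeps positive and negative weights from canceling when comparing channels.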

Remaining Todo

  • I imagine that this result will differ depending on the actual cell health variable. Stratify by the type of variable and re-plot.


gwaybio commented Sep 20, 2019

cc @AnneCarpenter @shntnu


gwaybio commented Sep 21, 2019

In 9a33ac0, I compare feature performances across cell lines.

[Figure: cell_line_mse_differences]

Interpretation

Not surprisingly, training with real data shows lower cell-line-specific MSE across features. These values are also relatively consistent across cell lines, although HCC44 appears to have the lowest overall MSE.

The F statistic tracks the ratio of between-group variance to within-group variance. So high values map to features with large performance differences across cell lines, while low values indicate features that are predicted consistently across cell lines. Some features are predicted well across all cell lines, and some are predicted with higher variance. If a feature is predicted poorly in HCC44, it tends to have a high F statistic. Not surprisingly, the F statistics are higher in shuffled data.
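The per-feature F statistic described above amounts to a one-way ANOVA over the per-cell-line errors. A minimal sketch with simulated MSE values (the group means, sizes, and the poor-performance assignment to HCC44 are illustrative, not the real results):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Hypothetical per-sample squared errors for one feature, grouped by cell
# line. A feature predicted consistently across lines has low between-group
# variance relative to within-group variance, hence a low F statistic.
mse_a549 = rng.normal(loc=0.5, scale=0.05, size=30)
mse_es2 = rng.normal(loc=0.5, scale=0.05, size=30)
mse_hcc44 = rng.normal(loc=0.9, scale=0.05, size=30)  # predicted poorly here

f_stat, p_value = f_oneway(mse_a549, mse_es2, mse_hcc44)
print(f_stat)  # large F: performance differs across cell lines
```

Running this per feature and ranking by `f_stat` surfaces the features whose predictability is most cell-line dependent.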


gwaybio commented Nov 20, 2019

In #81, I add two additional visualizations:

Note that the axes represent the total sum of each individual coefficient across all models.

Top 50 Features

[Figure: coefficient_sum_subset]

All Features

[Figure: coefficient_sum_full]

Summary

  • Feature weights are much higher in real vs. shuffled data
  • Top features don't actually participate in many models
    • This is somewhat expected: feature selection is not applied a priori (so there are lots of redundant features here) and the regression models are elastic net
  • Not sure how to interpret coefficients with high weights!
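The elastic-net point above can be illustrated on toy data: with two nearly identical features, the L1 component lets the model satisfy the fit through either copy, so a feature that genuinely matters may still carry weight in only some models. A sketch with made-up data (feature construction and penalty settings are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(42)

# Two highly redundant (nearly identical) features plus one independent
# feature, mimicking unselected CellProfiler features.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # near-duplicate of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + x3 + rng.normal(scale=0.1, size=n)

# The L1 penalty spreads or drops weight among redundant features, while
# the L2 component keeps the solution stable under collinearity.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)
```

Here the signal attributed to `x1` in the generating process ends up shared between the two redundant columns, which is one reason a "top" feature's weight can be diluted across near-duplicates.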

@gwaybio gwaybio closed this as completed Nov 30, 2020