This repository has been archived by the owner on Mar 29, 2022. It is now read-only.

Feature/calculateauc #35

Merged
15 commits merged into development from feature/calculateauc on Dec 31, 2021

Conversation

@antschum (Collaborator) commented Dec 14, 2021

Description of changes
Implementation to calculate the AUC with respect to CellType along with test files.

Still missing right now:

  • A nicer results output (the current version returns a huge dictionary with everything).
  • Feature importances (still not sure 1. how best to return them, e.g. whether to take the average if multiple importances are captured, and 2. how to get feature importances for logistic regression -> Issue).
  • A progress bar for the calculations.
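For the first open point (multiple captured importances), one option would be to summarize per feature across CV folds. A minimal sketch, assuming importances come back as one NumPy array per fold (the variable names here are hypothetical, not the package's API):

```python
import numpy as np

# hypothetical: one importance vector per CV fold, aligned to the same genes
fold_importances = [
    np.array([0.2, 0.5, 0.3]),
    np.array([0.4, 0.3, 0.3]),
]

# average (and spread) across folds gives one summary value per feature
mean_importance = np.mean(fold_importances, axis=0)
std_importance = np.std(fold_importances, axis=0)
```

Reporting the standard deviation alongside the mean keeps the fold-to-fold variability visible instead of hiding it behind a single number.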

@antschum antschum changed the base branch from main to development December 14, 2021 16:39
@github-actions (bot)

Hi @antschum,

It looks like this pull request has been made against the theislab/augurpy master or main branch.
The master/main branch should always contain code from the latest release.
Because of this, PRs to master/main are only allowed if they come from any theislab/augurpy release or patch branch.

You do not need to close this PR, you can change the target branch to development by clicking the "Edit" button at the top of this page.

Thanks again for your contribution!

@github-actions (bot) added the "enhancement" (New feature or request) label Dec 14, 2021
@Zethson (Member) left a comment


Cool to see it coming together!

I'd suggest that you rename the Python files to ensure that they are descriptive.
calculate_AUC.py for example is not the nicest of all names.

Besides that, just many tiny requests from me. Happy to see this merged soon! Then we'll also need to ensure that we properly document all of this. See #40

augurpy/calculate_AUC.py: 6 review threads (outdated, resolved)
augurpy/cv.py: 4 review threads (outdated, resolved)
@Zethson (Member) commented Dec 24, 2021

feature importances (still not sure 1. how to best return them - eg. do I take the average if multiple importances are captured and 2. feature importances for logistic regression -> Issue.)

Mhmm. Didn't we want to look at how Augur calculates them? Did you figure something out?

antschum and others added 2 commits December 25, 2021 09:15
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
@antschum (Collaborator, Author) commented Dec 26, 2021

feature importances (still not sure 1. how to best return them - eg. do I take the average if multiple importances are captured and 2. feature importances for logistic regression -> Issue.)

Mhmm. Didn't we want to look at how Augur calculates them? Did you figure something out?

Augur just returns a huge matrix containing the raw data, like so:
columns: cell type, subsample idx, fold
dimensions: cell types x folds x n_subsamples x n genes sampled
I'll implement this for starters, but am not super happy with it. It would be cool to brainstorm on why this might be interesting to the user to begin with.
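Stored long-format, that matrix could look roughly like the following. This is only a sketch with made-up values; the column names and the pandas representation are assumptions, not what Augur or augurpy actually uses:

```python
import pandas as pd

# long format: one row per (cell type, subsample, fold, gene) combination
feature_importances = pd.DataFrame({
    "cell_type":     ["B cell", "B cell", "T cell", "T cell"],
    "subsample_idx": [0, 0, 0, 0],
    "fold":          [1, 2, 1, 2],
    "gene":          ["CD19", "CD19", "CD3E", "CD3E"],
    "importance":    [0.41, 0.38, 0.52, 0.49],
})

# one summary number per cell type and gene: average over folds/subsamples
summary = (feature_importances
           .groupby(["cell_type", "gene"])["importance"]
           .mean())
```

A long-format frame like this would let the user aggregate however they like (per gene, per cell type, per fold) instead of receiving one fixed summary.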

@Zethson (Member) commented Dec 26, 2021

I guess that this should help determine which genes contributed the most to the cell type prioritization?

@Zethson (Member) commented Dec 28, 2021

@antschum Laptop working again? ^_^
Should I have another look?

@antschum (Collaborator, Author) commented Dec 28, 2021

@antschum Laptop working again? ^_^

Hey :) Using my brother's; I should have a working one by Thursday though.

Should I have another look?

That would be great! I'll click on request.
Still missing: feature importances for logistic regression (started reading; I think it sounds more complicated than it is, but I don't fully understand it yet, and there is not a lot online) and the concordance correlation coefficient for regression models, which I just have to implement.
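For the concordance correlation coefficient, a minimal sketch of Lin's formula (the helper name is hypothetical, and this uses the sample-statistics variant; the eventual implementation may differ):

```python
import numpy as np

def concordance_ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (sample-statistics variant)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
    covariance = np.cov(y_true, y_pred)[0, 1]
    mean_diff = y_true.mean() - y_pred.mean()
    return 2 * covariance / (y_true.var(ddof=1) + y_pred.var(ddof=1) + mean_diff ** 2)
```

Unlike plain Pearson correlation, this penalizes both location and scale shifts between predictions and ground truth, which is why it is used as the regression analogue of the AUC here.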

@Zethson (Member) commented Dec 28, 2021

@antschum think you missed two comments of mine. (You can scroll through the changed files.) But besides this it looks great! Feel free to merge it after you've addressed my comments.

Regarding the feature importances:

# logistic regression for feature importance
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from matplotlib import pyplot

# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# define and fit the model
model = LogisticRegression()
model.fit(X, y)
# get importance: one coefficient per feature
importance = model.coef_[0]
# summarize feature importance
for i, v in enumerate(importance):
    print(f"Feature: {i}, Score: {v:.5f}")
# plot feature importance
pyplot.bar(range(len(importance)), importance)
pyplot.show()

We can just make use of something like this, no?

Also, we will need to implement some plots for the feature importances. Would you mind creating an issue for this?

@antschum (Collaborator, Author)

@antschum think you missed two comments of mine. (You can scroll through the changed files.) But besides this it looks great! Feel free to merge it after you've addressed my comments.

Ah! I'm sorry, I totally missed those. I changed the variable name and replied to the other comment.

Regarding the feature importances:

# logistic regression for feature importance
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from matplotlib import pyplot

# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=1)
# define and fit the model
model = LogisticRegression()
model.fit(X, y)
# get importance: one coefficient per feature
importance = model.coef_[0]
# summarize feature importance
for i, v in enumerate(importance):
    print(f"Feature: {i}, Score: {v:.5f}")
# plot feature importance
pyplot.bar(range(len(importance)), importance)
pyplot.show()

We can just make use of something like this, no?

So would you suggest just using the coefficient values as feature importances for the logistic regression? In the original they referenced the Agresti method here. I have not found much on this, although in the blog post it seems they are just subtracting the mean from each coefficient and then dividing by the standard deviation, right?

Also, we will need to implement some plots for the feature importances. Would you mind creating an issue for this?

Done! #42

@Zethson (Member) commented Dec 31, 2021

Ah, wasn't aware of this. Yes, I read the blog the same way.
Basically, this part:

An alternative comparison of effects of quantitative predictors having different units uses standardized coefficients. The model is fitted to standardized predictors, replacing each xj by (xj−x¯j)/sxj. A 1-unit change in the standardized predictor is a standard deviation change in the original predictor. Then, each regression coefficient represents the effect of a standard deviation change in a predictor, controlling for the other variables. The standardized estimate for predictor xj is the unstandardized estimate β^j multiplied by sxj.
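In sklearn terms, the standardized estimate described in that passage (β̂ⱼ multiplied by the predictor's standard deviation) could be sketched like this, reusing the example dataset from above. This is only a sketch of the textbook formula, not necessarily how Augur computes it:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=5, random_state=1)
model = LogisticRegression().fit(X, y)

# standardized estimate: unstandardized coefficient times the predictor's std dev,
# so coefficients become comparable across features with different units/scales
standardized_coef = model.coef_[0] * X.std(axis=0)
```

Equivalently, one could z-score the predictors before fitting; the resulting coefficients are the standardized estimates directly.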

Feel free to merge it now :)
I'd suggest to implement the other things in follow up PRs.

@antschum (Collaborator, Author)

Ah, wasn't aware of this. Yes, I read the blog the same way. Basically, this part:

An alternative comparison of effects of quantitative predictors having different units uses standardized coefficients. The model is fitted to standardized predictors, replacing each xj by (xj−x¯j)/sxj. A 1-unit change in the standardized predictor is a standard deviation change in the original predictor. Then, each regression coefficient represents the effect of a standard deviation change in a predictor, controlling for the other variables. The standardized estimate for predictor xj is the unstandardized estimate β^j multiplied by sxj.

Feel free to merge it now :) I'd suggest to implement the other things in follow up PRs.

Ah, I just read this and had already pushed; maybe you could take a quick look? I added some rough tests for the concordance correlation coefficient, so it should be more or less okay. And then we merge? My bad.
Doing the other things in follow-up PRs :)

@antschum (Collaborator, Author)

Forgot to run nox. 🤦‍♀️

@Zethson (Member) left a comment


LGTM!

tests/test_cv.py: 1 review thread (outdated, resolved)
@antschum antschum merged commit 09abf78 into development Dec 31, 2021
@Zethson Zethson deleted the feature/calculateauc branch February 21, 2022 11:13