Sample pre-ordering vs ordering under the hood #44

Open
vkartha opened this issue Jan 14, 2024 · 9 comments

Comments

@vkartha
Collaborator

vkartha commented Jan 14, 2024

Hello! I wanted to test-run this as part of something I'm trying in my own work. Looking at the newer implementation, I was wondering how one can run the search separately using either a decreasing or an increasing ranking of the input scores (testing for left-skew in either case). I think the previous implementation let us specify the desired ranking order for this reason, so I was curious how the newer implementation handles it.

Are we now expected to pre-rank the input variable (sort(value, decreasing=TRUE/FALSE))? Based on the example run, the variable loaded with utils::data(TAZYAP_BRCA_ACTIVITY) doesn't seem to be pre-sorted.

meta_plot always plots the variable in decreasing order, which is another reason I was curious.

Thanks, and very pleased to see this cleaned up so well!

@vkartha
Collaborator Author

vkartha commented Jan 14, 2024

From a quick scan, the sorting appears to happen in ks_rowscore, line 96:

```r
  # KS is a ranked-based method
  # So we need to sort input_score from highest to lowest values
  input_score <- sort(input_score, decreasing=TRUE)

  # Re-order the matrix based on the order of input_score
  FS <- FS[, names(input_score), drop=FALSE]
```

I believe this means you get the same best meta-feature no matter how you order the input (that appears to be the case), since the scores are re-ranked internally. So, as it stands, you can't test an alternative 'observed' ranking?
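A minimal sketch of why the input order doesn't matter, assuming a toy feature matrix FS (features by samples, sample names as column names) and a named input_score vector, both made up for illustration. The two sorting lines quoted above are wrapped in a hypothetical helper, reorder_like_ks (not a package function), so the effect of shuffling the input can be checked directly.

```r
# Hypothetical helper reproducing the two sorting lines quoted from ks_rowscore
reorder_like_ks <- function(FS, input_score) {
  # Sort scores from highest to lowest, as ks_rowscore does internally
  input_score <- sort(input_score, decreasing = TRUE)
  # Re-order the matrix columns to match the sorted scores
  FS <- FS[, names(input_score), drop = FALSE]
  list(FS = FS, input_score = input_score)
}

# Toy data (made up for illustration): 3 binary features across 5 samples
set.seed(1)
FS <- matrix(rbinom(15, 1, 0.5), nrow = 3,
             dimnames = list(paste0("feat", 1:3), paste0("s", 1:5)))
input_score <- c(s1 = 0.2, s2 = 1.5, s3 = -0.7, s4 = 0.9, s5 = 0.1)

# Shuffle the order of the input scores; the internal sort undoes it
shuffled <- input_score[sample(names(input_score))]
a <- reorder_like_ks(FS, input_score)
b <- reorder_like_ks(FS, shuffled)
identical(a, b)  # TRUE: the order of the input has no effect
```

Because both calls end up in the same descending order, any downstream KS scoring sees identical inputs, which is consistent with always getting the same best meta-feature.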

@tetomonti
Collaborator

tetomonti commented Jan 14, 2024 via email

@vkartha
Collaborator Author

vkartha commented Jan 14, 2024

Hi @tetomonti, I think that's what I was getting at: regardless of the order of the input vector, you get the same meta-feature. I think it's because of the sorting command embedded in ks_rowscore, which always re-orders by the input metric in descending order (see my last comment), so I'm not sure you can test different rankings?

@tetomonti
Collaborator

tetomonti commented Jan 14, 2024 via email

@vkartha
Collaborator Author

vkartha commented Jan 15, 2024

I see, so there's no way of telling it which order to use, and if we want to flip the direction of the search, we have to artificially negate the values first? That would make sense given that it always re-sorts the input vector in decreasing order under the hood, which I didn't realize initially.
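A small sketch of the negation workaround, using made-up scores: sorting -input_score in decreasing order (what the internal sort does) yields the same sample order as an increasing sort of the original scores, but the values carried forward are the negated ones.

```r
# Toy scores (made up for illustration)
input_score <- c(s1 = 0.2, s2 = 1.5, s3 = -0.7, s4 = 0.9, s5 = 0.1)

# What the internal descending sort does to negated scores
neg_sorted <- sort(-input_score, decreasing = TRUE)

# The ascending order obtained by ranking the original scores directly
asc_sorted <- sort(input_score, decreasing = FALSE)

# Same sample order ...
identical(names(neg_sorted), names(asc_sorted))  # TRUE
# ... but the values passed downstream (and plotted) are negated
neg_sorted
```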

Wouldn't it be better to assume the input vector is already ordered as intended (i.e., not re-order it internally)? That way the resulting score values and plots don't have to be negative in the reverse case, and interpretation is easier.

@tetomonti
Collaborator

tetomonti commented Jan 15, 2024 via email

@vkartha
Collaborator Author

vkartha commented Jan 15, 2024

Makes sense, thanks @tetomonti. Still, I'm wondering: why not (for rank-based approaches) assume the input is pre-ranked, so the values need not be altered at all (similar to pre-ranked GSEA, etc.)? That would basically mean skipping the internal descending re-order for KS, in which case it should also work just fine for non-rank-based approaches.
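To make the proposal concrete, here is a sketch of what a pre-ranked mode could look like. The helper order_scores and its pre_ranked and decreasing arguments are hypothetical and not part of the package; the idea is that with pre_ranked = TRUE the input order is trusted as supplied, and otherwise the direction is chosen explicitly rather than by negating the values.

```r
# Hypothetical helper: not part of the package API
order_scores <- function(FS, input_score, pre_ranked = FALSE, decreasing = TRUE) {
  if (is.null(names(input_score))) {
    stop("input_score must be a named vector matching colnames(FS)")
  }
  if (!pre_ranked) {
    # Sort in the requested direction instead of always descending
    input_score <- sort(input_score, decreasing = decreasing)
  }
  # Align matrix columns with the (possibly user-supplied) order
  FS <- FS[, names(input_score), drop = FALSE]
  list(FS = FS, input_score = input_score)
}

# Toy data (made up for illustration)
FS <- matrix(c(1, 0, 1, 0, 1,
               0, 1, 1, 0, 0), nrow = 2, byrow = TRUE,
             dimnames = list(c("feat1", "feat2"), paste0("s", 1:5)))
input_score <- c(s1 = 0.2, s2 = 1.5, s3 = -0.7, s4 = 0.9, s5 = 0.1)

# Ascending ranking without negating: the original values are kept
up <- order_scores(FS, input_score, decreasing = FALSE)
up$input_score

# Pre-ranked input: the order is taken exactly as supplied
pre <- order_scores(FS, sort(input_score), pre_ranked = TRUE)
```

An explicit direction (or a pre-ranked flag) keeps the original score values intact, so downstream plots would show the actual activities rather than their negations.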

Otherwise you have to go with this:

[Image: meta_plot output after negating the tutorial input scores (not preserved)]

@tetomonti
Collaborator

tetomonti commented Jan 16, 2024 via email

@vkartha
Collaborator Author

vkartha commented Jan 16, 2024

In the plot above, the highest values are indeed on the left after inversion (using -input_score means the originally large positive values are now the most negative)? The point I was making is that you end up showing negative YAP/TAZ activities that don't really exist in the data (since the negated scores are what was passed to candidate_search), and meta_plot uses whatever input scores were fed to candidate_search. Happy to chat offline if that's easier. All I did above was reverse the sign of the input scores shown in the tutorial (the YAP/TAZ example) and run the candidate_search and meta_plot commands. You can try it yourself to confirm the output.
