
[r] Add feature selection methods by variance, dispersion, and mean accessibility #169

Open
wants to merge 1 commit into base: ia/normalizations
Conversation

@immanuelazn (Collaborator) commented Dec 15, 2024

Details

Create functions to perform feature selection, as a foundation for LSI and iterative LSI. These take the number of features to select and an optional normalization function (applied when the method uses variance or dispersion). The result is a tibble with columns names, score, and highly_variable.
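To make the interface concrete, here is a minimal sketch of one such selector. The function name, argument names, and the use of a plain data.frame (standing in for the tibble) are all assumptions for illustration, not the PR's actual code:

```r
library(Matrix)

# Hypothetical sketch: rank features (rows of a dgCMatrix) by variance and
# flag the top `n` as highly variable. An optional `normalize` function is
# applied first, mirroring the interface described above.
select_features_by_variance <- function(mat, n = 2000, normalize = NULL) {
  if (!is.null(normalize)) mat <- normalize(mat)
  mu <- Matrix::rowMeans(mat)
  # Population variance computed sparsely: E[x^2] - E[x]^2
  score <- Matrix::rowMeans(mat^2) - mu^2
  # data.frame stands in for the PR's tibble; same three columns.
  data.frame(
    names = rownames(mat),
    score = score,
    highly_variable = rank(-score, ties.method = "first") <= n
  )
}
```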

Tests

Since the interfaces are very similar, I run all of the methods through a single loop and test that the tibbles are formed as we expect. I'm not sure it makes sense to test the feature selection logic itself, since that would just re-do the same operations on a dgCMatrix. Do you have test ideas with better signal on whether these methods perform as expected?
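The looped test idea could look roughly like this; the selector names and score formulas are assumptions for the sketch, not the PR's actual code:

```r
library(Matrix)

# Build one hypothetical selector per method from a shared core, so a single
# loop can assert the same tibble shape for all of them.
make_selector <- function(score_fun) {
  function(mat, n) {
    score <- score_fun(mat)
    data.frame(names = rownames(mat), score = score,
               highly_variable = rank(-score, ties.method = "first") <= n)
  }
}

selectors <- list(
  variance   = make_selector(function(m) {
    Matrix::rowMeans(m^2) - Matrix::rowMeans(m)^2
  }),
  dispersion = make_selector(function(m) {
    mu <- Matrix::rowMeans(m)
    (Matrix::rowMeans(m^2) - mu^2) / pmax(abs(mu), 1e-12)
  }),
  mean       = make_selector(Matrix::rowMeans)
)

# The loop: every method should yield the same columns and exactly n flags.
mat <- Matrix::rsparsematrix(100, 20, density = 0.3)
rownames(mat) <- paste0("peak", seq_len(100))
for (method in names(selectors)) {
  res <- selectors[[method]](mat, n = 10)
  stopifnot(
    identical(colnames(res), c("names", "score", "highly_variable")),
    sum(res$highly_variable) == 10
  )
}
```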

Notes

I have this merging into the normalization branch, but only so the normalization logic is available within feature selection. Once normalizations is approved and merged into main, I think it makes sense to retarget this branch to main.

The underlying logic is essentially the same for each feature selection method, so I'm leaning toward putting all of it into a single function with an enum parameter selecting the method. However, this might clash with LSI/iterative LSI unless we are okay with putting a purrr::partial() call directly in the default arguments, at least until we develop the option to do implicit partials.
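A sketch of the single-function alternative, with the enum handled via match.arg(); all names and score formulas here are assumptions, not the PR's actual code. A caller like LSI could then pre-bind the method with something like purrr::partial(select_features, method = "dispersion"):

```r
library(Matrix)

# Hypothetical unified entry point: one code path, method chosen by enum.
select_features <- function(mat, n = 2000,
                            method = c("variance", "dispersion", "mean"),
                            normalize = NULL) {
  method <- match.arg(method)
  if (!is.null(normalize)) mat <- normalize(mat)
  mu <- Matrix::rowMeans(mat)
  score <- switch(method,
    variance   = Matrix::rowMeans(mat^2) - mu^2,
    dispersion = (Matrix::rowMeans(mat^2) - mu^2) / pmax(abs(mu), 1e-12),
    mean       = mu
  )
  # data.frame stands in for the PR's tibble; same three columns.
  data.frame(
    names = rownames(mat),
    score = score,
    highly_variable = rank(-score, ties.method = "first") <= n
  )
}
```

The match.arg() default means callers who don't care get variance selection, while LSI-style callers can fix a method up front without wrapping the function by hand.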

@immanuelazn immanuelazn changed the base branch from main to ia/normalizations December 15, 2024 03:32
@immanuelazn immanuelazn changed the title Ia/feature selection [r] Add feature selection methods by variance, dispersion, and mean accessibility Dec 15, 2024