Fix and refactor filter_genes
#537
Merged
Adds function to calculate the element-wise product of two arrays or an array and a sparse matrix.
The original implementation fails when `adata.layers['unspliced']` is sparse but `adata.layers['spliced']` is dense. The element-wise multiplication is now performed using `multiply`.
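A minimal sketch of what such a helper could look like is shown below. The name `multiply_elementwise` and its signature are hypothetical and not taken from the scVelo code base; the sketch only assumes that at most one operand needs to be dispatched through SciPy's sparse `.multiply` method.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def multiply_elementwise(a, b):
    """Element-wise product of two arrays, or of an array and a sparse matrix.

    Hypothetical helper for illustration; not the scVelo implementation.
    """
    if issparse(a):
        # Sparse operand first: `.multiply` is element-wise for scipy sparse types.
        return a.multiply(b)
    if issparse(b):
        # Element-wise multiplication is commutative, so swap the operands.
        return b.multiply(a)
    # Both operands are dense: NumPy's `*` is already element-wise.
    return a * b


# Toy data standing in for the two layers (values are illustrative only).
spliced = np.array([[1.0, 0.0], [2.0, 3.0]])                 # dense
unspliced = csr_matrix(np.array([[4.0, 5.0], [0.0, 6.0]]))   # sparse

product = multiply_elementwise(spliced, unspliced)
# Normalize the result type for inspection, whatever container was returned.
dense_product = product.toarray() if issparse(product) else np.asarray(product)
```

Checking for sparsity on either side means the caller does not need to know which layer is stored in which format.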
WeilerP added a commit that referenced this pull request on Oct 14, 2022
* Add .editorconfig (#529) * Add TestCleanObsNames (#530) Add test class for `scvelo/core/_anndata.py::clean_obs_names`. * Refactor `clean_obs_names` (#532) * Rename input arguments from `data` to `adata`, `copy` to `inplace`, `ID_length` to `id_length`, `base` to `alphabet`. * Rename variables to something more meaningful and use snake as well as lower case, consistently. * Make use of Pandas functionality. * Use regex expression. * Move unit tests to `tests/` (#535) * Move `scvelo/core/tests` to `tests/core` Moves unit tests `scvelo/core/tests` to `tests/core`. This will, for example, prevent installation files becoming large when including data or ground truth figures. * Add `tests/__init__.py` Makes `tests/` a package. This is needed to e.g. import from `tests/core` later on. * Update pytest.ini Remove `scvelo` from testpaths. * Fix and refactor `filter_genes` (#537) * Add `multiply` Adds function to calculate the element-wise product of two arrays or an array and a sparse matrix. * Fix and refactor `filter_genes` The original implementation fails when `adata.layers['unspliced']` is sparse but `adata.layers['spliced']` dense. The element-wise multiplication is now performend using `multiply`. * Add column generation for adata.obs/.var (#544) * Fix and update docstrings Update docstrings to follow codebase style. * Add option to add columns to adata.obs * Adds `obs_col_names`, `min_obs_cols`, `max_obs_cols` to composite strategy `get_adata`. Using `obs_col_names`, the column names can be set manually, the other arguments allow setting the minimum and maximum number of columns generated. * Updates unit tests to encounter for and test changes. * Add option to add columns to adata.var * Adds `var_col_names`, `min_var_cols`, `max_var_cols` to composite strategy `get_adata`. Using `var_col_names`, the column names can be set manually, the other arguments allow setting the minimum and maximum number of columns generated. 
* Updates unit tests to encounter for and test changes. * Add unit tests for `cleanup` (#548) * Add method `_subset_columns` to `TestBase` Adds method to sample from column names in `adata.obs` and `adata.var`. * Add `TestCleanup` Adds test class to unit test `scvelo/core/_anndata.py::cleanup`. * Fix TestCleanup::test_cleanup_some (#550) * Refactor `cleanup` (#552) * Rename argument `data` to `adata` * Rename argument `copy` to `inplace` * Update type hint in `cleanup` * Rename variables Rename variables to use more informative names. * Add `TestGetInitialSize` (#554) * Add `TestGetSize` (#556) * Refactor module import (#560) * Delete `pl.py`, `pp.py` and `tl.py`. * Update `__init__.py` to no longer rely on `pl.py`, `pp.py` and `tl.py`. * Update `get_modality` (#567) * Update `get_modality` Support passing `None` for argument `modality`. In this case, `adata.X` is returned. * Add `TestGetModality::test_modality_equals_none` Add unit test to check that correct modality is extracted if `None` is passed for `modality` in `get_modality`. * Refactor and generalize `get_size` (#569) * Rename `layer` to `modality` Rename argument `layer` to `modality` in `get_size`. This unifies naming convention for the argument across the code base and prepares generalizing `get_size` to arbitrary modalities. * Generalize `get_size` to arbitrary modalities This allows calculating the size per observation for any modality, from `adata.X`, `adata.layers` or `adata.obsm`. * Update docstrings * Add TestObsDf (#579) Unit tests `scvelo.core._anndata.py::obs_df`. * Add TestVarDf (#583) Unit tests `scvelo.core._anndata.py::var_df`. * Update TestObsDf (#585) The addition assert statement checks that the index of the returned data frame is correct. * Fix typo in default layer names (#587) * Add `TestShowProportions` (#589) Unit tests `scvelo.core._anndata.py`::show_proportions`. 
* Add `TestSetInitialSize` (#591) * Reduce maximum number of obs/vars (#593) Reduce maximum size of generated AnnData objects in some tests. * Update ci.yml (#546) * Run tests on ubuntu-latest using both Python 3.7 and 3.8. * Increase verbosity of pytest. * Add timeout to tests. * Update `.gitignore` [ci skip] (#595) Add files and directories created by `pytest-cov`. * Update `_anndata.py` (#596) Change order of default layers to match temporal ordering in biological process. * Fix TestShowProportions (#598) In TestShowProportions::test_layers_not_specified, layers should not be specified but `layers=None` passed instead. * Update `TestCleanObsNames` (#600) Check that `inplace` argument is working correctly. * Test modality passed as string (#602) Add unit tests to `TestMakeDense` and `TestMakeSparse` when `modality` is passed as a string. * Fix `test_not_existing_modality` (#605) Generate layer names with at least two characters to exclude the case `layer='X'`. * Refactor unit test for `l2_norm` (#607) Converts unit test to a test class. 2D arrays and sparse matrices are now generated as well to increase coverage to 100%. * Add `varm` to AnnData strategy (#609) * Add `varm` to AnnData strategy Strategy can now also add entries to `adata.varm`. This is tested by extending the existing unit tests. * Set `max_obs` and `max_vars` to `5` This is an attempt to make the unit tests not in the CI on GitHub due to a time out error by Hypothesis. * Add `TestMerge` (#611) Add test class to unit test `scvelo/core/_anndata.py::merge`. * Update `TestVarDf` (#615) Reduce maximum size of generated AnnData. * Catch error in steady state calculation (#614) * Fix steady state calculation Raise ValueError when `beta` and `gamma` are not positive. * Update `TestSplicingDynamics` Add unit test to verify ValueError is raised when calculating steady states with inadmissible parameter values. * Fix AnnData strategy (#616) Reset `min_varm` and `max_varm` if `varm_keys` is specified. 
* Update `TestSplicingDynamics` (#618) Add unit tests to * test 2d time points, * verify correct initial states are set/returned, * verify correct steady states are returned. * Fix `TestCleanup` (#620) * Update `DynamicsBase` (#622) Remove redundant return statements to increase test coverage. The abstract methods cannot be tested. * Add `TestGetDf` (#624) Unit tests for `scvelo/core/_anndata.py::get_df`. * Add module `datasets` (#636) * Add module `datasets` The file `datasets.py` is moved to the module and renamed to `_datasets.py`. * Deprecate `pancreatic_endocrinogenesis` * Add `datasets/_simulate.py` Adds file `datasets/_simulate.py` and moves `simulation`, `unspliced`, `spliced` and `vectorize` from `datasets/_datasets.py` to the new file. * Cleanup `datasets/_datasets.py` (#638) * Sort functions alphabetically * Add TODO * Fix `DeprecationWarning` [ci skip] (#640) * Add argument `file_path` (#642) Allow saving/reading datasets to/from custom paths. * Fix gillespie simulation (#644) * Add unit tests for `scvelo.datasets` (#646) * Add `test_datasets.py` Add unit tests for datasets accessible through scvelo. * Add `test_simulate.py` Add unit tests for `scvelo.datasets.simulation`. * Add new datasets to `datasets/__init__.py` (#677) * Fix `csr_vcorrcoef` (#679) * Fix `csr_vcorrcoef` Fixes calculation of Pearson's correlation in `csr_vcorrcoef`. * Add `tests/preprocessing` * Add `TestCsrVcorrcoef` Test class for unit testing `csr_vcorrcoef`. * Update Python versions in CI (#680) * Run linting check in Python 3.8 * Run unit tests in Python 3.9 * Fix and unit test `get_mean_var` (#698) * Fix passing `perc` not affecting result Up to now, passing `perc` does not affect the output as only `data` but not `X` is manipulated. * Fix different size parameters Mean and variance calculation use different size parameters instead of one and the same. * Fix sparse input with `ignore_zeros` or nan values Mean and variance cannot be calculated due to a shape mismatch. 
* Add `TestGetMeanVar` Unit test `test_get_mean_var`. * Add unit tests for `preprocessing/utils.py` (#700) * Add `TestCheckIfValidDtype` Unit tests `check_if_valid_dtype`. * Add `TestFilter` Unit tests `_filter`. * Add `TestLog1p` Unit tests `log1p`. * Add `TestMaterializeAsNdarray` Unit tests `materialize_as_ndarray`. * Add `TestNotYetNormalized` Unit tests `not_yet_normalized`. * Add `TestFilterGenes` Unit tests `filter_genes`. * Add `TestCountsPerCellQuantile` * Add `TestNormalizePerCell` * Add `tests/_data/` Add subsetted datasets to use in unit tests. * Add `confest.py` Add file to declare fixture for entire testing directory. * Add `TestFilterGenesDispersion` Unit test `filter_genes_dispersion`. * Add `TestFilterAndNormalize` Unit test `filter_and_normalize`. * Add `TestRecipeVelocity` Unit test `recipe_velocity`. * Update AnnData generation for testing (#785) * Update `test_base.py::get_adata` Exclude `"X"` as key of `.layers`, `.obsm` and `.varm`. * Update `test_base.py::TestAdataGeneration` Check that `"X"` is not a key in `.layers`, `.obsm`, `.varm`. * Update `test_anndata.py` Remove redundant statements in `TestCleanup::test_cleanup_some` and `TestSetInitialSize::test_added_columns` according to changes in AnnData generation for tests. * Add `pull_request_template.md` [ci skip] (#787) Add template for pull requests. * Speed up CI (#789) * Update `ci.yml` Remove testing of dataset download from job `test`. Instead, a new job `test-dataset-downloads` is added. * Update `ci.yml` Print execution time of 25 slowest tests. * Update `test_datasets.py` Skip `test_datasets.py::TestDataSets` if Python version is not 3.8 and OS is not Linux. * Refactor `neighbors` (#795) * Split body of `neighbors` over several functions Split body of `neighbors` over several function for easier unit testing. 
* Add type hints to arguments of `neighbors` * Fix typo in docstrings * Fix `_get_rep` (#797) The variable `rep` is referenced before assignment if the if-else case is not entered. This is fixed by adding a final `else` case. In the following, only the variable `rep` is used. * Refactor testing in CI (#801) * Update `requirements-dev.txt` Add `pytest-cov` to the installed libraries. * Update `ci.yml` Update how dependencies are installed in the `test` job. * Increase Hypothesis deadline in CI (#804) To prevent test failure due to exceeding deadlines, the default deadline of Hypothesis is increased to 500 milliseconds when running the tests in the CI. * Unit test `neighbors.py` (#799) * Add `test_neighbors.py` Add file to host unit tests for `preprocessing/neighbors.py`. * Add `TestGetNeighs` * Add `TestGetNNeighs` * Add `TestGetDuplicateCells` * Add `TestRemoveDuplicateCells` * Add preprocessed adata for testing Add preprocessed versions of `dentategyrus_50obs.h5ad`, `dentategyurs_100obs.h5ad`, `pancreas_50obs.h5ad`, and `pancreas_100obs.h5ad`. * Add fixtures for preprocessed datasets Add fixtures to use preprocessed datasets in unit tests. * Add fixture `adata` The fixture `adata` is used to easily load raw or preprocessed datasets in unit tests. This allows running one and the same unit test on all available datasets. * Add `TestSelectConnectivities` Add unit tests for `select_connectivities`. * Add `TestSelectDistances` Add unit tests for `select_distances`. * Add `TestSetDiagonal` Add unit tests for `set_diagonal`. * Add `TestNeighborsToBeRecomputed` Add unit tests for `neighbors_to_be_recomputed`. * Add `TestVerifyNeighbors` Add unit tests for `verify_neighbors`. * Add `TestGetConnectivities` and accompanying files Add unit tests for `get_connectivities` and saved CSR matrices to verify results. * Add `TestGetCsrFromIndices` Add unit tests for `get_csr_from_indices`. 
* Add `TestComputeConnectivitiesUmap` Add unit tests for `compute_connectivities_umap` * Add `TestGetRep` Add unit tests for `_get_rep`. * Add `TestSetPCA` Add unit tests for `_set_pca`. * Add `TestGetHnswNeighbors` Add unit tests for `_get_hnsw_neighbors` and corresponding CSR matrices saved using `.npz` format. * Add `TestGetScanpyNeighbors` Add unit tests for `_get_scanpy_neighbors` and corresponding CSR matrices saved using `.npz` format. * Add `TestGetSklearnNeighbors` Add unit tests for `_get_sklearn_neighbors` and corresponding CSR matrices saved using `.npz` format. * Add `TestNeighbors` Adds unit tests for the function `neighbors`. * Unit test `moments.py` (#806) * Add `test_moments.py` Add file to host unit tests of functions in `moments.py`. * Add `TestGetMoments` Add unit tests for `get_moments` and corresponding ground truth moments saved using `.npy` format. * Add `TestSecondOrderMomentsU` Add unit tests for `second_order_moments_u`. * Add `TestSecondOrderMoments` Add unit tests for `second_order_moments` and corresponding moments saved using `.npy` format. * Add `TestMagicImpute` Add unit tests for `magic_imput` and corresponding imputed data saved using `.npy` format. * Add `TestMoments` Add unit tests for `moments` and corresponding first order moments saved using `.npy` format. * Update `requirements-dev.txt` Add `magic-impute` to run unit tests on CI. * Update `docs/` [ci skip] (#810) Recommend `kb count` instead of `loompy fromfq`. * Update `requirements.txt` (#838) Excludes `pandas==1.4.0` to prevent failure as reported in #811. * Fix `optimization.py::get_weight` (#839) Fix `scvelo.tools.optimization.py::get_weight` to work correctly when `perc` is numeric. * Fix inference with `fit_scaling=False` (#848) * Update `DynamicsRecovery` Fix parameter inference to not fit scaling when `fit_scaling=False`. * Update `pre-commit` setup (#855) * Update `requirements-dev.txt` Update versions of `black` and `isort`. 
* Update `.pre-commit-config.yaml` Update repo paths and versions. * Run `black` Run code formatting with latest version of `black`. * Add argument `file_path` (#853) Adds argument `file_path` to datasets added in `scvelo=0.2.4.`. * Update `CONTRIBUTING.rst` [ci skip] (#911) Details workaround to issue where `pip install -e '.[dev]'` leads to errors on Windows. * Fix saving stream embedding figure (#900) * Fix missing write_key for embedding stream plot. * Set save instead of wire_key when saving embedding stream plot. * Fix `clean_obs_names` (#921) * Fix bug with clean_obs_names Switched implementation to use the previously ignored id_length parameter. New approach finds a substring of length id_length where each of the letters are in the alphabet * Extend tests to cover more cases for clean_obs_names * Fix Pandas display precision (#907) * Update `core/_anndata.py` Update how Pandas displayed precision is set. * Update `core/_anndata.py` Add TODO comment to add a unit test to test the `precision` argument. * Fix documentation issues [ci skip] (#929) * Update `CONTRIBUTING.rst` [ci skip] Update branch name convention. * Update `docs/requirements.txt` [ci skip] Pin `Jinja` version. * Update `docs/source/conf.py` [ci skip] Update `add_stylesheet` with `add_css_file`. The former is deprecated. * Fix typo in `index.rst` [ci skip] * Fix link to bone marrow dataset [ci skip] * Fix `velocity_embedding` docs [ci skip] * Add key contributors section [ci skip] (#930) * Add `_key_contributors.rst` [ci skip] Add file with key contributors and their roles. * Show key contributors on index page [ci skip] * Fix deprecation of keyword argument `copy` (#932) * Add `core/_utils.py` * Add `core/_utils.py::deprecated_arg_names` Modify wrapper from Scanpy to replace old with new keyword argument. The wrapper is modified s.t. `copy=True` corresponds to `inplace=False`, and vice versa. 
* Make `deprecated_arg_names` importable from `core` * Update import of `deprecated_arg_names` Import local version instead of the Scanpy version s.t. the keyword argument `copy` is handled correctly. * Update `CONTRIBUTING.rst` [ci skip] (#938) Remove reference to `develop` branch as it will be removed in the next release. * Fix linear regression unit tests (#940) * Fix `TestLinearRegression::test_perfect_fit` Replace `array` strategy to generate `x` with `sampled_from`. * Fix `TestLinearRegression::test_perfect_fit_2d` Replace `array` strategy to generate `x` with `sampled_from`. * Update `test_neighbors.py` (#941) Remove failing and non-essential unit tests. See #922. * Update release notes [ci skip] (#945) * Remove trailing white spaces * Add notes for `0.2.5` release Co-authored-by: Isaac Virshup <ivirshup@gmail.com> Co-authored-by: Oisin-M <60450429+Oisin-M@users.noreply.github.com> Co-authored-by: Dries Schaumont <5946712+DriesSchaumont@users.noreply.github.com>
Bug fixes
* `filter_genes` now works with `adata.layers['unspliced']` being sparse and `adata.layers['spliced']` dense.

New
* `multiply` for element-wise multiplication of arrays and sparse matrices

Changes
* `filter_genes` to make use of `multiply` and fix "`filter_genes` fails when `adata.layers['unspliced']` is sparse and `adata.layers['spliced']` dense" (#533).

Related issues
Closes #533.
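
The mixed dense/sparse case can be reproduced outside scVelo. The snippet below is only a sketch of one plausible failure mode, under the assumption that the product is taken with the `*` operator; it does not reproduce the original `filter_genes` code, and the toy arrays are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy layers with 3 observations and 2 genes (values are illustrative only).
spliced = np.array([[1, 0], [2, 3], [0, 4]])                 # dense
unspliced = csr_matrix(np.array([[4, 5], [0, 6], [7, 0]]))   # sparse

# For a scipy sparse *matrix*, `*` means matrix multiplication, so two
# operands of shape (3, 2) are incompatible and the operation raises.
try:
    spliced * unspliced
    star_failed = False
except ValueError:
    star_failed = True

# Calling `.multiply` on the sparse operand gives the element-wise product.
elementwise = unspliced.multiply(spliced).toarray()
```

With square inputs the `*` operator would not raise but would silently return the matrix product, which is why routing the operation through an explicit element-wise call is the safer choice.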