Export.py for sparse pandas data frames [results] #278

Closed
cbravo93 opened this issue Mar 17, 2021 · 2 comments
Labels
results Question about pySCENIC results

Comments

@cbravo93
Member

cbravo93 commented Mar 17, 2021

In export.py these lines cause an error when using a sparse data frame:

# Calculate the number of genes per cell.
binary_mtx = ex_mtx.copy()
binary_mtx[binary_mtx != 0] = 1.0
ngenes = binary_mtx.sum(axis=1).astype(int)

Something like this (untested) could be used for sparse pandas data frames:

ngenes = np.count_nonzero(ex_mtx.values, axis=1).astype(int)
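
A sparse-aware variant of that idea could go through SciPy instead of `.values`, so the frame is never densified. The following is only a minimal sketch: it assumes `ex_mtx` is a cells x genes DataFrame, that SciPy is installed, and that sparse columns use a fill value of 0. `count_genes_per_cell` is a hypothetical helper name, not part of pySCENIC.

```python
import numpy as np
import pandas as pd


def count_genes_per_cell(ex_mtx: pd.DataFrame) -> pd.Series:
    """Count the non-zero entries in each row (cell) of a cells x genes frame."""
    all_sparse = all(isinstance(dtype, pd.SparseDtype) for dtype in ex_mtx.dtypes)
    if all_sparse:
        # Convert to SciPy CSR without densifying (pandas requires fill_value == 0).
        csr = ex_mtx.sparse.to_coo().tocsr()
        csr.eliminate_zeros()  # drop any explicitly stored zeros
        # In CSR, consecutive indptr differences give the stored entries per row.
        counts = np.diff(csr.indptr)
    else:
        # Dense (or mixed) frames: fall back to a plain NumPy count.
        counts = np.count_nonzero(ex_mtx.to_numpy(), axis=1)
    return pd.Series(counts, index=ex_mtx.index, dtype=int)


# Small illustrative frame (made-up values): 3 cells x 3 genes.
dense = pd.DataFrame(
    [[0.0, 1.0, 2.0], [0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
    index=["c1", "c2", "c3"],
    columns=["g1", "g2", "g3"],
)
sparse = dense.astype(pd.SparseDtype("float", 0.0))
ngenes = count_genes_per_cell(sparse)
```

Returning a Series indexed like `ex_mtx` keeps the result shaped like the output of the original `binary_mtx.sum(axis=1)` line, so downstream code should not need to care which branch ran.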

The question is whether a dense matrix is required at any point when creating the loom file. If not, changing these lines would be more efficient for very large data sets.

Cheers!

C

@cbravo93 cbravo93 added the results Question about pySCENIC results label Mar 17, 2021
@ghuls
Member

ghuls commented Mar 17, 2021

The binarisation can also be optimized if the input is a sparse matrix.
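
The comment does not spell out the optimisation. One possible reading, sketched here under the assumption that the expression matrix is available as a SciPy sparse matrix, is to overwrite only the stored values instead of masking the whole matrix. `binarize_sparse` is a hypothetical helper name, not a pySCENIC function.

```python
import numpy as np
from scipy import sparse


def binarize_sparse(mtx):
    """Return a CSR copy of mtx with every non-zero entry set to 1.0."""
    binary = sparse.csr_matrix(mtx, dtype=np.float64, copy=True)
    binary.sum_duplicates()   # merge repeated (row, col) entries, if any
    binary.eliminate_zeros()  # drop explicitly stored zeros
    binary.data[:] = 1.0      # touch only the stored values
    return binary


# Small illustrative matrix (made-up values): 3 cells x 3 genes.
mtx = sparse.csr_matrix(np.array([[0.0, 1.5, 2.0], [0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]))
binary_mtx = binarize_sparse(mtx)
# Row sums of the binarised matrix are the genes-per-cell counts.
ngenes = np.asarray(binary_mtx.sum(axis=1)).ravel().astype(int)
```

The work here scales with the number of stored entries rather than with cells x genes, which is where the saving for very large data sets would come from.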

cflerin added a commit that referenced this issue Apr 15, 2021
- Fix counting of genes per cell
- #278
@cflerin
Contributor

cflerin commented May 28, 2021

Fixed with the 0.11.1 release.

@cflerin cflerin closed this as completed May 28, 2021