Flat file -> DataMatrix? #21
Comments
If I understand things correctly, you actually want
cell_counts = combine(groupby(tx,["cell","target"]), nrow=>"count")
because the version above will give you duplicate lines. But we can also construct a DataMatrix directly:
cells = unique(tx.cell)
targets = unique(tx.target)
cell_ind = identity.(indexin(tx.cell, cells))
target_ind = identity.(indexin(tx.target, targets))
X = sparse(target_ind, cell_ind, 1)
counts = DataMatrix(X, DataFrame(id=targets, name=targets), DataFrame(cell_id=cells))
It's a bit manual to do it this way; it would be nice if this were possible without the user having to explicitly work with indices.
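For anyone who wants to try the index trick without the full 42-million-row table, here is a minimal sketch on toy data. It uses only the SparseArrays standard library; the toy `tx` is a NamedTuple of vectors standing in for the real DataFrame (an assumption for illustration, the column access `tx.cell` / `tx.target` is the same), and the final `DataMatrix` construction is left out so the snippet runs without extra packages.

```julia
using SparseArrays

# Toy stand-in for the flat transcript table: one row per detected transcript.
# (Assumed data, not from the issue.)
tx = (cell   = ["c1", "c1", "c2", "c1", "c3", "c2"],
      target = ["GeneA", "GeneB", "GeneA", "GeneA", "GeneB", "GeneA"])

cells   = unique(tx.cell)     # observation IDs, in order of first appearance
targets = unique(tx.target)   # variable IDs, in order of first appearance

# indexin returns Vector{Union{Nothing,Int}}; identity.() narrows it to Vector{Int}.
cell_ind   = identity.(indexin(tx.cell, cells))
target_ind = identity.(indexin(tx.target, targets))

# Repeated (row, column) pairs are summed by `sparse`, so every entry ends up
# being the number of transcripts of that target in that cell.
X = sparse(target_ind, cell_ind, ones(Int, length(tx.cell)),
           length(targets), length(cells))

println(Matrix(X))
```

Because the duplicates are summed while the sparse matrix is assembled, no separate `groupby` pass is needed to get the counts.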
Do you think it would be worthwhile to add a utility function for this?
accumulate_data_matrix(tx; obs_cols="cell", var_cols="target", obs_annot_cols=["fov", "cell_ID"])
or
accumulate_data_matrix(tx; values="counts", obs_cols="cell", var_cols="target", obs_annot_cols=["fov", "cell_ID"])
if you have a column with counts. (A better name would be nice though. 🙂)
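To make the proposal concrete, a rough sketch of what such a helper might do internally is below. Everything here is hypothetical: the name `accumulate_counts`, the keywords `obs_col`, `var_col` and `value_col`, and the return value (a plain sparse matrix plus ID vectors instead of a `DataMatrix`) are assumptions for illustration, and annotation columns such as `obs_annot_cols` are not handled.

```julia
using SparseArrays

# Hypothetical helper, not part of any package: accumulate a long-format table
# into a sparse (variables x observations) matrix.
# `tx` can be anything with column access by Symbol (NamedTuple of vectors, DataFrame).
function accumulate_counts(tx; obs_col::Symbol, var_col::Symbol, value_col=nothing)
    obs  = getproperty(tx, obs_col)
    vars = getproperty(tx, var_col)
    obs_ids = unique(obs)
    var_ids = unique(vars)
    obs_ind = identity.(indexin(obs, obs_ids))
    var_ind = identity.(indexin(vars, var_ids))
    # Without a value column each row counts once; otherwise the column is summed.
    v = value_col === nothing ? ones(Int, length(obs)) : getproperty(tx, value_col)
    X = sparse(var_ind, obs_ind, v, length(var_ids), length(obs_ids))
    return X, var_ids, obs_ids
end

# Toy table that already carries a counts column (assumed data).
tx = (cell = ["c1", "c2", "c1"], target = ["GeneA", "GeneA", "GeneA"], counts = [3, 1, 2])

X, var_ids, obs_ids = accumulate_counts(tx; obs_col=:cell, var_col=:target, value_col=:counts)
X_rows, _, _ = accumulate_counts(tx; obs_col=:cell, var_col=:target)
```

With `value_col` set, the two `c1`/`GeneA` rows are summed to 5; without it, each row contributes 1, which matches the one-row-per-transcript case.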
Ooh, that is much cleaner 😅
Hmm - yeah, that looks great. Agree that there could be a nicer name - is there a way to abuse multiple dispatch on the
In case you missed it on Slack #appreciation: this approach worked great! (Tried to upload the gif here, but it's too big.) I managed to get SVD and UMAP in about 10 sec, whereas it crashed my buddy's iMac when he tried to do it in Seurat 👍
That's great to hear. 😄 Yeah, I'm not too fond of (ab)using the
Is there a description somewhere of how to create a DataMatrix from some other data type? The tutorial doesn't make this clear; it provides data that's already in the correct format.
I have spatial-transcriptomics data that looks like this:
The table is ~42 million rows.
I can get counts / cell with
Though it takes a long time. Just wondering if there's an obvious way to get to the sparse matrix / DataMatrix format?