UMAP looks like a line when neighborhood size was determined by using cell type labels #117

Open
Evenlyeven opened this issue Jul 3, 2023 · 5 comments

Comments

@Evenlyeven

Thanks for the useful tool!

I noticed that in my results, some areas of the UMAP look like solid lines (for example, the cluster at the top of the screenshot below). I wonder whether this is because the SAMap run was set to determine neighborhood size from the cell type labels I provided. Does this look normal to you?
image

When I check the UMAPs before SAMap stitches them together, both look "normal" to me.
sam1:
image

sam2:
image

Also, in a test run where I didn't use cell type labels to determine neighborhood size, hopping along each cell's outgoing edges was used instead, and the UMAP looks more "normal" to me.
image

Any comments or suggestions will be highly appreciated!

The script I used is attached below (paths were replaced by ...):

from samap.mapping import SAMAP
from samap.analysis import (get_mapping_scores, GenePairFinder,
                            sankey_plot, chord_plot, CellTypeTriangles,
                            ParalogSubstitutions, FunctionalEnrichment,
                            convert_eggnog_to_homologs, GeneTriangles)
from samalg import SAM
import pandas as pd
import anndata
from joblib import dump, load

# Load the two species' datasets (paths elided).
zf_data = anndata.read_h5ad('....')
pf_data = anndata.read_h5ad('....')

# Run SAM on each species separately.
sam1 = SAM(counts=zf_data)
sam1.preprocess_data(filter_genes=False)
sam1.run(batch_key='orig.ident',
         npcs=30)

sam2 = SAM(counts=pf_data)
sam2.preprocess_data(filter_genes=False)
sam2.run(npcs=20)

sams = {'zf': sam1, 'pf': sam2}

# Passing keys makes SAMap use the cell type labels to determine neighborhood size.
sm = SAMAP(sams,
           keys={'zf': 'cell_type', 'pf': 'cell_type'},
           f_maps='...',
           save_processed=True)

Thanks very much in advance!

Di

@atarashansky
Owner

Can you give me a sense of how large the cell type labels are? It would be great if you could show me the number of cells assigned to each label.
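For readers who want to produce such a table, here is a minimal sketch that counts cells per label. The label list is a hypothetical stand-in for the `cell_type` column of one dataset; with an AnnData object the same numbers would normally come from something like `adata.obs['cell_type'].value_counts()`.

```python
from collections import Counter


def cells_per_label(labels):
    """Return (label, count) pairs, ordered from the largest label to the smallest."""
    return Counter(labels).most_common()


# Hypothetical labels, used only for illustration.
toy_labels = ["neuron", "muscle", "neuron", "epidermis", "neuron", "muscle"]

for label, count in cells_per_label(toy_labels):
    print(f"{label}\t{count}")
```

Very small or very large labels are the ones worth looking at first, since a label-defined neighborhood is as big as the label itself.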

@Evenlyeven
Author

Here are tables showing the number of cells assigned to each label.

Species zf:
image

Species pf:
image

Another question: is it best if the number of input cells is comparable across species? I am working with 200 cells from one species and 8,000 cells from the other, and was thinking about downsampling the larger dataset.
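One possible way to do such a downsampling, sketched with the standard library only: sample cell indices per label so that label proportions stay roughly intact and no label disappears. The function name, the toy labels, and the choice of a stratified scheme are assumptions made for illustration, not something SAMap prescribes; the returned indices could then be used to subset an AnnData object before building the SAM object.

```python
import random
from collections import defaultdict


def stratified_downsample(labels, n_target, seed=0):
    """Pick roughly n_target cell indices, sampling each label proportionally.

    Every label keeps at least one cell, so the result can differ slightly
    from n_target.
    """
    if not labels:
        return []
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    fraction = min(1.0, n_target / len(labels))
    keep = []
    for idxs in by_label.values():
        n_keep = max(1, round(len(idxs) * fraction))
        keep.extend(rng.sample(idxs, n_keep))
    return sorted(keep)


# Hypothetical 8,000-cell dataset with three labels of very different sizes.
toy_labels = ["a"] * 6000 + ["b"] * 1900 + ["c"] * 100
kept = stratified_downsample(toy_labels, n_target=200)
print(len(kept))
```

Fixing the seed keeps the subset reproducible, which makes it easier to compare mapping results before and after downsampling.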

Thank you!!

@atarashansky
Owner

atarashansky commented Jul 19, 2023

I think SAMap can be robust to dataset size disparities, but I would encourage you to try downsampling and check if the results change. I would also encourage changing the (poorly documented) NHS parameter in SAMAP.run like so:

NHS = {'small_dataset_id': 2, 'big_dataset_id': 3}

NHS controls neighborhood size. 3 means that a cell's neighborhood includes cells up to 3 edges away. 2 decreases the neighborhood size, which is probably good for smaller datasets.
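To make "up to N edges away" concrete, here is a small self-contained sketch of a hop-based neighborhood on a toy directed neighbor graph. It only illustrates the idea described above; it is not SAMap's implementation, and the graph, the function name, and the choice to exclude the starting cell are assumptions.

```python
from collections import deque


def hop_neighborhood(out_edges, start, max_hops):
    """Return all nodes reachable from `start` by following at most `max_hops` outgoing edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in out_edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)  # assumption: report neighbors only, not the cell itself
    return seen


# Toy graph: each cell points to one nearest neighbor, forming a ring.
toy_graph = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}

print(sorted(hop_neighborhood(toy_graph, 0, 2)))  # [1, 2]
print(sorted(hop_neighborhood(toy_graph, 0, 3)))  # [1, 2, 3]
```

Going from 3 hops to 2 shrinks the neighborhood, which matters more in a 200-cell dataset, where a few hops can already cover a large share of all cells.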

@atarashansky
Owner

Instead of using keys in SAMAP(...), can you try using neigh_from_keys in SAMAP.run(...)? You can pass it exactly the same value that you are passing to keys.

If you use neigh_from_keys, then NHS is not needed.

@Evenlyeven
Author

Thanks a lot for your suggestions, I will try it.
