about gene for velocity #216

Closed
hahia opened this issue Jun 12, 2020 · 3 comments

hahia commented Jun 12, 2020

This is an amazing package.
I'm new to Python and scVelo and have some naive questions.
I have Drop-seq data from 2421 cells and annotated spliced/unspliced counts with velocyto.

The unspliced fraction captured is only 8%.

An average of about 800 genes was detected in the Drop-seq data.

I used scVelo for the velocity analysis, with both the stochastic and the dynamical model.

import scvelo as scv

scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=500)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)

print(adata.var.velocity_genes.sum())
74

scv.pl.velocity_embedding_stream(adata, basis='umap',color='cluster')
image

# dynamical model
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)

print(adata.var.velocity_genes.sum())
62

scv.pl.velocity_embedding_stream(adata, basis='umap',color='cluster')
image

top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.scatter(adata, basis=top_genes[:10], frameon=False, ncols=5,color='cluster')

image

counts_s = scv.utils.sum_var(adata.layers['spliced'])
counts_u = scv.utils.sum_var(adata.layers['unspliced'])
fractions_u = counts_u / (counts_s + counts_u)
scv.pl.scatter(adata, color=fractions_u, smooth=True)

image

1. Are around 100 genes enough to infer a cell's next state?
2. The directions differ in clusters 6, 9, and 3, which we care about. Which one is right?

hahia added the question label Jun 12, 2020

WeilerP (Member) commented May 25, 2021

1. Are around 100 genes enough to infer a cell's next state?

100 seems a bit low, but the number of genes needed generally depends on the dataset. As outlined in this supplementary material, as few as 30 genes already explain a large portion of the dynamics in the dentate gyrus dataset.

2. The directions differ in clusters 6, 9, and 3, which we care about. Which one is right?

Based on the phase portraits of the genes with the highest likelihood, it seems that your dataset does not provide enough information to infer dynamics. To be more precise: the phase portraits do not exhibit the characteristic almond/football shape. Take FOLH1, for example: both an up- and a down-regulation have been fitted. However, you could just as easily fit only an up- or only a down-regulation, since the phase portrait is linear.
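
To make the point about linear phase portraits concrete, here is a minimal, self-contained sketch in plain Python. It is not scVelo's implementation: it assumes a simplified steady-state view in which a slope gamma is fitted through the origin and the sign of the residual u - gamma * s indicates up- or down-regulation. The helper names and the toy numbers are made up for illustration.

```python
def fit_slope(s, u):
    """Least-squares slope through the origin for u ~ gamma * s."""
    return sum(si * ui for si, ui in zip(s, u)) / sum(si * si for si in s)


def residuals(s, u):
    """Residuals u - gamma * s; their sign is the directional signal."""
    gamma = fit_slope(s, u)
    return [ui - gamma * si for si, ui in zip(s, u)]


# Toy linear phase portrait: every cell sits on the fitted line.
s_lin = [1.0, 2.0, 3.0, 4.0]
u_lin = [0.5, 1.0, 1.5, 2.0]

# Toy almond-shaped portrait: the first three cells lie above the line
# (induction-like), the last three below it (repression-like).
s_alm = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
u_alm = [1.0, 1.6, 1.9, 1.1, 0.6, 0.2]

res_lin = residuals(s_lin, u_lin)
res_alm = residuals(s_alm, u_alm)

print(res_lin)                               # all zero: no directional information
print([round(r, 2) for r in res_alm])        # positive first, then negative
```

In the linear case the residuals vanish, so nothing in the data distinguishes up- from down-regulation; in the almond-shaped case the sign separates the two branches.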

@JohnGenome

Hi @WeilerP, given that the phase portraits of many velocity genes lack information (they look linear or random), would it be a possible solution to manually select the genes with the characteristic almond/football pattern (via var.velocity_genes) and rerun velocity_graph()?
It sounds like this would introduce subjective bias, but the resulting direction is just what we expect.
I'll move this to a new issue and show more information if you want.

WeilerP (Member) commented Jul 27, 2022

@JohnGenome, yes, that would be a solution. However, be aware that observing (part of) a desirable phase portrait does not imply correct inference (w.r.t. known biology), as model assumptions could still be violated. See e.g. this example where transcriptional bursts are observed (i.e. the assumption of approximately constant rates is violated).
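
A minimal sketch of the manual selection discussed above, with the caveat about model assumptions in mind. The helper `manual_gene_mask` and the gene names other than FOLH1 are hypothetical. As far as I know, `scv.tl.velocity_graph` accepts a `gene_subset` argument, but please check this against your installed scVelo version; the scVelo calls are shown only as comments and are not run here.

```python
def manual_gene_mask(var_names, selected):
    """Return one boolean per gene in var_names, True for hand-picked genes."""
    selected = set(selected)
    missing = selected.difference(var_names)
    if missing:
        # Fail loudly on typos instead of silently dropping genes.
        raise KeyError(f"genes not found in the dataset: {sorted(missing)}")
    return [name in selected for name in var_names]


# Toy stand-in for adata.var_names; GENE_A/B/C are placeholder names.
var_names = ["FOLH1", "GENE_A", "GENE_B", "GENE_C"]
picked = ["GENE_A", "GENE_C"]  # genes judged to show an almond-shaped portrait
mask = manual_gene_mask(var_names, picked)
print(mask)  # [False, True, False, True]

# With a real AnnData object this would be used roughly as follows
# (assumed usage, not verified against a specific scVelo version):
#   scv.tl.velocity_graph(adata, gene_subset=picked)
#   scv.pl.velocity_embedding_stream(adata, basis='umap', color='cluster')
```

Recording the hand-picked list (and why each gene was chosen) alongside the results makes the subjective step explicit and reproducible.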
