scvelo invert direction issue #112

temp-abandon · 2019-11-19T08:53:04Z

After using scvelo on our data, I got the reverse direction trajectory compared to the other trajectory methods. Does it make sense to use -1 * (speed vector)?

VolkerBergen · 2019-11-19T09:01:12Z

That usually traces back to something went wrong in one or the other pipeline. While models are conceptually different, they should not result in fundamentally different directionality. What exactly did you compare? Dynamical vs. steady state? or scvelo vs. velocyto? Would be keen to get some figures + details here or via email (feedback@scvelo.org).

jenzopr · 2019-11-19T09:18:34Z

I have the same issue. My (biologist) collaborator notified me that directions are inverted, compared to what he expects from his expert knowledge..

What kind of details do you need? The adata object, details on the preprocessing, etc.?

VolkerBergen · 2019-11-19T10:00:52Z

@jenzopr If you get these result with all models

scv.tl.velocity(adata, mode='steady_state')
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity(adata, mode='dynamical')

the directionality seems to be clearly supported by the splicing kinetics. Then you would want to find out how strong the splicing information is and whether the directionality is supported by marker genes; obtained by looking into phase portraits scv.pl.scatter(adata, basis=[list_of_genes]).

If you can share your file, that would speedup finding what triggers the directionality obtained in your data.

jenzopr · 2019-11-19T10:12:23Z

Nice, I will check all models and come back to you. Thanks @VolkerBergen

reginawongg · 2019-11-20T06:27:07Z

I'm also facing the same issue, where the directions are reversed based on what we expect from knowing the biology.

KaiyangZ · 2019-11-20T08:45:57Z

I also got the same issue, in some cases, I got the reverse direction in dynamic mode compared to stochastic mode:

scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)
scv.tl.umap(adata)
scv.pl.velocity_embedding_stream(adata, basis='umap', color=['sample'])

and

scv.tl.recover_dynamics(adata, max_iter=200, return_model=True)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.pl.velocity_embedding_stream(adata, basis='umap', color=['sample'])

VolkerBergen · 2019-11-20T09:54:46Z

Could you let me know the number of velocity_genes being used (which could be problem with a rather low number of cells), and show the phase portrait of the top likelihood genes

print(adata.var['velocity_genes'].sum(), adata.n_vars)
top_genes = adata.var_names[adata.var.fit_likelihood.argsort()[::-1]]
scv.pl.scatter(adata, basis=top_genes[:10], ncols=5)

KaiyangZ · 2019-11-20T11:12:26Z

Hi,

Thanks for your prompt reply! there are 417 velocity_genes and only 96 cells, is the number of cells too low for the analysis?
print(adata.var['velocity_genes'].sum(), adata.n_vars, adata.n_obs)
417 3000 96
The phase portrait of the top 10 likelihood genes:

jenzopr · 2019-11-20T13:04:01Z

@jenzopr If you get these result with all models
scv.tl.velocity(adata, mode='steady_state')
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity(adata, mode='dynamical')
the directionality seems to be clearly supported by the splicing kinetics.

Hi Volker,
all models result in the same directionality. When looking into phase portraits, I noticed that there are only 6 'velocity_genes'. Here are their phase portraits:

Is the number of velocity genes too low? What do you think?

VolkerBergen · 2019-11-20T13:18:28Z

@KaiyangZ Thanks for sharing. It's good to eventually see a use case where the dyn. model does not appear to be feasible. In most of your genes the dyn. model induces some state assignment (apparently mostly assigning to repression phases), which is not confidently assigned as it could equally well be in induction phase. That is due to the rather low cell numbers.
I would have to incorporate that measure of 'state-confidence' (which is not directly reflected in the likelihoods) in the model or couple gene fits. For now, I must admit that you would have to apply the stochastic model which is insensitive to cell numbers and the mentioned drawbacks. I'll keep you up to date when we update the dyn. model to also account for some kind of confidence of overall state-assignments.
[update for other readers: the steady-state ratios obtained from the stoch. and dyn. model should not deviate from each other as much and as systematically as in this case. That can indeed happen, if cell numbers are very low. If that is the case, please have a highly critical view on the resulting velocities and be cautious with interpretations.]

@jenzopr Only having 6 genes left is definitely too low. I'll add some exception to throw an error in that case. Gene are selected by an R-squared (or likelihood) threshold. You can adjust the threshold, e.g. scv.tl.velocity(adata, mode='steady_state', min_r2=1e-3). But I'd definitely have a closer look why you get such low R-squared values. After all, the splicing kinetics doesn't seem to be very informative in your case. Do you expect any dynamics at all?

jenzopr · 2019-11-20T13:45:19Z

@VolkerBergen thanks for your input. A warning message might be a good idea.

Yes we expect a lot of dynamics, but they might be hidden since we sampled cells from multiple timepoints. How would scvelo deal with cell populations that expand in the beginning (e.g. inflammatory processes), but return to normal at later stages? Splicing kinetics would be pointing in the opposite direction or form a circle.. Along this line, I tried to use the groups and groupby arguments to estimate kinetics for subsets of cells only. Is this intended usage?

KaiyangZ · 2019-11-20T14:21:36Z

Thanks a lot @VolkerBergen, I will use stoch. model, but how about the latent time then? if I understand correctly, it is estimated from the dyn. model, and what is the difference between velocity_pseudotime and latent time? Some explanations are really appreciate!

VolkerBergen · 2019-11-20T17:07:26Z

@KaiyangZ latent time is only available through the dyn. model. I'll provide explanations on latent time vs velocity pseudotime etc. in the docs within the next few days.

@jenzopr yes, groups and groupby arguments are intended to be used to estimate kinetics within subset of cells, e.g. per batch. I'll get back to you for the other question.

jonathan-f · 2019-11-29T08:51:35Z

Hi @VolkerBergen , I'm also facing an inversion of the direction of the velocities in the different models: I ran steady state, stochastic and dynamical model (with default parameters) on the same data and there seems to be a general agreement between the steady state and the dynamical, while the stochastic has the velocities inverted for many cells.

I checked the velocity genes and they are exactly the same for steady state and stochastic (~400 genes) and they are a subset of the velocity genes that I get with the dynamical model.

I reasoned that the problem could reside in the steady state ratio estimate and, actually, looking at the scatter plots of the top 10 dynamics-driving genes (obtained from the dynamical model) in the 3 models (I'm showing one example below), the stochastic model gives a different estimate of the steady state ratio.

Is there a way to assess the reliability of the steady state ratio computation (e.g., checking the computation of second order moments for the stochastic model)? Maybe the number of neighbours used in the computation of the moments is critical in this case in which the number of cells is pretty small (~100)?

Sorry for the long message and thanks!

Hsu-Che-Wei · 2019-12-13T15:01:22Z

Hello @VolkerBergen, thanks for the cool tool. I also found the RNA velocity in my model system is perfectly inverted in every model (steady, stochastic, and dynamic), and I am sure that the number of cells are high enough, each sample I have has at least 2K cells. In combined, I have over 40K. Below, I show you the u vs s plots of some genes, the blue cells are actually "stem cells/meristematic cells", the yellow and orange ones are "differentiated" cells. Do you have any explanation for why the latent time has to start from the bottom left corner? Why can't it be started at the up right corner? Cause in my case, it seems that the development is driven by degradation of spliced transcripts or repression.

Also, do you know how to do the analysis in the case that the user have already the normalized counts value? You just skip the scv.pp.filter_and_normalize() step?

Thank you very much!

VolkerBergen · 2019-12-14T15:35:43Z

Thanks sharing these!

Latent time is rooted by the Markov diffusion process that is obtained from the velocity vector field; which for many genes is indeed in the origin; but it is not assumed to always start in there. In your cases it'd be rooted in the yellow clusters because it assigns the yellow-to-blue as induction phase. Here, it's hard for the dynamical model to pick up the right state assignment as there's hardly any curvature it can learn from. Maybe you could try decrease the number of neighbors in scv.pp.moments(adata, n_neighbors=10).

Overall it looks exactly like what you've already pointed out. It's mostly governed by degradation. Check out Suppl. Fig. 8b right phase portrait in the manuscript (https://doi.org/10.1101/820936); that nicely resembles the kinetics in your data. I'll see if I can quickly add some user-defined prior to account for predominant degradation.

VolkerBergen · 2019-12-14T15:40:07Z

scv.pp.filter_and_normalize does not touch your adata.X if it is preprocessed already. It then only goes after spliced and unspliced counts. If you set scv.settings.verbosity=3 (default) you should get some warning messages that it recognizes .X as being already preprocessed.

VolkerBergen · 2019-12-14T16:00:18Z

@jonathan-f Pardon, I've completely forgotten you. From only 100 cells is pretty hard to pick up any signal. However, if others genes display similar patterns like that one, splicing kinetics seems to support directionality from blue-to-orange.

Could you check how the stoch. model behaves as you decrease the number of neighbors in scv.pp.moments(adata, n_neighbors=10). I think 30 neighbors on 100 cells for second-oder moments is way too much. That default value was selected when testing against datasets of 3k - 35k cells.

jranek · 2019-12-19T20:53:15Z

Hi @VolkerBergen , thank you for a great set of tools! I'd love to join in on the conversation, for I'm facing a similar issue.

We differentiated stem cells into definitive endoderm, collecting three time points, (blue = d0, orange = d1, green = d2). Contrary to what we'd expect, the vector field is showing d2 cells primarily moving towards d1 and then d0. In addition, the vector field generated through the dynamical model is quite different from the steady-state approach.

The dimensionality after preprocessing was 4555 cells x 2000 genes.

**edit
I've decided to remove my previous plots to give a different visual of the same issue. Below are two plots of the top 10 likelihood genes, one colored according to sample and the other colored according to latent time. You'll notice that the d2 cells (green) have a latent time of ~0, whereas the d0 cells (blue) have a latent time of ~1.

MagpiePKU · 2020-09-11T05:35:25Z

Hi,

I would like to raise a similar issue in which the directionalities of velocity_stream and PAGA were inversed:

A very strange one.

grimwoo · 2020-10-30T05:01:37Z

@VolkerBergen Hi, I am new to python and scVelo. And I have the same issue: all three modes in scv.tl.velocity(adata, mode=...) gave same "wrong" direction (from differentiated to stem cell). What else could I do? Could scVelo simply "reverse/invert" the direction of arrows? Look forword to your reply. Thanks!

zhanglab2008 · 2020-11-04T13:55:45Z

Hi team, thanks for developing this awesome tool. But I also experienced the same inverted direction issue. From my study, I am pretty sure what the correct direction is because I collected time points data from healthy to disease. I tried all the three models steady, stochastic and dynamics but the directions are consistently wrong. I really hope you can fix the issue. Thanks!

Ivalyne · 2020-12-02T11:58:48Z

Thanks for the great toolkit.

As @VolkerBergen mentioned last year, we could get the top likelihood genes by using the code as below.

print(adata.var['velocity_genes'].sum(), adata.n_vars)
top_genes = adata.var_names[adata.var.fit_likelihood.argsort()[::-1]]
scv.pl.scatter(adata, basis=top_genes[:10], ncols=5)

However, in the latest version, there has some changes about this.
Nowadays, when we want to get the top likelihood genes, you can get the genes by using this code.

top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index

As you know, this code is written in the tutorials.

This comment is just for somebody who read this issue and stack at the step of retrieving the top genes, like me.

yunkaizhang · 2021-02-23T22:23:39Z

Hi,

I would like to raise a similar issue in which the directionalities of velocity_stream and PAGA were inversed:

A very strange one.

Hi there,
Wondering if you have fix this problem?
Can you also look into the latent_time or velocity_pseudotime for your dataset? Wondering these may be similar to PAGA (and inverted to velocity_stream) since PAGA are based on these.

WeilerP · 2021-05-30T07:43:03Z

@jranek, your phase portraits do not include any curvature s.t. the model cannot be fitted confidently. See e.g. #462 or #216.

WeilerP · 2021-05-30T07:46:14Z

@MagpiePKU, @yunkaizhang, @WT215 #456 was recently opened. Let's discuss the issue there since it is not related to the original issuer here.

paulitikka · 2022-02-10T13:31:25Z

Hi,

I have solved this. Use the Scvelo calculated UMAP and PCA values, and not the ones from coming from Seurat. Like: ...

#...after the import of data and the libraries:
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
scv.pp.log1p(adata)
#this below line is important!! no old umaps and pcas
sc.pl.umap(adata, frameon=False, legend_loc='on data', title='', save='a10222_celltypes_ok.pdf')
sc.tl.pca(adata)
sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)

#And then...
scv.tl.recover_dynamics(adata, n_jobs=8) #separately... (!!) takes around 4 m: 15:15-18
scv.tl.velocity(adata, mode="dynamical") #separately as a oneliner, not in the 'cell'...
scv.tl.velocity_graph(adata) #separately... and the note in the next line the basis as 'umap' and not 'X_umap'
scv.pl.velocity_embedding_stream(adata, basis="umap", legend_fontsize=12, title="", smooth=0.8, min_mass=4, save='dyn_rankaas_tikka10222_vN.png')

B.r.,
Pauli
:)

stefanpeidli · 2022-02-15T13:33:26Z

This preprint from 2 days ago by Pachterlab mentions the issue of inverted velocities:
https://www.biorxiv.org/content/10.1101/2022.02.12.480214v1

mayurdoke6 · 2022-10-19T20:02:32Z

Hi,

I have solved this. Use the Scvelo calculated UMAP and PCA values, and not the ones from coming from Seurat. Like: ...

#...after the import of data and the libraries: scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000) scv.pp.log1p(adata) #this below line is important!! no old umaps and pcas sc.pl.umap(adata, frameon=False, legend_loc='on data', title='', save='a10222_celltypes_ok.pdf') sc.tl.pca(adata) sc.pp.neighbors(adata, n_pcs=30, n_neighbors=30) scv.pp.moments(adata, n_pcs=None, n_neighbors=None)

#And then... scv.tl.recover_dynamics(adata, n_jobs=8) #separately... (!!) takes around 4 m: 15:15-18 scv.tl.velocity(adata, mode="dynamical") #separately as a oneliner, not in the 'cell'... scv.tl.velocity_graph(adata) #separately... and the note in the next line the basis as 'umap' and not 'X_umap' scv.pl.velocity_embedding_stream(adata, basis="umap", legend_fontsize=12, title="", smooth=0.8, min_mass=4, save='dyn_rankaas_tikka10222_vN.png')

B.r., Pauli :)

Hello, @paulitikka Thank you for your input on this issue. I have tried to use your code in my analysis. It did not help me rectify my issue with the velocity embedding graph generated with dynamical scvelo mode analysis. It shows the opposite of the stochastic mode of scvelo. I find stochastic mode is better at analyzing RNA velocity than the dynamical, especially in my data. I may be wrong but I am not sure. I am a little bit perplexed to explain this issue to reviewers for publication. please let me know your view on this.

Regards,

MAD.

Matthew1309 · 2022-12-21T22:46:47Z

I just found a paper where the authors ran into the issue of inverted pseudotime directionality, identified culprit genes, and removed them in order to resolve the inversion. Hopefully this is relevant/helpful. Paper: Coordinated changes in gene expression kinetics underlie both mouse and human erythroid maturation.

WeilerP · 2022-12-22T08:25:25Z

I just found a paper where the authors ran into the issue of inverted pseudotime directionality, identified culprit genes, and removed them in order to resolve the inversion. Hopefully this is relevant/helpful. Paper: Coordinated changes in gene expression kinetics underlie both mouse and human erythroid maturation.

@Matthew1309, the issue is a conceptual one. As discussed in the paper, the genes exhibiting a transcriptional burst are highly relevant for the overall process. Thus, you cannot simply ignore them. The goal is not to make arrows look good on a UMAP but recover correct Biology, IMO. See also here.

stefanpeidli mentioned this issue Dec 21, 2019

Trying to set attribute .layers of view, making a copy. #122

Closed

royfrancis mentioned this issue Feb 2, 2020

scVelo arrows pointing in inverse direction? #134

Closed

zakiF mentioned this issue Feb 26, 2020

Specifying start / end cell (Specify root_cells and end_point) #162

Closed

WeilerP closed this as completed May 30, 2021

pabloatria18 mentioned this issue Jun 4, 2021

Problem with arrow direction [Question] #487

Closed

mayurdoke6 mentioned this issue Oct 19, 2022

Opposite direction shown by stochastic vs dynamical mode #946

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scvelo invert direction issue #112

scvelo invert direction issue #112

temp-abandon commented Nov 19, 2019

VolkerBergen commented Nov 19, 2019

jenzopr commented Nov 19, 2019 •

edited

Loading

VolkerBergen commented Nov 19, 2019

jenzopr commented Nov 19, 2019

reginawongg commented Nov 20, 2019

KaiyangZ commented Nov 20, 2019

VolkerBergen commented Nov 20, 2019

KaiyangZ commented Nov 20, 2019

jenzopr commented Nov 20, 2019

VolkerBergen commented Nov 20, 2019 •

edited

Loading

jenzopr commented Nov 20, 2019

KaiyangZ commented Nov 20, 2019

VolkerBergen commented Nov 20, 2019

jonathan-f commented Nov 29, 2019

Hsu-Che-Wei commented Dec 13, 2019

VolkerBergen commented Dec 14, 2019

VolkerBergen commented Dec 14, 2019

VolkerBergen commented Dec 14, 2019 •

edited

Loading

jranek commented Dec 19, 2019 •

edited

Loading

MagpiePKU commented Sep 11, 2020

grimwoo commented Oct 30, 2020

zhanglab2008 commented Nov 4, 2020

Ivalyne commented Dec 2, 2020

yunkaizhang commented Feb 23, 2021

WeilerP commented May 30, 2021

WeilerP commented May 30, 2021

paulitikka commented Feb 10, 2022 •

edited

Loading

stefanpeidli commented Feb 15, 2022

mayurdoke6 commented Oct 19, 2022

Matthew1309 commented Dec 21, 2022

WeilerP commented Dec 22, 2022

scvelo invert direction issue #112

scvelo invert direction issue #112

Comments

temp-abandon commented Nov 19, 2019

VolkerBergen commented Nov 19, 2019

jenzopr commented Nov 19, 2019 • edited Loading

VolkerBergen commented Nov 19, 2019

jenzopr commented Nov 19, 2019

reginawongg commented Nov 20, 2019

KaiyangZ commented Nov 20, 2019

VolkerBergen commented Nov 20, 2019

KaiyangZ commented Nov 20, 2019

jenzopr commented Nov 20, 2019

VolkerBergen commented Nov 20, 2019 • edited Loading

jenzopr commented Nov 20, 2019

KaiyangZ commented Nov 20, 2019

VolkerBergen commented Nov 20, 2019

jonathan-f commented Nov 29, 2019

Hsu-Che-Wei commented Dec 13, 2019

VolkerBergen commented Dec 14, 2019

VolkerBergen commented Dec 14, 2019

VolkerBergen commented Dec 14, 2019 • edited Loading

jranek commented Dec 19, 2019 • edited Loading

MagpiePKU commented Sep 11, 2020

grimwoo commented Oct 30, 2020

zhanglab2008 commented Nov 4, 2020

Ivalyne commented Dec 2, 2020

yunkaizhang commented Feb 23, 2021

WeilerP commented May 30, 2021

WeilerP commented May 30, 2021

paulitikka commented Feb 10, 2022 • edited Loading

stefanpeidli commented Feb 15, 2022

mayurdoke6 commented Oct 19, 2022

Matthew1309 commented Dec 21, 2022

WeilerP commented Dec 22, 2022

jenzopr commented Nov 19, 2019 •

edited

Loading

VolkerBergen commented Nov 20, 2019 •

edited

Loading

VolkerBergen commented Dec 14, 2019 •

edited

Loading

jranek commented Dec 19, 2019 •

edited

Loading

paulitikka commented Feb 10, 2022 •

edited

Loading