scvelo invert direction issue #112
Comments
That usually traces back to something having gone wrong in one of the pipelines. While the models are conceptually different, they should not produce fundamentally different directionality. What exactly did you compare? Dynamical vs. steady-state, or scVelo vs. velocyto? I would be keen to get some figures and details here or via email (feedback@scvelo.org). |
I have the same issue. My (biologist) collaborator notified me that the directions are inverted compared to what he expects from his expert knowledge. What kind of details do you need? The |
@jenzopr If you get these results with all models
the directionality seems to be clearly supported by the splicing kinetics. You would then want to find out how strong the splicing information is and whether the directionality is supported by marker genes, which you can check by looking at phase portraits. If you can share your file, that would speed up finding what triggers the directionality obtained in your data. |
Nice, I will check all models and come back to you. Thanks @VolkerBergen |
I'm also facing the same issue, where the directions are reversed relative to what we expect from the known biology. |
I have the same issue: in some cases I get the reverse direction in dynamical mode compared to stochastic mode:
and
|
Could you let me know the number of velocity_genes being used (which could be a problem with a rather low number of cells), and show the phase portraits of the top-likelihood genes
|
Hi Volker, Is the number of velocity genes too low? What do you think? |
@KaiyangZ Thanks for sharing. It's good to finally see a use case where the dyn. model does not appear to be feasible. For most of your genes the dyn. model induces some state assignment (apparently mostly assigning to repression phases) that is not confident, as the cells could equally well be in the induction phase. That is due to the rather low cell numbers. @jenzopr Having only 6 genes left is definitely too few. I'll add an exception to throw an error in that case. Genes are selected by an R-squared (or likelihood) threshold. You can adjust the threshold, e.g. |
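The threshold-based gene selection described above can be illustrated with a minimal, library-free sketch. It is not scVelo's implementation: the function name `select_velocity_genes`, the `min_genes` guard, and the R-squared values are all hypothetical, chosen only to show how a stricter threshold shrinks the gene set and how a guard can fail loudly when too few genes remain.

```python
def select_velocity_genes(r2_by_gene, min_r2, min_genes=20):
    """Keep genes whose fit quality reaches min_r2 (hypothetical helper).

    Raises ValueError when fewer than min_genes survive, since a velocity
    field built from a handful of genes is unlikely to be trustworthy.
    """
    kept = [gene for gene, r2 in r2_by_gene.items() if r2 >= min_r2]
    if len(kept) < min_genes:
        raise ValueError(
            f"only {len(kept)} velocity genes pass min_r2={min_r2}; "
            "consider lowering the threshold"
        )
    return kept


# Made-up R-squared values, for illustration only.
r2_scores = {"geneA": 0.45, "geneB": 0.02, "geneC": 0.30, "geneD": -0.10}

# min_genes=1 here only so the toy example does not trip the guard.
strict = select_velocity_genes(r2_scores, min_r2=0.25, min_genes=1)
relaxed = select_velocity_genes(r2_scores, min_r2=0.01, min_genes=1)
```

Lowering the threshold admits more genes, but also genes with weaker splicing signal, so the number of selected genes is worth checking alongside their phase portraits.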
@VolkerBergen thanks for your input. A warning message might be a good idea. Yes, we expect a lot of dynamics, but they might be hidden since we sampled cells from multiple timepoints. How would scvelo deal with cell populations that expand in the beginning (e.g. inflammatory processes) but return to normal at later stages? Splicing kinetics would be pointing in the opposite direction or form a circle. Along this line, I tried to use the |
Thanks a lot @VolkerBergen, I will use the stoch. model, but what about the latent time then? If I understand correctly, it is estimated from the dyn. model. And what is the difference between velocity_pseudotime and latent time? Some explanation would be really appreciated! |
@KaiyangZ latent time is only available through the dyn. model. I'll provide explanations on latent time vs velocity pseudotime etc. in the docs within the next few days. @jenzopr yes, |
Hi @VolkerBergen , I'm also facing an inversion of the direction of the velocities in the different models: I ran steady state, stochastic and dynamical model (with default parameters) on the same data and there seems to be a general agreement between the steady state and the dynamical, while the stochastic has the velocities inverted for many cells. I checked the velocity genes and they are exactly the same for steady state and stochastic (~400 genes) and they are a subset of the velocity genes that I get with the dynamical model. I reasoned that the problem could reside in the steady state ratio estimate and, actually, looking at the scatter plots of the top 10 dynamics-driving genes (obtained from the dynamical model) in the 3 models (I'm showing one example below), the stochastic model gives a different estimate of the steady state ratio. Is there a way to assess the reliability of the steady state ratio computation (e.g., checking the computation of second order moments for the stochastic model)? Maybe the number of neighbours used in the computation of the moments is critical in this case in which the number of cells is pretty small (~100)? Sorry for the long message and thanks! |
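To make the concern about the steady-state ratio concrete, here is a simplified sketch of how a cell's velocity sign depends on that ratio. It assumes the basic steady-state formulation, where the ratio is the slope of a regression through the origin of unspliced on spliced counts and the velocity is the residual `u - gamma * s`. Real pipelines fit on smoothed moments and a subset of cells, which this toy example deliberately omits; the function names and numbers are illustrative only.

```python
def steady_state_ratio(unspliced, spliced):
    """Least-squares slope through the origin of unspliced on spliced."""
    numerator = sum(u * s for u, s in zip(unspliced, spliced))
    denominator = sum(s * s for s in spliced)
    return numerator / denominator


def velocity(u, s, gamma):
    """Residual of one cell from the steady-state line u = gamma * s."""
    return u - gamma * s


# Toy cells lying exactly on a line through the origin.
spliced = [1.0, 2.0, 3.0, 4.0]
unspliced = [0.5, 1.0, 1.5, 2.0]
gamma = steady_state_ratio(unspliced, spliced)

# A cell slightly above the fitted line has positive velocity ...
cell_u, cell_s = 1.2, 2.0
v_fitted = velocity(cell_u, cell_s, gamma)

# ... but the same cell gets a negative velocity if the ratio is
# estimated a little higher, which is how two models can disagree.
v_shifted = velocity(cell_u, cell_s, gamma + 0.2)
```

Comparing the fitted ratio per gene across models is therefore a direct way to see where the velocities start to point in opposite directions.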
Hello @VolkerBergen, thanks for the cool tool. I also found that the RNA velocity in my model system is perfectly inverted in every model (steady-state, stochastic, and dynamical), and I am sure that the number of cells is high enough: each sample has at least 2K cells, and combined I have over 40K. Below, I show the u vs. s plots of some genes; the blue cells are actually "stem cells/meristematic cells", and the yellow and orange ones are "differentiated" cells. Do you have any explanation for why the latent time has to start from the bottom left corner? Why can't it start at the upper right corner? In my case, it seems that development is driven by degradation of spliced transcripts or by repression. Also, do you know how to do the analysis when the user already has normalized count values? Do you just skip the scv.pp.filter_and_normalize() step? Thank you very much! |
Thanks for sharing these! Latent time is rooted by the Markov diffusion process that is obtained from the velocity vector field. For many genes the root is indeed at the origin, but it is not assumed to always start there. In your case it would be rooted in the yellow clusters because the model assigns yellow-to-blue as the induction phase. Here, it's hard for the dynamical model to pick up the right state assignment as there's hardly any curvature it can learn from. Maybe you could try decreasing the number of neighbors in Overall it looks exactly like what you've already pointed out: it's mostly governed by degradation. Check out Suppl. Fig. 8b, right phase portrait, in the manuscript (https://doi.org/10.1101/820936); it nicely resembles the kinetics in your data. I'll see if I can quickly add a user-defined prior to account for predominant degradation. |
|
@jonathan-f Pardon, I had completely forgotten about you. With only 100 cells it is pretty hard to pick up any signal. However, if other genes display patterns similar to that one, the splicing kinetics seem to support directionality from blue to orange. Could you check how the stoch. model behaves as you decrease the number of neighbors in |
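Why the number of neighbors matters with few cells can be sketched with a toy nearest-neighbor average. This is an assumption-laden illustration, not the library's moment computation: cells are placed on a 1-D axis, `knn_smooth` is a hypothetical helper, and each cell's value is replaced by the mean over its `k` nearest cells (including itself).

```python
def knn_smooth(values, positions, k):
    """Replace each value by the mean over its k nearest cells (toy 1-D version)."""
    smoothed = []
    for p in positions:
        # Indices of the k cells closest to position p (the cell itself included).
        nearest = sorted(range(len(values)), key=lambda j: abs(positions[j] - p))[:k]
        smoothed.append(sum(values[j] for j in nearest) / k)
    return smoothed


def spread(values):
    """Dynamic range that survives smoothing."""
    return max(values) - min(values)


# Ten cells with expression increasing steadily along the axis.
positions = list(range(10))
expression = [float(p) for p in positions]

spread_few_neighbors = spread(knn_smooth(expression, positions, k=3))
spread_many_neighbors = spread(knn_smooth(expression, positions, k=9))
```

With nine neighbors out of ten cells, nearly every cell is averaged over the whole dataset and the gradient is almost erased, whereas three neighbors preserve most of it. On small datasets, a neighborhood size that is reasonable for thousands of cells can smooth away the signal the velocity fit relies on.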
Hi @VolkerBergen , thank you for a great set of tools! I'd love to join in on the conversation, for I'm facing a similar issue. We differentiated stem cells into definitive endoderm, collecting three time points, (blue = d0, orange = d1, green = d2). Contrary to what we'd expect, the vector field is showing d2 cells primarily moving towards d1 and then d0. In addition, the vector field generated through the dynamical model is quite different from the steady-state approach. The dimensionality after preprocessing was 4555 cells x 2000 genes. **edit |
@VolkerBergen Hi, I am new to python and scVelo. And I have the same issue: all three modes in |
Hi team, thanks for developing this awesome tool. I have also experienced the same inverted-direction issue. From my study design, I am pretty sure what the correct direction is, because I collected time-point data from healthy to disease. I tried all three models (steady-state, stochastic, and dynamical), but the directions are consistently wrong. I really hope you can fix the issue. Thanks! |
Thanks for the great toolkit. As @VolkerBergen mentioned last year, we could get the top-likelihood genes using the code below.
However, this has changed in the latest version.
As you know, this code is given in the tutorials. This comment is just for anybody who reads this issue and gets stuck at the step of retrieving the top genes, like me. |
Hi there, |
@MagpiePKU, @yunkaizhang, @WT215 #456 was recently opened. Let's discuss the issue there since it is not related to the original issuer here. |
Hi, I have solved this. Use the scVelo-calculated UMAP and PCA values, not the ones coming from Seurat. Like: ... #...after the import of data and the libraries: #And then... B.r., |
This preprint from 2 days ago by Pachterlab mentions the issue of inverted velocities: |
Hello @paulitikka, thank you for your input on this issue. I tried to use your code in my analysis, but it did not rectify my issue with the velocity embedding graph generated by scVelo's dynamical mode: it shows the opposite of the stochastic mode. I find the stochastic mode better than the dynamical mode at analyzing RNA velocity, especially in my data, though I may be wrong. I am a little perplexed about how to explain this issue to reviewers for publication. Please let me know your view on this. Regards, MAD. |
I just found a paper where the authors ran into the issue of inverted pseudotime directionality, identified culprit genes, and removed them in order to resolve the inversion. Hopefully this is relevant/helpful. Paper: Coordinated changes in gene expression kinetics underlie both mouse and human erythroid maturation. |
@Matthew1309, the issue is a conceptual one. As discussed in the paper, the genes exhibiting a transcriptional burst are highly relevant for the overall process, so you cannot simply ignore them. The goal is not to make arrows look good on a UMAP but to recover the correct biology, IMO. See also here. |
After using scvelo on our data, I got the reverse direction trajectory compared to the other trajectory methods. Does it make sense to use -1 * (speed vector)?