You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
de.test.versus_rest() returns the wrong comparison for GLM tests (I have tested it with test=wald).
This is due to the comparison being made on the wrong coefficient, i.e. the test is made using ['Intercept', 'grouping[T.rest]'], resulting in .summary_group(group=group) returning the comparison rest vs. group, and not group vs. rest as expected.
The results appear to be correct for other tests, e.g. the Wilcoxon rank sum test.
See below for a detailed description.
Expected behavior
To get the comparison group vs. rest, call .summary_group(group=group) on the output of de.test.versus_rest.
To Reproduce
I am using 10x_pbmc68k_reduced.h5ad from scanpy datasets.
import anndata
import scanpy as sc
import diffxpy.api as de
adata = sc.read_h5ad('10x_pbmc68k_reduced.h5ad')
data = anndata.AnnData(
X=adata.raw.to_adata().X.todense(),
var=adata.var,
obs=adata.obs
)
test_wald = de.test.versus_rest(
data=data.X,
grouping="bulk_labels",
test="wald",
noise_model="nb",
gene_names=data.var_names,
sample_description=data.obs
)
This results in drastically different results than
# pick one cell type
test_rank.summary_group(group='Dendritic').sort_values(by=['qval', 'log2fc'], ascending=[True, False])
# this compares to
sc.tl.rank_genes_groups(data, 'bulk_labels', method='wilcoxon')
sc.pl.rank_genes_groups(data, groups=['Dendritic'])
I can reproduce these results by doing the calls by hand, e.g. use Dendritic cells, and do the de.test.two_sample for wald and rank using test_grouping = np.where(grouping == 'Dendritic', "group", "rest"). I can also reproduce the results by actually doing the de.test.waldvs.de.test.rank_test.
With de.test.wald, we are in fact testing grouping[T.rest] vs. Intercept, and NOT something like grouping[T.group] vs. Intercept/rest. So when calling e.g..summary_group(group='Dendritic'), we are not getting the comparison we expect to have (namely Dendritic vs. rest).
For de.test.rank_test, in DifferentialExpressionTestRank, this line
x0, x1 = split_x(data, grouping)
# this returns groups in that order groups = pd.Series(grouping).unique()
returns groups in the right order, namely x0=rest, and x1=group.
OS/version:
Linux atlas 4.19.0-17-amd64 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux (ran on JupyterHub numpy backend)
Python 3.9.5
diffxpy v0.7.4+21.g12f1286 (install from git)
scanpy 1.7.2
anndata 0.7.8
numpy 1.20.3
scipy 1.7.3
Other questions
When I look at the fold changes from .summary() of de.test.wald and de.test.rank_test, they are not the same
as those I get from the .summary_group() of de.test.versus_rest. This seem to be due to .log_fold_change()vs. .log2_fold_change()... but I'm not quite sure what's happening here. That would be great if you could clarify this.
Thanks for looking into this.
The text was updated successfully, but these errors were encountered:
Describe the bug
de.test.versus_rest()
returns the wrong comparison for GLM tests (I have tested it withtest=wald
).This is due to the comparison being made on the wrong coefficient, i.e. the test is made using ['Intercept', 'grouping[T.rest]'], resulting in
.summary_group(group=group)
returning the comparison rest vs. group, and not group vs. rest as expected.The results appear to be correct for other tests, e.g. the Wilcoxon rank sum test.
See below for a detailed description.
Expected behavior
To get the comparison group vs. rest, call
.summary_group(group=group)
on the output ofde.test.versus_rest
.To Reproduce
I am using 10x_pbmc68k_reduced.h5ad from
scanpy
datasets.This results in drastically different results than
The second are consistent with e.g https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html
I can reproduce these results by doing the calls by hand, e.g. use Dendritic cells, and do the
de.test.two_sample
for wald and rank usingtest_grouping = np.where(grouping == 'Dendritic', "group", "rest")
. I can also reproduce the results by actually doing thede.test.wald
vs.de.test.rank_test
.With
de.test.wald
, we are in fact testing grouping[T.rest] vs. Intercept, and NOT something like grouping[T.group] vs. Intercept/rest. So when calling e.g..summary_group(group='Dendritic')
, we are not getting the comparison we expect to have (namely Dendritic vs. rest).For
de.test.rank_test
, inDifferentialExpressionTestRank
, this linereturns groups in the right order, namely x0=rest, and x1=group.
OS/version:
Linux atlas 4.19.0-17-amd64 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux (ran on JupyterHub numpy backend)
Python 3.9.5
diffxpy v0.7.4+21.g12f1286 (install from git)
scanpy 1.7.2
anndata 0.7.8
numpy 1.20.3
scipy 1.7.3
Other questions
When I look at the fold changes from
.summary()
ofde.test.wald
andde.test.rank_test
, they are not the sameas those I get from the
.summary_group()
ofde.test.versus_rest
. This seem to be due to.log_fold_change()
vs..log2_fold_change()
... but I'm not quite sure what's happening here. That would be great if you could clarify this.Thanks for looking into this.
The text was updated successfully, but these errors were encountered: