leiden and umap not reproducible on different CPUs #2014
Hey, please report back with containerized environments as discussed on Twitter.
Hi, I updated the pipeline to use this singularity container. The problem persists.
I also tried with […]. [Screenshots comparing the results on different CPUs ("On …" / "On …") omitted.]
Do you have any idea if the issue is with our implementation of these, or with the underlying libraries? E.g. if you just run UMAP directly from the UMAP library, can you get exact reproducibility? I don't think we would be able to promise more reproducibility than what they offer, and irreproducibility across machines is a known issue there: lmcinnes/umap#525.
That's a good point, and it is not: running UMAP directly from the umap-learn library

```python
import umap

reducer = umap.UMAP(min_dist=0.5)
embedding = reducer.fit_transform(adata.obsm["X_scVI"])
adata.obsm["X_umap"] = embedding
```

again produces stable results on only 3/4 CPUs. Ok, let's forget about UMAP. It's only a nice figure to get an overview of the data and I don't use it for downstream stuff. Irreproducible clustering, on the other hand, is quite a deal-breaker, as for instance cell-type annotations depend on it. I mean, why would I even bother releasing the source code of an analysis alongside the paper if it is not reproducible anyway?

I found out a few more things. One of them: rounding the neighbor graph

```python
import numpy as np

adata.obsp["connectivities"] = np.round(adata.obsp["connectivities"], decimals=3)
adata.obsp["distances"] = np.round(adata.obsp["distances"], decimals=3)
```
So should we add a deterministic flag for Leiden clustering which enforces […]?
I have yet to install the latest […].
@grst I don't think […]. My guess is this is going to have to do with the CPU that gives different results being much older and using a different instruction set than the other Intel processors. This could be triggered either by the use of any parallelism at all or by […]. Do you get the same graph out of […]?
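To make that last question concrete, a minimal sketch of comparing the graphs from two runs (file names are hypothetical; assumes each run saved its AnnData object):

```python
import anndata as ad

# Load the results produced on two different machines (hypothetical file names).
a = ad.read_h5ad("run_machine_a.h5ad")
b = ad.read_h5ad("run_machine_b.h5ad")

# Count entries where the two connectivity matrices disagree.
diff = (a.obsp["connectivities"] != b.obsp["connectivities"]).nnz
print(f"{diff} differing entries in the connectivity graph")
```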
💯
I have already been exporting those four env vars before running the analysis. Is there anything else that might be threaded?

```bash
export MKL_NUM_THREADS="1"
export OPENBLAS_NUM_THREADS="1"
export OMP_NUM_THREADS="1"
export NUMBA_NUM_THREADS="1"
```
threadpoolctl does a nice job of accounting for most possible options. But do you see more than a single thread being used?
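For reference, threadpoolctl's `threadpool_info` helper can answer that directly; a minimal check:

```python
from pprint import pprint
from threadpoolctl import threadpool_info

# Lists every detected BLAS/OpenMP thread pool with its current num_threads.
pprint(threadpool_info())
```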
Just to be sure, I additionally included

```python
from threadpoolctl import threadpool_limits

threadpool_limits(1)
```

I also confirmed that only a single thread was actually used. The results didn't change.
Alright, if you would like to achieve reproducibility, the next things to play around with would probably be these CPU feature flag variables for numba. In particular: […]
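Presumably this refers to numba's documented CPU-targeting variables; a hedged sketch of setting them (they must be set before numba JIT-compiles anything):

```python
import os

# NUMBA_CPU_NAME and NUMBA_CPU_FEATURES are documented numba environment
# variables. Targeting a generic CPU model disables machine-specific
# instruction selection during JIT compilation, trading speed for portability.
os.environ["NUMBA_CPU_NAME"] = "generic"   # target a generic CPU model
os.environ["NUMBA_CPU_FEATURES"] = ""      # drop machine-specific features
```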
Another 💯 with […]. In terms of speed I didn't notice a big difference (maybe 10 s slower, but that would require proper benchmarking ofc).
What do you think the right thing to do here is? I don't think we can modify the global environment for our users by setting those numba flags by default. Is this just something we should document?
How about creating a page about reproducibility in the docs, similar to the one by PyTorch? It could gather all information around reproducibility with scanpy, such as […], and also state the limits, e.g. […].

In addition to that, @Zethson, what do you think of creating an mlf-core template for single-cell analyses that sets the right defaults?
Mhm, certainly a cool option, but nothing that I could tackle in the next weeks due to time constraints. I would start with a reproducibility section in the documentation and maybe a "deterministic" switch in the Scanpy settings which sets all required numba flags.
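A minimal sketch of what such a switch might look like (hypothetical API, not an existing scanpy feature):

```python
import os

def set_deterministic(enabled: bool = True) -> None:
    """Hypothetical helper: pin thread counts and numba's CPU target
    so JIT-compiled kernels behave identically across machines."""
    if not enabled:
        return
    # Must run before the first numba-compiled function is executed.
    os.environ["NUMBA_CPU_NAME"] = "generic"
    os.environ["NUMBA_CPU_FEATURES"] = ""
    for var in ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS",
                "OMP_NUM_THREADS", "NUMBA_NUM_THREADS"):
        os.environ[var] = "1"
```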
@grst, this would be great. Would you be up for writing this at some point? I'm thinking it could either go: […]
Yeah, not sure when I can make it, though.
We could then also consider extending it with rapids single cell, @Intron7. More of a […]
@Zethson […]
I noticed that running the same single-cell analyses on different nodes of our HPC produces different results. Starting from the same anndata object with a precomputed X_scVI latent representation, the UMAP and leiden clustering look different.

[Screenshots of the UMAP on different CPUs ("On …" / "On …") omitted.]
Minimal code sample (that we can copy&paste without having any data)
A git repository with example data, notebook and a nextflow pipeline is available here:
https://github.com/grst/scanpy_reproducibility
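For orientation, a hedged sketch of the core steps such a pipeline presumably runs (parameters are assumptions; the authoritative version is in the repository above):

```python
import scanpy as sc

# Build the kNN graph on the precomputed scVI latent space,
# then cluster and embed — the steps that differ across CPUs.
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.leiden(adata)
sc.tl.umap(adata)
```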
A report of the analysis executed on four different CPU architectures is available here:
https://grst.github.io/scanpy_reproducibility/
Versions

[…]