Random_state produces different results on different operating systems #153

iicky · 2018-10-15T17:34:06Z

Issue

The random_state parameter produces deterministic results on a given OS, but does not produce the same results across different OSes. Below are some examples for umap-learn, run with the following code. I used the example from the README, with Scikit's check_random_state as a control (all Scikit results are the same).

The results also seem to depend on the version of Numba that is installed.

# UMAP example with random state
import umap
from sklearn.datasets import load_digits

digits = load_digits()

embedding = umap.UMAP(
  n_neighbors=5,
  min_dist=0.3,
  metric='correlation',
  random_state=2018,
).fit_transform(digits.data)
embedding

# Scikit check random state
from sklearn.utils import check_random_state
random_state = check_random_state(2018)
random_state.rand(4)

Example Results

Machine	Architecture	Python Version	umap-learn Version	numba Version	UMAP Result
Macbook Pro #1	Darwin C02P141DG3QD 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64	Python 3.7.0	0.3.2	0.39.0	array([[16.42446 , -2.1266642], [ 7.231049 , -1.5276358], [-1.5864906, -5.1226635], ..., [ 6.094945 , 1.2291753], [ 1.3193432, 5.4169164], [ 5.5729628, 2.2857437]], dtype=float32)
Macbook Pro #1	Darwin C02P141DG3QD 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64	Python 3.7.0	0.3.5	0.40.1	array([[32.471622, 8.842674], [16.400652, 13.036578], [ 9.181449, 3.948576], ..., [19.216055, 12.42009 ], [ 6.522507, 14.285691], [19.517092, 11.733169]], dtype=float32)
Macbook Pro #2	Darwin C02VN4T7HV2L 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64	Python 3.7.0	0.3.2	0.40.0	array([[16.42446 , -2.1266642], [ 7.231049 , -1.5276358], [-1.5864906, -5.1226635], ..., [ 6.094945 , 1.2291753], [ 1.3193432, 5.4169164], [ 5.5729628, 2.2857437]], dtype=float32)
Macbook Pro #2	Darwin C02VN4T7HV2L 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64	Python 3.7.0	0.3.5	0.40.1	array([[32.471622, 8.842674], [16.400652, 13.036578], [ 9.181449, 3.948576], ..., [19.216055, 12.42009 ], [ 6.522507, 14.285691], [19.517092, 11.733169]], dtype=float32)
Debian Docker	Linux 389088ec7b25 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 GNU/Linux	Python 3.5.3	0.3.5	0.40.1	array([[25.864304 , 7.870304 ], [16.924606 , 7.9489594], [ 7.4818945, 9.081071 ], ..., [15.565144 , 10.721824 ], [ 7.7764506, 14.354664 ], [14.85415 , 11.515898 ]], dtype=float32)
Ubuntu Docker	Linux 6a9a07ef70b7 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux	Python 3.6.6	0.3.5	0.40.1	array([[25.864304 , 7.870304 ], [16.924606 , 7.9489594], [ 7.4818945, 9.081071 ], ..., [15.565144 , 10.721824 ], [ 7.7764506, 14.354664 ], [14.85415 , 11.515898 ]], dtype=float32)
Ubuntu Docker	Linux 6a9a07ef70b7 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux	Python 3.7.0	0.3.5	0.40.1	array([[25.864304 , 7.870304 ], [16.924606 , 7.9489594], [ 7.4818945, 9.081071 ], ..., [15.565144 , 10.721824 ], [ 7.7764506, 14.354664 ], [14.85415 , 11.515898 ]], dtype=float32)
Ubuntu Desktop	Linux brick 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux	Python 3.6.6	0.3.5	0.40.1	array([[25.864225 , 7.8703256], [16.92632 , 7.943247 ], [ 7.4819674, 9.081023 ], ..., [15.570685 , 10.72381 ], [ 7.776701 , 14.354493 ], [14.864248 , 11.530873 ]], dtype=float32)


lmcinnes · 2018-10-15T17:57:36Z

Sadly I'm not sure that there is much I can do about this, as a certain amount is down to the operating system. I agree that it is a potentially confusing issue; I'm just not sure I know of any way to address it. Ideas are certainly welcome.


iicky · 2018-10-16T14:51:26Z

I'm not sure what specifically is causing the issue - I get the same results as above if I set a random seed with numpy rather than in the UMAP constructor. Do you think this is an issue with Numba?

kurtforrester · 2019-03-13T11:20:58Z

I can confirm the same phenomenon with Linux versus Windows. The results on the two platforms are very similar visually, but different enough to produce significantly varying results when clustering with HDBSCAN (using identical settings and package versions, including numba, scipy, scikit-learn, and numpy). When HDBSCAN, on either platform, is fed the same reduced UMAP dataset (either from Linux >> Windows or Windows >> Linux), the resulting clusters are the same. That is, HDBSCAN is consistent across platforms whereas UMAP is not.

Windows output (UMAP + HDBSCAN):
34 clusters with 918 noise points (2836 total points)

Linux output (UMAP + HDBSCAN):
2 clusters with 0 noise points (2836 total points)

huidongchen · 2020-06-01T23:40:10Z

I am running into the same issue (MacOS vs Linux).

lmcinnes · 2020-06-02T05:17:18Z

I don't believe this is anything I can fix at all easily -- it comes down to lower level libraries like numpy which I rely on. Sorry.


simonwm · 2021-07-25T21:27:40Z

I have the same issue (Mac/Windows/WSL/Linux) - and maybe an idea how to solve it.

I could solve reproducibility issues in other libraries by seeding everything which is seedable from the outside in addition to supplying the random seed for the package itself: random.seed, numpy.random.seed. Here however this did not work.

There are two additional sources of randomness which I can think of and which are not (easily) fixable from the outside: instantiated numpy random generators (but I really think @lmcinnes took care of that if necessary) and the numba random number generators.

Although they look identical to the top-level numpy ones and are also seeded with numpy.random.seed (but only when it is called from within numba code), they are independent of the non-numba numpy random generators, and they are initialized at startup with entropy drawn from the operating system. https://numba.pydata.org/numba-doc/latest/reference/pysupported.html?highlight=numpy%20random#random

If that is really the reason, the fix is simple in principle: Just call numpy.random.seed also from within your numba-jitted code.

And if numba randomness is the only reason for not having the same result for serial and parallel runs, you might be able to figure out a scheme to specify a deterministic seed for every block of work - and use the same seeds independent of the number of threads, in particular also in the serial run.
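A rough sketch of such a per-block seeding scheme, using only the Python standard library rather than numba so that it stays self-contained. The helpers block_seed, process_block and run, and the way the seed is derived from the block index, are assumptions made for illustration; they are not how UMAP splits its work.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def block_seed(base_seed, block_index):
    # Hypothetical derivation: any fixed mapping from (seed, block) to an
    # integer works, as long as it does not depend on the thread count.
    return base_seed * 1_000_003 + block_index


def process_block(base_seed, block_index, block_size):
    # Each block gets its own generator, seeded only from its block index.
    rng = random.Random(block_seed(base_seed, block_index))
    return [rng.random() for _ in range(block_size)]


def run(base_seed, n_blocks, block_size, n_threads):
    if n_threads == 1:
        # Serial run uses exactly the same per-block seeds.
        return [process_block(base_seed, i, block_size) for i in range(n_blocks)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # pool.map returns results in block order, regardless of scheduling.
        return list(
            pool.map(lambda i: process_block(base_seed, i, block_size), range(n_blocks))
        )


serial = run(2018, n_blocks=8, block_size=3, n_threads=1)
parallel = run(2018, n_blocks=8, block_size=3, n_threads=4)
print(serial == parallel)
```

Because the random stream of a block depends only on the base seed and the block index, the serial and threaded runs draw identical numbers no matter how many threads are used.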

samarthsarin · 2022-10-07T07:43:28Z

Hey everyone!
Any update on this issue of different results on different operating systems? I am also getting different results on Windows and Linux. I have tried setting all the seeds available in the different libraries, but the results still don't match across machines.

pkstys · 2023-02-21T01:39:05Z

I posted my similar findings with example output here: MaartenGr/BERTopic#559

pavlin-policar mentioned this issue Dec 14, 2018

Non-deterministic results with random_state #183

Closed

keller-mark mentioned this issue Aug 29, 2020

Try UMAP instead of PCA vitessce/vitessce-data#112

Draft

OmriPi mentioned this issue Jun 21, 2022

Inconsistent results on different machines MaartenGr/BERTopic#559

Closed

YannCabanes mentioned this issue Oct 9, 2022

Fix LearningShapelets continuous integration test error on Linux tslearn-team/tslearn#427

Merged

saragrau mentioned this issue Apr 25, 2023

Fix reproducibility issues in UMAP and in t-SNE cschlaffner/PROTzilla2#120

Open

miguelangel43 mentioned this issue Mar 11, 2024

Split dataset into train test and save it miguelangel43/Dimensionality-Reduction-Masters-Thesis#21

Closed
