Random_state produces different results on different operating systems #153
Sadly, I'm not sure there is much I can do about this, as a certain amount of it is down to the operating system. I agree that it is a potentially confusing issue; I'm just not sure I know of any way to address it. Ideas are certainly welcome.

On Mon, Oct 15, 2018 at 1:34 PM Mickey Scherrer wrote:
Issue
The random_state parameter produces deterministic results on a specific OS, but does not produce the same results on different OSes. Here are some examples for umap-learn, run with the following code. I used the example from the README, as well as scikit-learn's check_random_state as a control (all scikit-learn results are the same).
The results also seem to depend on the version of Numba that is installed.
```python
# UMAP example with random state
import umap
from sklearn.datasets import load_digits

digits = load_digits()
embedding = umap.UMAP(
    n_neighbors=5,
    min_dist=0.3,
    metric='correlation',
    random_state=2018,
).fit_transform(digits.data)
print(embedding)

# scikit-learn check_random_state as a control
from sklearn.utils import check_random_state

random_state = check_random_state(2018)
print(random_state.rand(4))
```
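The results table records the OS, Python, umap-learn and numba versions for each run. A minimal sketch for collecting the same fields with the standard library, assuming Python 3.8+ for `importlib.metadata`; `environment_report` is a hypothetical helper, not part of umap-learn:

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("umap-learn", "numba", "numpy", "scikit-learn")):
    """Collect roughly the fields used in the results table (hypothetical helper)."""
    u = platform.uname()
    report = {
        # Similar to the `uname -a` strings in the "Architecture" column.
        "architecture": f"{u.system} {u.node} {u.release} {u.version} {u.machine}",
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Version string of the installed distribution, e.g. "0.3.5" for umap-learn.
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output next to each embedding makes it easier to see which library versions differ between two runs.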
Example Results
| Machine | Architecture | Python Version | umap-learn Version | numba Version | UMAP Result |
|---|---|---|---|---|---|
| MacBook Pro #1 | Darwin C02P141DG3QD 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64 | 3.7.0 | 0.3.2 | 0.39.0 | `array([[16.42446, -2.1266642], [7.231049, -1.5276358], [-1.5864906, -5.1226635], ..., [6.094945, 1.2291753], [1.3193432, 5.4169164], [5.5729628, 2.2857437]], dtype=float32)` |
| MacBook Pro #1 | Darwin C02P141DG3QD 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64 | 3.7.0 | 0.3.5 | 0.40.1 | `array([[32.471622, 8.842674], [16.400652, 13.036578], [9.181449, 3.948576], ..., [19.216055, 12.42009], [6.522507, 14.285691], [19.517092, 11.733169]], dtype=float32)` |
| MacBook Pro #2 | Darwin C02VN4T7HV2L 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 | 3.7.0 | 0.3.2 | 0.40.0 | `array([[16.42446, -2.1266642], [7.231049, -1.5276358], [-1.5864906, -5.1226635], ..., [6.094945, 1.2291753], [1.3193432, 5.4169164], [5.5729628, 2.2857437]], dtype=float32)` |
| MacBook Pro #2 | Darwin C02VN4T7HV2L 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 | 3.7.0 | 0.3.5 | 0.40.1 | `array([[32.471622, 8.842674], [16.400652, 13.036578], [9.181449, 3.948576], ..., [19.216055, 12.42009], [6.522507, 14.285691], [19.517092, 11.733169]], dtype=float32)` |
| Debian Docker | Linux 389088ec7b25 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 GNU/Linux | 3.5.3 | 0.3.5 | 0.40.1 | `array([[25.864304, 7.870304], [16.924606, 7.9489594], [7.4818945, 9.081071], ..., [15.565144, 10.721824], [7.7764506, 14.354664], [14.85415, 11.515898]], dtype=float32)` |
| Ubuntu Docker | Linux 6a9a07ef70b7 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | 3.6.6 | 0.3.5 | 0.40.1 | `array([[25.864304, 7.870304], [16.924606, 7.9489594], [7.4818945, 9.081071], ..., [15.565144, 10.721824], [7.7764506, 14.354664], [14.85415, 11.515898]], dtype=float32)` |
| Ubuntu Docker | Linux 6a9a07ef70b7 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | 3.7.0 | 0.3.5 | 0.40.1 | `array([[25.864304, 7.870304], [16.924606, 7.9489594], [7.4818945, 9.081071], ..., [15.565144, 10.721824], [7.7764506, 14.354664], [14.85415, 11.515898]], dtype=float32)` |
| Ubuntu Desktop | Linux brick 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | 3.6.6 | 0.3.5 | 0.40.1 | `array([[25.864225, 7.8703256], [16.92632, 7.943247], [7.4819674, 9.081023], ..., [15.570685, 10.72381], [7.776701, 14.354493], [14.864248, 11.530873]], dtype=float32)` |
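To put a number on how far apart these runs are, here is a small pure-Python sketch. The `max_abs_diff` helper is hypothetical, and it compares only the first three and last three points printed in the table, since the middle rows are elided there:

```python
def max_abs_diff(a, b):
    """Largest absolute coordinate difference between two equally sized point lists."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same number of points")
    return max(abs(x - y) for p, q in zip(a, b) for x, y in zip(p, q))


# First three and last three points of selected runs from the table.
macbook1_umap032 = [[16.42446, -2.1266642], [7.231049, -1.5276358], [-1.5864906, -5.1226635],
                    [6.094945, 1.2291753], [1.3193432, 5.4169164], [5.5729628, 2.2857437]]
macbook2_umap032 = [[16.42446, -2.1266642], [7.231049, -1.5276358], [-1.5864906, -5.1226635],
                    [6.094945, 1.2291753], [1.3193432, 5.4169164], [5.5729628, 2.2857437]]
macos_umap035 = [[32.471622, 8.842674], [16.400652, 13.036578], [9.181449, 3.948576],
                 [19.216055, 12.42009], [6.522507, 14.285691], [19.517092, 11.733169]]
linux_docker = [[25.864304, 7.870304], [16.924606, 7.9489594], [7.4818945, 9.081071],
                [15.565144, 10.721824], [7.7764506, 14.354664], [14.85415, 11.515898]]
linux_desktop = [[25.864225, 7.8703256], [16.92632, 7.943247], [7.4819674, 9.081023],
                 [15.570685, 10.72381], [7.776701, 14.354493], [14.864248, 11.530873]]

print("macOS #1 vs #2, umap 0.3.2:", max_abs_diff(macbook1_umap032, macbook2_umap032))
print("Linux Docker vs Desktop:   ", max_abs_diff(linux_docker, linux_desktop))
print("macOS vs Linux, umap 0.3.5:", max_abs_diff(macos_umap035, linux_docker))
```

On these six points, the two macOS runs with umap-learn 0.3.2 agree exactly, the Ubuntu Desktop run differs from the Docker runs by at most about 0.015, and the macOS run with 0.3.5 differs from the Linux 0.3.5 runs by several units, which looks like a different layout rather than rounding noise.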
I'm not sure what specifically is causing the issue: I get the same results as above if I set a random seed with numpy rather than in the UMAP constructor. Do you think this is an issue with Numba?
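Seeding NumPy globally from Python may not reach every generator involved. According to the Numba documentation, Numba's `np.random` functions use an internal state independent of NumPy's, and calling `numpy.random.seed()` from ordinary Python code seeds NumPy's generator, not Numba's; Numba's is seeded by calling `np.random.seed()` inside a jitted function. A minimal sketch, assuming NumPy and Numba may or may not be installed (`seed_everything` and `_seed_numba` are hypothetical names); whether Numba's generator actually affects UMAP's output is not established in this thread:

```python
import random

try:
    import numpy as np
except ImportError:  # NumPy not installed: only Python's own generator is seeded
    np = None

try:
    from numba import njit
except ImportError:  # Numba not installed
    njit = None

if np is not None and njit is not None:
    @njit
    def _seed_numba(seed):
        # Called from jitted code, np.random.seed reseeds Numba's internal
        # generator, which is independent of NumPy's global state.
        np.random.seed(seed)
else:
    _seed_numba = None


def seed_everything(seed):
    """Seed every generator reachable from plain Python (hypothetical helper)."""
    random.seed(seed)            # Python's built-in random module
    if np is not None:
        np.random.seed(seed)     # NumPy's legacy global RandomState
    if _seed_numba is not None:
        _seed_numba(seed)        # Numba's separate generator
```

This only pins down seedable state; it cannot remove differences caused by platform-dependent floating-point arithmetic.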
I am running into the same issue (macOS vs. Linux).
I don't believe this is anything I can fix at all easily; it comes down to lower-level libraries like numpy, which I rely on. Sorry.
On Mon, Jun 1, 2020 at 7:40 PM Huidong Chen wrote:
I am running into the same issue (macOS vs. Linux).
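One plausible way lower-level libraries can change results even with identical seeds (not confirmed for UMAP in this thread) is floating-point rounding: float addition is not associative, so splitting a reduction across a different number of threads, or vectorizing it differently, can change the last bits of a result, and an iterative optimizer can amplify such differences. A minimal pure-Python illustration; `naive_sum` is a hypothetical helper that adds strictly left to right (deliberately avoiding the built-in `sum`, which uses compensated summation for floats since Python 3.12):

```python
def naive_sum(xs):
    """Add values strictly left to right; every += rounds to the nearest double."""
    total = 0.0
    for x in xs:
        total += x
    return total


values = [0.1, 0.2, 0.3]

# One worker adds everything in order: (0.1 + 0.2) + 0.3
serial = naive_sum(values)

# Two workers each sum a chunk, then the partial sums are combined: 0.1 + (0.2 + 0.3)
partials = [naive_sum(values[:1]), naive_sum(values[1:])]
two_workers = naive_sum(partials)

print(serial, two_workers, serial == two_workers)
# 0.6000000000000001 0.6 False
```

The two results differ only in the last bit, which is the same order of effect as the small deviations between the Linux Docker and Ubuntu Desktop runs in the table, though this sketch does not show that it is their cause.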
I have the same issue (Mac/Windows/WSL/Linux), and maybe an idea of how to solve it. I was able to solve reproducibility issues in other libraries by seeding everything that is seedable from the outside, in addition to supplying the random seed to the package itself:

There are two additional sources of randomness I can think of that are not (easily) fixable from the outside: instantiated numpy random generators (but I really think @lmcinnes took care of that if necessary) and the numba random number generators. While they look identical to the top-level numpy ones and are also seeded by

If that is really the reason, the fix is simple in principle: just call

And if numba randomness is the only reason for not getting the same result in serial and parallel runs, you might be able to devise a scheme that specifies a deterministic seed for every block of work, and use the same seeds regardless of the number of threads, in particular also in the serial run.
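The per-block seeding idea can be sketched in pure Python, assuming the work is split into a fixed number of blocks whose count does not depend on the thread count. `process_block` and `run` are hypothetical names; each block seeds its own `random.Random` from a string built from the base seed and the block index (string seeds are converted deterministically by `random.seed`, independent of hash randomization), so serial and threaded runs should return identical output:

```python
import random
from concurrent.futures import ThreadPoolExecutor


def process_block(base_seed, block_index, size):
    # Each block owns a generator seeded only from (base_seed, block_index), so
    # its numbers do not depend on which thread runs it or when.
    rng = random.Random(f"{base_seed}:{block_index}")
    return [rng.random() for _ in range(size)]


def run(base_seed, n_blocks, block_size, n_threads):
    # The number of blocks is fixed up front; only the number of threads varies.
    if n_threads == 1:
        blocks = [process_block(base_seed, i, block_size) for i in range(n_blocks)]
    else:
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            blocks = list(pool.map(
                lambda i: process_block(base_seed, i, block_size), range(n_blocks)))
    # pool.map yields results in input order, so blocks are reassembled in block order.
    return [x for block in blocks for x in block]


serial = run(2018, n_blocks=8, block_size=5, n_threads=1)
parallel = run(2018, n_blocks=8, block_size=5, n_threads=4)
print(serial == parallel)  # True
```

In NumPy-based code the same idea would mean deriving one seed per block rather than per thread; NumPy's `SeedSequence.spawn` is one existing tool for deriving such child seeds.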
Hey everyone!
I posted my similar findings, with example output, here: MaartenGr/BERTopic#559