Generate the same random ndarray when the size of the random ndarray is changed #165

Open
neko-suki opened this issue Feb 1, 2019 · 6 comments

@neko-suki
Collaborator

  • The following simple program shows that when I pass different shapes to clpy.random.rand(), the first few elements of the resulting ndarrays are the same.
import clpy
import time

tmp = clpy.random.rand(5)
print(tmp)

tmp = clpy.random.rand(2)
print(tmp)
  • The result is as follows. You can see that the first two elements are the same.
$ python sample_random4.py 
[0.44014896 0.8914092  0.34266944 0.79392968 0.24518993]
[0.44014896 0.8914092 ]

The cause is that the same seed, self.seed_value, is reused when the size of the random number array changes.

https://github.com/fixstars/clpy/blob/clpy/clpy/random/generator.py#L178
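To illustrate why a shared seed produces matching prefixes, here is a minimal toy model in pure Python. It is not clpy's actual kernel code: the LCG constants and the helper names `lcg` and `toy_rand` are assumptions made for this sketch. The only point is that when each element's state depends solely on the shared seed and its own index, the value at a given index does not depend on the requested size.

```python
# Toy model (pure Python, NOT clpy's real kernels): each element i derives
# its state only from the shared seed and its own index.
MASK = 0xFFFFFFFF  # keep states in 32 bits, like an unsigned int


def lcg(state):
    """One step of a 32-bit linear congruential generator (assumed constants)."""
    return (1664525 * state + 1013904223) & MASK


def toy_rand(size, seed_value):
    """Return `size` floats in [0, 1), one per index."""
    out = []
    for i in range(size):
        state = lcg((seed_value + i) & MASK)  # per-index seed from the shared seed
        state = lcg(state)                    # one extra step before output
        out.append(state / 2**32)
    return out


first = toy_rand(5, seed_value=12345)
second = toy_rand(2, seed_value=12345)
print(first[:2] == second)  # True: the shorter result is a prefix of the longer one
```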

@neko-suki added the bug label Feb 1, 2019
@neko-suki
Collaborator Author

Here is another example, since the example code in #165 (comment) is resolved by the modification for #166.

  • Code
import clpy
import time
tmp = clpy.random.rand(5)
print(tmp)
tmp = clpy.random.rand(6)
print(tmp)
  • Result on the Secondary Machine
[0.34526245 0.79652269 0.24778294 0.69904318 0.15030342]
[0.34526245 0.79652269 0.24778294 0.69904318 0.15030342 0.60156366]

@neko-suki self-assigned this Feb 5, 2019
@neko-suki
Collaborator Author

I tried reusing the first element of the seed array, self.seed_array[0], as the initial seed. It solves the problem.

The result of code in #165 (comment) is as follows.

$ python new_random.py 
[0.90678602 0.35804626 0.8093065  0.26056675 0.71182699]
[0.74465217 0.19591241 0.64717265 0.0984329  0.54969314 0.00095338]
  • The diff is as follows.
$ git diff
diff --git a/clpy/random/generator.py b/clpy/random/generator.py
index 5430281..c6e40fa 100644
--- a/clpy/random/generator.py
+++ b/clpy/random/generator.py
@@ -176,9 +176,13 @@ class RandomState(object):
 
         if (not isinstance(self.seed_array, clpy.ndarray)
                 or self.seed_array.size < array_size):
+            if (not isinstance(self.seed_array, clpy.ndarray)):
+                initial_seed = self.seed_value
+            else:
+                initial_seed = self.seed_array[0]
             self.seed_array = clpy.empty(size, "uint")
             tmp_seed_array = clpy.empty(size, "uint")
-            tmp_seed_array.fill(self.seed_value)
+            tmp_seed_array.fill(initial_seed)
             RandomState._init_kernel(tmp_seed_array, self.seed_array)
             # not to use similar number for the first generation
             RandomState._lcg_kernel(self.seed_array, out)
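The effect of the diff can be sketched with a small pure-Python stand-in. This is only a toy under stated assumptions: `ToyRandomState`, its `reuse_state` flag, and the LCG constants are hypothetical and only loosely mirror the names in the diff. It compares re-seeding from `seed_value` (before the diff) with re-seeding from `seed_array[0]` (after the diff) when the requested size grows.

```python
MASK = 0xFFFFFFFF  # keep states in 32 bits


def lcg(state):
    """One step of a 32-bit LCG (assumed constants, not clpy's)."""
    return (1664525 * state + 1013904223) & MASK


class ToyRandomState:
    """Hypothetical stand-in for RandomState, for illustration only."""

    def __init__(self, seed_value, reuse_state):
        self.seed_value = seed_value
        self.seed_array = None
        self.reuse_state = reuse_state  # True models the behaviour after the diff

    def rand(self, size):
        # Re-initialise the seed array when it is missing or too small.
        if self.seed_array is None or len(self.seed_array) < size:
            if self.seed_array is None or not self.reuse_state:
                initial_seed = self.seed_value
            else:
                initial_seed = self.seed_array[0]
            # Toy replacement for the init kernel: spread the seed over indices.
            self.seed_array = [lcg((initial_seed + i) & MASK) for i in range(size)]
        # Advance each used state once and emit a float in [0, 1).
        out = []
        for i in range(size):
            self.seed_array[i] = lcg(self.seed_array[i])
            out.append(self.seed_array[i] / 2**32)
        return out


old = ToyRandomState(12345, reuse_state=False)
a, b = old.rand(5), old.rand(6)
print(a == b[:5])  # True: growing the array replays the same values

new = ToyRandomState(12345, reuse_state=True)
c, d = new.rand(5), new.rand(6)
print(c == d[:5])  # False: the re-seed starts from the advanced state
```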

@LWisteria
Member

LWisteria commented Feb 5, 2019

How about adding a fluctuation (e.g. time) to the random number generation each time rand() is called?

@neko-suki
Collaborator Author

@LWisteria From a performance point of view, it takes much longer time only the first time the program passes through initial_seed = self.seed_array[0]. I don't know why this happens.

It can be reproduced every time, and it happens on both the Primary Machine and the Secondary Machine.

  • code
import clpy
import time

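base = 100000
# Initial call.
# Note: time.time() returns seconds, so the values printed with the
# "msec" label below are actually in seconds.
beg = time.time()
clpy.random.rand(base)
end = time.time()
print("time = {:.5f} msec".format(end - beg))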

for i in range(20):
    beg = time.time()
    clpy.random.rand(base + 1 + i)
    end = time.time()
    print("time = {:.5f} msec".format(end - beg))
  • Result on the Primary Machine (Vega)
time = 0.13137 msec
time = 0.04378 msec
time = 0.00024 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00009 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00011 msec
time = 0.00010 msec
time = 0.00010 msec
time = 0.00011 msec
time = 0.00011 msec
time = 0.00075 msec
time = 0.00011 msec
  • Result on the Secondary Machine (titanv)
$ python new_random4.py 
time = 0.12433 msec
time = 0.04277 msec
time = 0.00138 msec
time = 0.00150 msec
time = 0.00104 msec
time = 0.00101 msec
time = 0.00093 msec
time = 0.00089 msec
time = 0.00088 msec
time = 0.00088 msec
time = 0.00089 msec
time = 0.00089 msec
time = 0.00089 msec
time = 0.00090 msec
time = 0.00090 msec
time = 0.00088 msec
time = 0.00089 msec
time = 0.00094 msec
time = 0.00063 msec
time = 0.00115 msec
time = 0.00064 msec

@LWisteria
Member

it takes much longer time only the first time the program

I think this is not a problem with the random generator. It happens the first time each cupy/clpy kernel is executed, because something in the runtime library needs to be initialized.
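One way to keep that first-call overhead out of a benchmark is to time a warm-up call separately from the steady-state calls. The sketch below is pure Python and uses a hypothetical `fake_kernel_call` with a simulated one-time setup delay in place of a real clpy call. It also converts to milliseconds explicitly. If the real library launches work asynchronously, a synchronization step before stopping the timer would also be needed; that is not shown here.

```python
import time

_initialized = False


def fake_kernel_call(n):
    """Hypothetical stand-in for a GPU library call with a one-time setup cost."""
    global _initialized
    if not _initialized:
        time.sleep(0.05)  # pretend runtime initialization on the first call
        _initialized = True
    return sum(range(n))  # cheap steady-state work


def time_ms(func, *args):
    """Return the wall-clock duration of one call in milliseconds."""
    beg = time.perf_counter()
    func(*args)
    return (time.perf_counter() - beg) * 1000.0


# The first call pays the setup cost; report it separately.
warmup_ms = time_ms(fake_kernel_call, 1000)
steady_ms = sorted(time_ms(fake_kernel_call, 1000) for _ in range(20))
median_ms = steady_ms[len(steady_ms) // 2]

print("warm-up = {:.3f} ms".format(warmup_ms))
print("steady  = {:.3f} ms (median of 20)".format(median_ms))
```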

Anyway, doesn't that comment refer to my comment?

How about adding a fluctuation (e.g. time) to the random number generation each time rand() is called?

I mean, this issue (not the performance one, which was already solved in #162) could be solved if you pass a time value as a kernel argument and add it to each thread's seed.
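A rough sketch of that suggestion, again as a pure-Python toy rather than clpy's kernel code: `toy_rand`, `now_value`, and the LCG constants are hypothetical names and values chosen for illustration. A per-call time value is added to every per-index seed, so two calls with the same `seed_value` start from different states.

```python
import time

MASK = 0xFFFFFFFF  # keep states in 32 bits


def lcg(state):
    """One step of a 32-bit LCG (assumed constants, not clpy's)."""
    return (1664525 * state + 1013904223) & MASK


def toy_rand(size, seed_value, time_value):
    """Toy model: add a per-call time value to each per-index seed."""
    out = []
    for i in range(size):
        state = lcg((seed_value + i + time_value) & MASK)
        state = lcg(state)
        out.append(state / 2**32)
    return out


def now_value():
    """Hypothetical helper: a 32-bit value taken from a nanosecond clock."""
    return time.perf_counter_ns() & MASK


a = toy_rand(5, 12345, now_value())
b = toy_rand(6, 12345, now_value())
print(a)
print(b)
```

Two caveats about this toy: adding the time value to the index-based seed only shifts which seed each element receives, so how the real kernel mixes the value matters; and with a time-dependent seed, results are no longer reproducible from seed_value alone.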

@neko-suki
Collaborator Author

I'm sorry, I misread your comment.

I mean, this issue (not the performance one, which was already solved in #162) could be solved if you pass a time value as a kernel argument and add it to each thread's seed.

Getting a time value on each call and using it for the initial seed is a good idea. I'll do it. Thank you for your comments.
