-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xgenerator for random arrays #76
Conversation
356bdc4
to
d15637d
Compare
return m_dist(m_engine); | ||
} | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you detail what this logic does.
I am confused by why you would need strides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so my basic thinking was that a random number generator puts out a "fixed" sequence of random numbers.
And then one could go reset and advance the RNG as one pleases, basically the same as having linearly laid out data in a vector. That's what I was using the strides for, to find the correct "advance" position of the RNG.
And that part is working fine.
Unfortunately, distributions can also carry an internal state (that can be resetted as well). But, afaik, distributions don't offer an advance
operation. So the ()
operator would have to be called advance
times in order to arrive at the correct random number.
This will make random access that doesn't linearly count upwards kinda slow, as there will be a bunch of reset/advance ... cycles.
Is there a better option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. I have not thought about it yet. I will be AFK most of the afternoon, but will have a more thorough look later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we are not overthinking this. If instead, the operator()
always generated a new value, we would not have to pay the cost of discarding values.
Then, we would overload operator ==
so that random expressions are always different to any other expressions except those of the same type with the same internal state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... good point.
Maybe you're right, and if random values should be "fixed" then it just has to be evaluated and stored in some xarray or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like @JohanMabille to chime in before we make a decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep things simple. We hardly need to reproduce the same sequence of random numbers; and when that's the case, as far as I know, these numbers are generally precomputed and stored in files (so you can reproduce a scenario / use case many days after).
That would also significantly improve performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering if people might think it's strange if the same xfunction evaluates to different results.
Like if you have something like:
auto a = 2 + randn({3, 5}, 0, 2);
double b = a(0, 0) - a(0, 0); // != 0 ...
but I guess it's a matter of documenting it correctly and telling people that they should assign a random generator to some value if they need repeatability.
|
||
private: | ||
mutable E m_engine; | ||
mutable D m_dist; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which non-const method is required on the distribution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the distributions can have an internal state, so calling operator()
can modify them.
struct random_impl | ||
{ | ||
using value_type = V; | ||
using size_type = std::size_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a styling thing: I would add using
statements for distribution_type
, strides_type
, engine_type
.
If anyone has a clue as to why the windows build is failing, I'd be glad :) |
OK, I was successful with a static array. More thorough review coming. |
xarray<double> p2 = n_dist; | ||
xarray<double> p3 = n_dist; | ||
ASSERT_EQ(p1, p2); | ||
ASSERT_EQ(p1, p3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(FYI) ASSERT_EQ expects the expected value first.
618505d
to
c33ddd6
Compare
@wolfv could you add the new header |
I have drastically simplified the random_impl now. Let me know what you think. |
It seems that you are missing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how this looks. I left a couple of minor comments.
return mt; | ||
} | ||
|
||
inline void set_seed(seed_type seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest we call this seed
like in numpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
E& engine = random::get_default_random_engine()) | ||
{ | ||
std::uniform_real_distribution<T> dist(lower, upper); | ||
return detail::make_xgenerator(detail::random_impl<T>(std::bind(dist, std::ref(engine))), shape); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
std::normal_distribution<T> dist(mean, std_dev); | ||
return detail::make_xgenerator(detail::random_impl<T>(std::bind(dist, std::ref(engine))), shape); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since in numpy, rand
, randn
and randint
are in the random namespace, we could do the same..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was a brief discussion with @JohanMabille about the new namespace. I think he was against introducing more namespaces. Personally, I'd like to follow numpy here, and add a random namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I would like to avoid the multiplication of namespaces just for a few functions; unless we plan to develop a lot of random features...
Ok this PR looks perfect to me, I think we can merge it ! |
I still think that a random namespace would be better. We can make that change later if before the next release. |
🎉 cool! I think If we'd have them, would that justify having the |
Yes, if this module is meant to grow as you mention, it deserves its own namespace. |
Done in a commit to master where I also fixed the docs generation by working around the rtfd bug readthedocs/readthedocs.org#2566 by installing breathe from pip. |
As discussed in the random xarrays PR, this version utilizes the new xgenerator :)
Unfortunately, resetting the random number engine seems to work only once, which really puzzles me. If you got any insights why the test is failing I'd be happy to hear it.