Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xgenerator for random arrays #76

Merged
merged 10 commits into from
Jan 17, 2017

Conversation

wolfv
Copy link
Member

@wolfv wolfv commented Jan 12, 2017

As discussed in the random xarrays PR, this version utilizes the new xgenerator :)

Unfortunately, resetting the random number engine seems to work only once, which really puzzles me. If you got any insights why the test is failing I'd be happy to hear it.

@wolfv wolfv force-pushed the xgenerator_random branch from 356bdc4 to d15637d Compare January 13, 2017 13:18
return m_dist(m_engine);
}
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you detail what this logic does.
I am confused by why you would need strides.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so my basic thinking was that a random number generator puts out a "fixed" sequence of random numbers.
And then one could go reset and advance the RNG as one pleases, basically the same as having linearly laid out data in a vector. That's what I was using the strides for, to find the correct "advance" position of the RNG.
And that part is working fine.

Unfortunately, distributions can also carry an internal state (that can be resetted as well). But, afaik, distributions don't offer an advance operation. So the () operator would have to be called advance times in order to arrive at the correct random number.

This will make random access that doesn't linearly count upwards kinda slow, as there will be a bunch of reset/advance ... cycles.

Is there a better option?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. I have not thought about it yet. I will be AFK most of the afternoon, but will have a more thorough look later.

Copy link
Member

@SylvainCorlay SylvainCorlay Jan 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we are not overthinking this. If instead, the operator() always generated a new value, we would not have to pay the cost of discarding values.

Then, we would overload operator == so that random expressions are always different to any other expressions except those of the same type with the same internal state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm... good point.

Maybe you're right, and if random values should be "fixed" then it just has to be evaluated and stored in some xarray or something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like @JohanMabille to chime in before we make a decision.

Copy link
Member

@JohanMabille JohanMabille Jan 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep things simple. We hardly need to reproduce the same sequence of random numbers; and when that's the case, as far as I know, these numbers are generally precomputed and stored in files (so you can reproduce a scenario / use case many days after).

That would also significantly improve performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering if people might think it's strange if the same xfunction evaluates to different results.

Like if you have something like:

auto a = 2 + randn({3, 5}, 0, 2);
double b = a(0, 0) - a(0, 0); // != 0 ... 

but I guess it's a matter of documenting it correctly and telling people that they should assign a random generator to some value if they need repeatability.


private:
mutable E m_engine;
mutable D m_dist;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which non-const method is required on the distribution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the distributions can have an internal state, so calling operator() can modify them.

struct random_impl
{
using value_type = V;
using size_type = std::size_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a styling thing: I would add using statements for distribution_type, strides_type, engine_type.

@wolfv
Copy link
Member Author

wolfv commented Jan 15, 2017

If anyone has a clue as to why the windows build is failing, I'd be glad :)

@SylvainCorlay
Copy link
Member

OK, I was successful with a static array. More thorough review coming.

xarray<double> p2 = n_dist;
xarray<double> p3 = n_dist;
ASSERT_EQ(p1, p2);
ASSERT_EQ(p1, p3);
Copy link
Member

@SylvainCorlay SylvainCorlay Jan 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FYI) ASSERT_EQ expects the expected value first.

@wolfv wolfv force-pushed the xgenerator_random branch from 618505d to c33ddd6 Compare January 15, 2017 13:25
@JohanMabille
Copy link
Member

@wolfv could you add the new header xrandom.hpp to the header variable in the CMakeList.txt file (at the root directory, not the one in test directory) and fix randn documentation ?

@wolfv
Copy link
Member Author

wolfv commented Jan 16, 2017

I have drastically simplified the random_impl now. Let me know what you think.

@SylvainCorlay
Copy link
Member

It seems that you are missing the <functional> include in xgenerator, which causes the windows build fail.

Copy link
Member

@SylvainCorlay SylvainCorlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how this looks. I left a couple of minor comments.

return mt;
}

inline void set_seed(seed_type seed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we call this seed like in numpy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

E& engine = random::get_default_random_engine())
{
std::uniform_real_distribution<T> dist(lower, upper);
return detail::make_xgenerator(detail::random_impl<T>(std::bind(dist, std::ref(engine))), shape);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

std::normal_distribution<T> dist(mean, std_dev);
return detail::make_xgenerator(detail::random_impl<T>(std::bind(dist, std::ref(engine))), shape);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since in numpy, rand, randn and randint are in the random namespace, we could do the same..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was a brief discussion with @JohanMabille about the new namespace. I think he was against introducing more namespaces. Personally, I'd like to follow numpy here, and add a random namespace.

Copy link
Member

@JohanMabille JohanMabille Jan 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I would like to avoid the multiplication of namespaces just for a few functions; unless we plan to develop a lot of random features...

@JohanMabille
Copy link
Member

Ok this PR looks perfect to me, I think we can merge it !

@SylvainCorlay SylvainCorlay merged commit a2bffc3 into xtensor-stack:master Jan 17, 2017
@SylvainCorlay
Copy link
Member

I still think that a random namespace would be better. We can make that change later if before the next release.

@wolfv
Copy link
Member Author

wolfv commented Jan 17, 2017

🎉 cool!

I think choice, permutation and shuffle are some other cool functions from numpy that could be implemented rather easily.

If we'd have them, would that justify having the xt::random namespace? :)

@JohanMabille
Copy link
Member

JohanMabille commented Jan 17, 2017

Yes, if this module is meant to grow as you mention, it deserves its own namespace.

@SylvainCorlay
Copy link
Member

Done in a commit to master where I also fixed the docs generation by working around the rtfd bug readthedocs/readthedocs.org#2566 by installing breathe from pip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants