
Update MarkovChain with replicate method and Future Numba Improvements #146

Closed
mmcky opened this issue May 5, 2015 · 15 comments

@mmcky
Contributor

mmcky commented May 5, 2015

From @albop

Now, I believe it would be useful to have the function that does the actual simulation run entirely in nopython mode, so that one can loop around it. Is it really useful to keep the ability to supply init either as a point or as a distribution? Eventually it gets converted into a scalar index, which seems easy to do outside of the function.

From an api perspective, what about a function like:

simulate_markov_chain(P, init, sample_size, n_exp=None), which would simulate a Markov chain characterized by matrix P over sample_size periods and:

  • perform one single simulation starting at init, if init is an integer and n_exp is None
  • simulate many draws, if init is a vector (len(init) draws) or n_exp is an integer (n_exp draws)
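A minimal sketch of the function proposed above, using plain NumPy and inverse-transform sampling on row-wise CDFs. The name and signature follow the proposal; the implementation details, and the extra random_state argument for reproducibility, are my own assumptions, not actual quantecon code:

```python
import numpy as np

def simulate_markov_chain(P, init, sample_size, n_exp=None, random_state=None):
    """Sketch of the proposed API: one path if `init` is an integer and
    `n_exp` is None; otherwise len(init) or n_exp paths of length
    sample_size.  (`random_state` is an extra argument for reproducibility,
    not part of the original proposal.)"""
    P = np.asarray(P)
    rng = np.random.RandomState(random_state)
    cdfs = np.cumsum(P, axis=1)  # row-wise CDFs for inverse-transform draws

    single = np.isscalar(init) and n_exp is None
    if np.isscalar(init) and n_exp is not None:
        inits = np.full(n_exp, init, dtype=int)   # n_exp draws from one state
    else:
        inits = np.atleast_1d(init)               # one draw per initial state

    out = np.empty((len(inits), sample_size), dtype=int)
    for i, x in enumerate(inits):
        out[i, 0] = x
        for t in range(1, sample_size):
            x = np.searchsorted(cdfs[x], rng.random_sample())
            out[i, t] = x
    return out[0] if single else out
```

The single-path case returns a 1-d array, the multi-path cases a (draws, sample_size) array, matching the one-path/many-draws distinction drawn above.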

From @jstac

@albop "Is the reason why you want to speed up the code, because you are simulating over a very long period ? Or is it because, you want to run many simulations ? (like 100000)"

This routine is oriented toward the former. We could also add one for the latter; in that case vectorization would probably play a bigger role.

From @oyamad

Motivated by a comment by @albop I was thinking of changing the API, for example by giving MarkovChain a method replicate in addition to simulate, as in LinearStateSpace.

@oyamad
Member

oyamad commented May 9, 2015

I have pushed 95627f2 for discussion on this matter.

  1. First of all, I think cdfs should be stored once computed, to avoid repeating the computation each time. It is now stored as MarkovChain.cdfs.
  2. I made some changes to make the methods of MarkovChain consistent with the corresponding methods of LinearStateSpace.
    1. I added the replicate method. It returns num_reps observations of the state at time T.
    2. I changed the order and the names of the arguments of simulate to be consistent with LinearStateSpace.simulate. I don't know how much benefit there is from this (the cost is obvious).
  3. init now does not accept a distribution.
  4. Having written up the code, I started thinking the proposal by @albop would be much better. Namely, design simulate as simulate(ts_length, init=None, num_reps=None), which would return one sample path of length ts_length by default, and if init and/or num_reps is not None, return len(init) or num_reps sample paths (and discard replicate). A concern in this case is that the returned array, which is of shape (num_reps, ts_length), could become huge.
  5. I haven't written tests.

@albop
Contributor

albop commented May 13, 2015

@oyamad : I'd like to have a detailed look at the code but I need a bit of time for this as I am completely underwater right now.
As for point 4, I don't see exactly what the problem is with having a huge output, since it is precisely what the user will want if they call the routine. Maybe it is actually a case where one would want to use the numba random generator, in order to avoid preallocating a big array of random numbers that is not reused (and is the same size as the output).

@oyamad
Member

oyamad commented Jun 6, 2015

I updated the dev_mc_tools branch.

  1. The simulate method has been changed in line with @albop 's suggestion, as simulate(ts_length, init=None, num_reps=None). Here's how it works:

    >>> import quantecon as qe
    >>> mc = qe.MarkovChain([[0.4, 0.6], [0.2, 0.8]], random_state=1234)
    >>> mc.simulate(10)
    array([1, 0, 1, 1, 1, 1, 1, 1, 1, 1])
    >>> mc.simulate(10, init=0)
    array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
    >>> mc.simulate(10, init=[0, 1])
    array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
           [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]])
    >>> mc.simulate(10, num_reps=3)
    array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
           [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
           [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]])
    >>> mc.simulate(10, init=0, num_reps=3)
    array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
           [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
           [0, 1, 1, 1, 1, 0, 0, 1, 1, 1]])
    >>> # If init is an array, num_reps is ignored
    ... mc.simulate(10, init=[0, 1], num_reps=3)
    array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
           [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]])
  2. The replicate method has been deleted.

  3. A random_state option has been added.

  4. Tests with random_state have been added.

@albop
Contributor

albop commented Jun 29, 2015

@oyamad : sorry for the long silence. I have read the whole module and overall, I like it. I have only very minor points:

  • line 318 should probably have init_states = random_state.randint(self.n, size=k) so as to produce replicable results
  • I am not completely sold on the fact that a random state is attached to a Markov chain. It seems a bit artificial to me. For instance, if I do sim1 = mc.simulate() and sim2 = mc.simulate(), would I really expect to get the same result? If I want to run another simulation, then shall I do mc.random_state = ... (in which case it would make sense to create a property setter for that) or mc2 = MarkovChain(mc) (which would require init to be an involutive)? To me it would feel more natural to always do mc.simulate(, random_state=...).
  • There is some overlap between the mc.simulate method and the mc_sample_path function. They differ in naming conventions (ts_length vs. sample_size) and in the interpretation given to the init parameter (initial distribution vs. points). Regarding this last point, I wonder whether it could be unified based on a typecheck (int vs. float) and the property that distributions sum to one. Then one of the two methods would just be a call to the other one.
  • if we think we find a better name than mc_sample_path for the standalone version, then everything can be merged in the current state without worrying about backward compatibility
  • It seems slightly strange that the more numba-compliant method is implemented in object mode and the backup version as an independent method. Shouldn't it be the opposite? Performance-wise it makes no difference as long as the standalone function cannot be compiled, but that may change in the future, since numba 0.19 introduced the ability to allocate memory in nopython mode.
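The init unification suggested in the third bullet could be sketched with a small helper. The name resolve_init is hypothetical, and the int-vs-float typecheck plus the sums-to-one property are exactly the heuristics named above, not actual quantecon code:

```python
import numpy as np

def resolve_init(init, n, random_state=None):
    """Hypothetical helper: map an `init` that is either a state index or an
    initial distribution over n states to a concrete starting state, using
    the typecheck heuristic discussed above."""
    rng = np.random.RandomState(random_state)
    arr = np.asarray(init)
    if arr.ndim == 0 and np.issubdtype(arr.dtype, np.integer):
        return int(arr)                       # a point: use it directly
    if (arr.ndim == 1 and arr.size == n
            and np.issubdtype(arr.dtype, np.floating)
            and np.isclose(arr.sum(), 1.0)):
        # a distribution over the n states: draw the start from its CDF
        return int(np.searchsorted(np.cumsum(arr), rng.random_sample()))
    raise ValueError("init must be a state index or a distribution over n states")
```

With this in place, whichever of the two routines keeps the distribution interpretation would just call the helper and then delegate to the other.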

@albop
Contributor

albop commented Jun 29, 2015

:s/involutive/idempotent/g

@oyamad
Member

oyamad commented Jun 30, 2015

@albop Thank you for your comments!

  1. Re: random_state: Right, thanks. (Corrected 2a75fc4.)
  2. Your point sounds convincing. What do you think? > @mmcky
    What should we expect then if we do mc.simulate(, random_state=1234) and then mc.simulate()? Maybe the latter should behave the same as mc.simulate(, random_state=None)?
  3. Good point on init. It should be doable.
    But actually I am not quite convinced that we really need to accept a distribution for init, even for mc_sample_path. If the Markov chain is irreducible and aperiodic, in the long run the state is approximately distributed according to the (unique) stationary distribution, so if one ignores the first few periods, it is as if the initial distribution were the stationary distribution.
    One may have no reason to set a specific initial state and in that case may want to give one randomly. Then the uniform distribution should be enough for that randomness, and it is what mc.simulate(init=None) does in the current implementation.
    If you are uncertain whether the chain is irreducible or aperiodic, then you may run mc.simulate(init=np.arange(mc.n)), or just do mc.is_irreducible and mc.is_aperiodic.

Could you elaborate a bit more on the last two points?
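The convergence claim in point 3 can be checked numerically: for an irreducible, aperiodic P, every row of P^t approaches the stationary distribution, so the starting state washes out. A plain-NumPy illustration using the same 2x2 matrix as the examples earlier in this thread:

```python
import numpy as np

# For an irreducible, aperiodic chain, P^t converges to a matrix whose rows
# all equal the stationary distribution, so any initial state (or initial
# distribution) washes out after a burn-in period.
P = np.array([[0.4, 0.6], [0.2, 0.8]])
Pt = np.linalg.matrix_power(P, 50)

# The stationary distribution solves pi P = pi; for this chain it is
# (0.25, 0.75), and both rows of P^50 approximate it.
pi = np.array([0.25, 0.75])
print(np.allclose(Pt, np.vstack([pi, pi])))  # True
```

Here the second eigenvalue of P is 0.2, so the deviation from stationarity shrinks like 0.2^t and is negligible well before t = 50.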

@albop
Contributor

albop commented Jul 1, 2015

On point 2, I would tend to expect that mc.simulate(, random_state=1234) would not affect the stateful random generator and that simulate() or simulate(random_state=None) would take the seed from the current state. But in the latter case, I am not sure whether we should commit to any particular behaviour, so I don't have strong feelings there.
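The per-call semantics described here matches the widely used check_random_state pattern (as in scikit-learn): an explicit seed gets its own fresh generator, while None falls back to the global state. A sketch of that pattern, not necessarily what quantecon implements:

```python
import numpy as np

def check_random_state(seed):
    """Per-call seeding that leaves the global generator untouched: None
    means 'use the global state', an int seeds a fresh RandomState, and an
    existing RandomState instance passes through unchanged."""
    if seed is None:
        # NumPy's module-level generator (a private but long-stable attribute)
        return np.random.mtrand._rand
    if isinstance(seed, (int, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError("%r cannot be used to seed a RandomState" % seed)
```

With this, mc.simulate(..., random_state=1234) is reproducible without mutating any state that a subsequent mc.simulate() would observe.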

@albop
Contributor

albop commented Jul 1, 2015

On point 3, I confess that I don't know in which case you might want to use a distribution for init... But since the feature was there, I guessed there had been a reason for it at some point and that it was not expensive to keep. Maybe there are some use cases in the lectures? @jstac ?

@oyamad
Member

oyamad commented Jul 1, 2015

@albop
In your 4th point, are you proposing to set up a new function, say simulate_markov_chain, that is equivalent to mc.simulate(), while leaving the current mc_sample_path as is for backward compatibility?

And in your 5th point, are you saying that we should implement simulate_markov_chain as the main function and let mc.simulate() just call simulate_markov_chain(mc.P, ...)?

The reason why I coded mc.simulate() as the main routine is that I wanted to store, once computed, the CDF version of transition probabilities in the class instance (as the property mc.cdfs), not to do the same computation many times.

@albop
Contributor

albop commented Jul 1, 2015

Yes, exactly, that's what I meant for points 4 and 5. The bottom-line is that I am quite happy with the object version as it is. Potential reasons to implement a non-object version would be:
1/ backward compatibility (the current version of mc_sample_path)
2/ deal with initial distributions (but then it could be integrated in the object call instead)
3/ stylistic (in which case a one-liner would do the job: def simulate_markov_chain(mc, **kwargs): return MarkovChain(mc).simulate(**kwargs))
4/ get a full numba version

Although they are easy to implement, reasons 1, 2, and 3 seem rather weak. The reason I was proposing 3 was that maybe we could get the API right before adding the numba version later.
So far, 4 is not achievable unless the arguments are simplified a bit (as numba doesn't support keyword arguments yet). As you mentioned, calling 4 from the object method has a (very small?) cost, as it would recompute the cdfs many times. Also, in case it is needed for heavy numerical work, it is relatively straightforward to write such an independent version, and we can do that later.

So, let me apologize for all the back and forth: I now think we should just keep the object as it is and deprecate/delete mc_sample_path, while keeping the namespace free for a possible future simulate_markov_chain.

@oyamad
Member

oyamad commented Jul 2, 2015

We also need to consider consistency with the Julia version. Do you guys have any thoughts on this matter? > @spencerlyon2 @ZacCranko

@sglyon
Member

sglyon commented Jul 2, 2015

I do want to revamp the interface on the Julia side. My plan was to wait until this (and QuantEcon/QuantEcon.jl#32) settles and then match it if it seems appropriate there.

I do have one request:

I think that we should only export functions as methods on the MarkovChain type. The reason for this is that many of the algorithms we implement assume the matrix is a well-formed stochastic matrix (square, all entries non-negative, rows sum to 1). Having users access these methods via the type or class will allow us to make sure these invariants are always imposed automatically.
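The invariants listed above (square, non-negative entries, rows summing to one) amount to a validation check like the following. This is a sketch of the idea, not the actual QuantEcon constructor code, and the function name is my own:

```python
import numpy as np

def check_stochastic_matrix(P, atol=1e-10):
    """Verify that P is a well-formed stochastic matrix: square, with
    non-negative entries, and each row summing to one (within tolerance)."""
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        raise ValueError("P must be a square matrix")
    if (P < 0).any():
        raise ValueError("P must have non-negative entries")
    if not np.allclose(P.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of P must sum to one")
    return P
```

Running such a check once in the type's constructor is exactly what lets every method assume a valid matrix without re-validating on each call.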

@mmcky
Contributor Author

mmcky commented Aug 25, 2015

@oyamad apologies just read this thread now.

Your point sounds convincing. What do you think? > @mmcky
What should we expect then if we do mc.simulate(, random_state=1234) and then mc.simulate()?    Maybe the latter the same as mc.simulate(, random_state=None)?

I would expect that if a user wanted a reproducible state, it would be specified per call rather than set permanently as part of the object. So I agree with your point above that it should behave as mc.simulate(, random_state=None); either way, we should document whatever behaviour we choose clearly.

@oyamad
Member

oyamad commented Aug 25, 2015

@mmcky No problem. The change has been proposed as part of PR #166 (a80203f).

oyamad added a commit that referenced this issue Aug 26, 2015
@oyamad
Member

oyamad commented Aug 27, 2015

The new simulate API has been merged as PR #166.

Finally it goes like:

>>> from quantecon.markov import MarkovChain
>>> mc = MarkovChain([[0.4, 0.6], [0.2, 0.8]])
>>> seed = 1234
>>> mc.simulate(10, random_state=seed)
array([1, 1, 1, 1, 1, 1, 0, 0, 1, 0])
>>> mc.simulate(10, init=0, random_state=seed)
array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
>>> mc.simulate(10, init=[0, 1], random_state=seed)
array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]])
>>> mc.simulate(10, num_reps=3, random_state=seed)
array([[1, 1, 1, 1, 1, 0, 0, 1, 0, 0],
       [1, 0, 1, 1, 1, 0, 0, 1, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 1, 1, 1]])
>>> mc.simulate(10, init=0, num_reps=3, random_state=seed)
array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 0, 0, 1, 1, 1]])
>>> mc.simulate(10, init=[0, 1], num_reps=3, random_state=seed)
array([[0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
       [0, 0, 1, 1, 1, 0, 0, 0, 1, 1],
       [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]])
