Faster Expectations and new DiscreteDistributionXRA class #1156
clean up distribution
because DD expects lists
use new `Expected` with underlying new `distribution.calc_expectations` method; in my own testing this cuts down time by 2/3 for a full infinite horizon model
HARK/distribution.py:

```python
        """
        return self._xarray.attrs

    def expected_value(self, func=None, *args, labels=False):
```
The relevant docs on argument ordering are here:
https://docs.python.org/3/tutorial/controlflow.html?highlight=arguments%20style%20positional%20default#special-parameters
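For reference, the argument-ordering rules in that tutorial section can be sketched as follows. This is a generic illustration, not HARK code; only the signature `expected_value(self, func=None, *args, labels=False)` is taken from the diff (with `self` dropped to keep the sketch standalone):

```python
# Python's ordering rules: parameters after *args are keyword-only, so a
# flag like `labels` can never be accidentally swallowed into *args.
# This mirrors the pattern in expected_value(self, func=None, *args, labels=False).

def expected_value(func=None, *args, labels=False):
    """`labels` can only be passed by keyword."""
    return func, args, labels

f, extra, lab = expected_value(len, 1, 2, labels=True)
assert extra == (1, 2)   # positional extras land in *args
assert lab is True       # the flag stays separate
```

Calling `expected_value(len, 1, 2, True)` would instead put `True` into `*args`, which is exactly the ambiguity the keyword-only marker prevents.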
```python
        wrapped = dict(zip(idx, x))

        return func(wrapped, *args)
```
I would like to talk to you about what you are doing with the 'labels' case here before merging.
You asked if there is a simpler way to do what you're trying to do, and I suspect there is.
But the documentation doesn't make it clear to me what's happening.
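For what it's worth, my reading of the quoted snippet is roughly the pattern below. This is a hedged sketch: the names `idx` and `wrapped` and the zip-into-dict step come from the diff, while the shock labels and the example function are invented for illustration:

```python
import numpy as np

# Hypothetical: `idx` holds the variable labels of the distribution and
# `x` is one atom (one joint realization). Zipping them lets `func`
# address values by name instead of by position.
idx = ["perm", "tran"]        # e.g. permanent and transitory shock labels
x = np.array([1.01, 0.95])    # one atom of the joint distribution

wrapped = dict(zip(idx, x))   # {"perm": 1.01, "tran": 0.95}

def func(shocks):
    # reads like the math: psi * theta
    return shocks["perm"] * shocks["tran"]

result = func(wrapped)
```

The apparent benefit is that `func` is written against labels, so reordering the underlying arrays cannot silently change its meaning.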
HARK/distribution.py:

```python
        coords : dict
            Coordinate values/names for each dimension of the underlying array.
        dims : tuple or list
            Dimension names for each dimension of the underlying array.
```
I'm sorry to be getting to a thorough review of this work so late.
What I am noticing here is that the arguments to this DiscreteDistributionXRA class are exposing the configuration of the underlying DataArray data structure: `dims`, `coords`. But the documentation is not providing guidance as to the semantics of these arguments with respect to the class's purpose as a representation of a probability distribution.

Since you seem to be using these values mainly in the `labels` version of the `expected_value` operation, it leaves me with a lot of questions about the design here. I wonder if there's a clearer way to do what you're trying to do, or a way to describe this capability that doesn't rely so much on a deep understanding of the DataArray format.
It looks like the usage pattern you are intending here is: for a multivariate distribution over A and B, you will have `dims = ['rv', 'x']`, where 'x' refers to... the probability mass? And then `coords = {'rv': ['A', 'B']}`. The DataArray then stores values indexed by ((a, b), x) which represent the probability of (A = a, B = b)... but then what is the 'x' dimension?
And if this is the intended usage pattern, would you ever want `dims` to be anything other than `['rv', 'x']`? Would that be a reasonable default?

But I'm actually now quite confused about the data model. Why wouldn't you just have `dims = ['A', 'B']`? That way each index of (a, b) would map to a probability value. Am I missing something?
HARK/distribution.py:

```python
        Wrapper function for `func` that handles labeled indexing.
        """
        dim_0 = self.dims[0]
        idx = self.coords[dim_0].values
```
This code assumes a particular structure of `dims` and `coords`. That structure should be guaranteed by the constructor of the class, but currently it is wide open to user configuration. It's easy to construct a 'valid' distribution (in terms of what is documented) which breaks at this point in the code.
Looking this over... it looks like currently in the DiscreteDistribution (and the new DiscreteDistributionXRA by extension), the class is initialized with a probability array plus a separate array of points for each of the N variables. This takes len(X) + len(X) * N space, where len(X) is determined by the approximation value. In our case, N-VariateNormal().approx(D) expands into D^N different points. So: (1 + N) D^N.

I would think that one of the benefits of using xarrays would be to use the dimension labels in a more concise way. I.e., storing all the data of the distribution in a single N-dimensional array, where the value at each index (a, b, ...) is the probability. This takes D^N space, which is less.

The only downside I can think of is if when approximating a distribution there are some methods that would lead to a very uneven discretization in which many (a, b, ...) values were null. But that could be solved with xarray sparsity, which seems to be supported.

I was wondering if you'd thought about this while working on this @alanlujan91.

It's quite possible I'm forgetting some important rationale for why things are the way they are, but it's probably more likely that I was just trying to extract the functionality from the older HARK code without breaking things, and a new design would be better.
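The space comparison in that comment can be checked numerically. A small sketch, with D and N as arbitrary illustrative values:

```python
# "List of points" layout: a pmf of length len(X) plus len(X) values for
# each of the N variables, where len(X) = D**N after approximating an
# N-variate normal on D points per dimension.
def list_layout_size(D, N):
    lenX = D ** N
    return lenX + lenX * N      # = (1 + N) * D**N

# Dense N-dimensional array layout: one probability per grid cell.
def array_layout_size(D, N):
    return D ** N

D, N = 7, 3
assert list_layout_size(D, N) == (1 + N) * D ** N   # 4 * 343 = 1372
assert array_layout_size(D, N) == D ** N            # 343
```

So for D = 7, N = 3 the dense-array layout stores a factor of (1 + N) = 4 fewer numbers, exactly as the comment argues.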
It's interesting to me that the
About the 'labels' way of calling a function... The current syntax for this call is (using the alias):
Especially since we want to move away from this rather odd construct of
... and likewise for when this is called via
@alanlujan91 and I talked about this at some length today. You're right that it is closely connected with the issues around the whole steps/stages/frames direction. Also closely connected to some of the things I worked on last year in my pre-2.0-alpha.

A key goal there was to have a structure from which it would be possible to extract the key equations of a model in a predictable and mathematical-looking way, and where the same exact equations could be applied in the solution and simulation stages. In the end the only way I found to do that was to encapsulate the relevant python code in a string. Then the strings associated with transitions could be printed, and the same string could be evaluated as a python expression both in the solution and the simulation stage.

If we have a call on Fri, maybe the two of us could briefly discuss what Alan and I discussed today. The upshot, though, was a conclusion that maybe we should hold off on merging the xarray and expectation stuff until we are more confident about how we want to handle these things.
Ok -- it makes sense to hold off on merging the xarray and 'local' function evaluation parts of this PR. The performance improvements to

There are well known security issues with evaluating strings as Python code. @MridulS has commented on this before.
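To make the `eval` concern concrete, here is a generic illustration (not HARK code) of the string-equation approach: restricting the globals dict narrows the attack surface for accidents, but it is well known that a hostile string can still escape such sandboxes, which is the security concern being raised.

```python
# A transition equation stored as a string, evaluated with the namespace
# restricted to the model's own variables. Blanking __builtins__ blocks
# casual misuse, but eval on untrusted strings is still unsafe in general.
equation = "m * R + y"   # a harmless model equation (hypothetical names)
namespace = {"__builtins__": {}, "m": 2.0, "R": 1.05, "y": 1.0}

value = eval(equation, namespace)   # same string usable in solution
                                    # and simulation stages
```

The printable-and-evaluable property is what makes strings attractive for extracting model equations; the trade-off is that `eval` must never see user-supplied input.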
@Mv77 Not sure why this is causing errors that weren't showing up before, I am guessing they have to do with
Yuup. The release broke my interpolators. #1157 fixes them but let me push a quick standalone fix. Incoming!
#1161 is a standalone fix. It's 3 lines of code. If it passes tests you can go ahead and merge it, which should fix the issue.
This is a work in progress but I would like some feedback. This PR tackles a few related concepts.

- New `DiscreteDistribution.calc_expectation(func, args)` method that uses `numpy` functional programming instead of looping. Significantly faster than the older method.
- New `DiscreteDistribution.distr_of_function(func, args)`.
- New `Expected(func, dist, args)` which uses the new `DD.calc_exp` in the back end.
- New `DiscreteDistributionXRA` which uses an underlying `xarray.DataArray` to allow for labeling distributions, dimensions, and coordinates.
- `DiscreteDistributionXRA.calc_expectations(func, args, labels=True)` can use labeled and expressive equations that are easier to read as math.
- Use of the new `Expected` operator in `ConsIndShockModel` as proof of concept. In my tests this cuts down runtime by 2/3 (607 ms to 203 ms).

I'm still missing some documentation and tests, but would appreciate feedback and wanted to see if this passes remote tests.
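The kind of speedup claimed for the vectorized expectation is easy to illustrate in miniature. A toy sketch only; the real `calc_expectation` signature and internals may differ:

```python
import numpy as np

# Toy version of the idea: replace a Python loop over atoms of a discrete
# distribution with a single vectorized dot product.
pmf = np.array([0.2, 0.5, 0.3])     # probabilities
atoms = np.array([1.0, 2.0, 3.0])   # support points

def expectation_loop(func):
    # the old style: one Python-level iteration per atom
    return sum(p * func(x) for p, x in zip(pmf, atoms))

def expectation_vectorized(func):
    # the new style: func must accept arrays (numpy-vectorized)
    return np.dot(pmf, func(atoms))

def sq(x):
    return x ** 2

assert np.isclose(expectation_loop(sq), expectation_vectorized(sq))
```

Both compute E[X^2]; at realistic grid sizes the vectorized form avoids per-atom interpreter overhead, which is consistent with the reported 2/3 runtime reduction.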