Faster Expectations and new DiscreteDistributionXRA class #1156
clean up distribution
because DD expects lists
use new `Expected` with underlying new `distribution.calc_expectations` method; in my own testing this cuts down time by 2/3 for a full infinite horizon model
HARK/distribution.py:

```python
        """
        return self._xarray.attrs

    def expected_value(self, func=None, *args, labels=False):
```
The relevant docs on argument ordering are here:
https://docs.python.org/3/tutorial/controlflow.html?highlight=arguments%20style%20positional%20default#special-parameters
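For reference, the argument-ordering rules in that tutorial section can be sketched as follows. This is a generic illustration, not HARK code; only the signature `expected_value(self, func=None, *args, labels=False)` is taken from the diff (with `self` dropped to keep the sketch standalone):

```python
# Python's ordering rules: parameters after *args are keyword-only, so a
# flag like `labels` can never be accidentally swallowed into *args.
# This mirrors the pattern in expected_value(self, func=None, *args, labels=False).

def expected_value(func=None, *args, labels=False):
    """`labels` can only be passed by keyword."""
    return func, args, labels

f, extra, lab = expected_value(len, 1, 2, labels=True)
assert extra == (1, 2)   # positional extras land in *args
assert lab is True       # the flag stays separate
```

Calling `expected_value(len, 1, 2, True)` would instead put `True` into `*args`, which is exactly the ambiguity the keyword-only marker prevents.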
```python
        wrapped = dict(zip(idx, x))

        return func(wrapped, *args)
```
I would like to talk to you about what you are doing with the 'labels' case here before merging.
You asked if there is a simpler way to do what you're trying to do, and I suspect there is.
But the documentation doesn't make it clear to me what's happening.
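For what it's worth, my reading of the quoted snippet is roughly the pattern below. This is a hedged sketch: the names `idx` and `wrapped` and the zip-into-dict step come from the diff, while the shock labels and the example function are invented for illustration:

```python
import numpy as np

# Hypothetical: `idx` holds the variable labels of the distribution and
# `x` is one atom (one joint realization). Zipping them lets `func`
# address values by name instead of by position.
idx = ["perm", "tran"]        # e.g. permanent and transitory shock labels
x = np.array([1.01, 0.95])    # one atom of the joint distribution

wrapped = dict(zip(idx, x))   # {"perm": 1.01, "tran": 0.95}

def func(shocks):
    # reads like the math: psi * theta
    return shocks["perm"] * shocks["tran"]

result = func(wrapped)
```

The apparent benefit is that `func` is written against labels, so reordering the underlying arrays cannot silently change its meaning.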
HARK/distribution.py:

```python
        coords : dict
            Coordinate values/names for each dimension of the underlying array.
        dims : tuple or list
            Dimension names for each dimension of the underlying array.
```
I'm sorry to be getting to a thorough review of this work so late.
What I am noticing here is that the arguments to this DiscreteDistributionXRA class are exposing the configuration of the underlying DataArray data structure: `dims`, `coords`. But the documentation is not providing guidance as to the semantics of these arguments with respect to the class's purpose as a representation of a probability distribution.

Since you seem to be using these values mainly in the `labels` version of the `expected_value` operation, it leaves me with a lot of questions about the design here. I wonder if there's a clearer way to do what you're trying to do, or a way to describe this capability that doesn't rely so much on a deep understanding of the DataArray format.
It looks like the usage pattern you are intending here is: for a multivariate distribution over A and B, you will have `dims = ['rv', 'x']`, where 'x' refers to... the probability mass? And then `coords = {'rv': ['A', 'B']}`. The DataArray then stores values indexed by ((a, b), x) which represent the probability of (A = a, B = b)... but then what is the 'x' dimension?
And if this is the intended usage pattern, would you ever want `dims` to be anything other than `['rv', 'x']`? Would that be a reasonable default?

But I'm actually now quite confused about the data model. Why wouldn't you just have `dims = ['A', 'B']`? That way each index of (a, b) would map to a probability value. Am I missing something?
HARK/distribution.py:

```python
        Wrapper function for `func` that handles labeled indexing.
        """
        dim_0 = self.dims[0]
        idx = self.coords[dim_0].values
```
This code assumes a particular structure of `dims` and `coords`. That structure should be guaranteed by the constructor of the class, but currently it is wide open to user configuration. It's easy to construct a 'valid' distribution (in terms of what is documented) which breaks at this point in the code.
Looking this over... it looks like currently in the DiscreteDistribution (and the new DiscreteDistributionXRA by extension), the class is initialized with a probability array plus a separate array of points for each of the N variables. This takes len(X) + len(X) * N space, where len(X) is determined by the approximation value. In our case, N-VariateNormal().approx(D) expands into D^N different points. So: (1 + N) D^N.

I would think that one of the benefits of using xarrays would be to use the dimension labels in a more concise way. I.e., storing all the data of the distribution in a single N-dimensional array, where the value at each index (a, b, ...) is the probability. This takes D^N space, which is less.

The only downside I can think of is if when approximating a distribution there are some methods that would lead to a very uneven discretization in which many (a, b, ...) values were null. But that could be solved with xarray sparsity, which seems to be supported.

I was wondering if you'd thought about this while working on this @alanlujan91.

It's quite possible I'm forgetting some important rationale for why things are the way they are, but it's probably more likely that I was just trying to extract the functionality from the older HARK code without breaking things, and a new design would be better.
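The space comparison in that comment can be checked numerically. A small sketch, with D and N as arbitrary illustrative values:

```python
# "List of points" layout: a pmf of length len(X) plus len(X) values for
# each of the N variables, where len(X) = D**N after approximating an
# N-variate normal on D points per dimension.
def list_layout_size(D, N):
    lenX = D ** N
    return lenX + lenX * N      # = (1 + N) * D**N

# Dense N-dimensional array layout: one probability per grid cell.
def array_layout_size(D, N):
    return D ** N

D, N = 7, 3
assert list_layout_size(D, N) == (1 + N) * D ** N   # 4 * 343 = 1372
assert array_layout_size(D, N) == D ** N            # 343
```

So for D = 7, N = 3 the dense-array layout stores a factor of (1 + N) = 4 fewer numbers, exactly as the comment argues.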
It's interesting to me that the
About the 'labels' way of calling a function... The current syntax for this call is (using the alias):
Especially since we want to move away from this rather odd construct of
... and likewise for when this is called via
@alanlujan91 and I talked about this at some length today. You're right that it is closely connected with the issues around the whole steps/stages/frames direction. Also closely connected to some of the things I worked on last year in my pre-2.0-alpha.

A key goal there was to have a structure from which it would be possible to extract the key equations of a model in a predictable and mathematical-looking way, and where the same exact equations could be applied in the solution and simulation stages. In the end the only way I found to do that was to encapsulate the relevant python code in a string. Then the strings associated with transitions could be printed, and the same string could be evaluated as a python expression both in the solution and the simulation stage.

If we have a call on Fri, maybe the two of us could briefly discuss what Alan and I discussed today. The upshot, though, was a conclusion that maybe we should hold off on merging the xarray and expectation stuff until we are more confident about how we want to handle these things.
Ok -- it makes sense to hold off on merging the xarray and 'local' function evaluation parts of this PR. The performance improvements to

There are well known security issues with evaluating strings as Python code. @MridulS has commented on this before.
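To make the `eval` concern concrete, here is a generic illustration (not HARK code) of the string-equation approach: restricting the globals dict narrows the attack surface for accidents, but it is well known that a hostile string can still escape such sandboxes, which is the security concern being raised.

```python
# A transition equation stored as a string, evaluated with the namespace
# restricted to the model's own variables. Blanking __builtins__ blocks
# casual misuse, but eval on untrusted strings is still unsafe in general.
equation = "m * R + y"   # a harmless model equation (hypothetical names)
namespace = {"__builtins__": {}, "m": 2.0, "R": 1.05, "y": 1.0}

value = eval(equation, namespace)   # same string usable in solution
                                    # and simulation stages
```

The printable-and-evaluable property is what makes strings attractive for extracting model equations; the trade-off is that `eval` must never see user-supplied input.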
@Mv77 Not sure why this is causing errors that weren't showing up before, I am guessing they have to do with
Yuup. The release broke my interpolators. #1157 fixes them but let me push a quick standalone fix. Incoming!
#1161 is a standalone fix. It's 3 lines of code. If it passes tests you can go ahead and merge it, which should fix the issue.
This is a work in progress but I would like some feedback. This PR tackles a few related concepts.

- New `DiscreteDistribution.calc_expectation(func, args)` method that uses `numpy` functional programming instead of looping. Significantly faster than the older method.
- New `DiscreteDistribution.distr_of_function(func, args)`.
- New `Expected(func, dist, args)` which uses the new `DD.calc_exp` in the back end.
- New `DiscreteDistributionXRA` which uses an underlying `xarray.DataArray` to allow for labeling distributions, dimensions, and coordinates.
- `DiscreteDistributionXRA.calc_expectations(func, args, labels=True)` can use labeled and expressive equations that are easier to read as math.
- Use of the new `Expected` operator in `ConsIndShockModel` as proof of concept. In my tests this cuts down runtime by 2/3 (607 ms to 203 ms).

I'm still missing some documentation and tests, but would appreciate feedback and wanted to see if this passes remote tests.
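The kind of speedup claimed for the vectorized expectation is easy to illustrate in miniature. A toy sketch only; the real `calc_expectation` signature and internals may differ:

```python
import numpy as np

# Toy version of the idea: replace a Python loop over atoms of a discrete
# distribution with a single vectorized dot product.
pmf = np.array([0.2, 0.5, 0.3])     # probabilities
atoms = np.array([1.0, 2.0, 3.0])   # support points

def expectation_loop(func):
    # the old style: one Python-level iteration per atom
    return sum(p * func(x) for p, x in zip(pmf, atoms))

def expectation_vectorized(func):
    # the new style: func must accept arrays (numpy-vectorized)
    return np.dot(pmf, func(atoms))

def sq(x):
    return x ** 2

assert np.isclose(expectation_loop(sq), expectation_vectorized(sq))
```

Both compute E[X^2]; at realistic grid sizes the vectorized form avoids per-atom interpreter overhead, which is consistent with the reported 2/3 runtime reduction.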