Variables refactor #136

antoine-dedieu · 2022-04-13T01:03:28Z

We update the way of representing variables. In particular:

We get rid of variable names, as well as of the Variables and CompositeVariableGroup classes. A variable is now represented by a tuple (variable hash, variable num_states).
In particular, a FactorGraph can then be instantiated directly as fg = graph.FactorGraph(variables=[hidden_variables, visible_variables])
Similarly, Factors are defined by directly passing the variables involved, as [hidden_variables[ii], visible_variables[jj]]
We rewrite NDVariableArray so that the user can access variables through numpy arrays. We also optimize some follow-up computations.
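
To make the new representation concrete, here is a minimal sketch built around a hypothetical ToyVariableArray class (not pgmax's actual NDVariableArray). It only illustrates the idea stated above: each variable is a (hash, num_states) tuple, and the hashes live in a numpy array, so no Python loop over indices is needed at construction time. The offset argument and the use of consecutive integers as hashes are assumptions made for illustration.

```python
import numpy as np


class ToyVariableArray:
    """Hypothetical stand-in for an array of variables (not the pgmax class)."""

    def __init__(self, shape, num_states, offset=0):
        self.shape = tuple(shape)
        self.size = int(np.prod(self.shape))
        # Assumption: consecutive integers play the role of variable hashes.
        self.hashes = offset + np.arange(self.size).reshape(self.shape)
        self.num_states = np.full(self.shape, num_states, dtype=np.int64)

    def __getitem__(self, idx):
        # A single variable is returned as a (hash, num_states) tuple.
        return int(self.hashes[idx]), int(self.num_states[idx])


hidden_variables = ToyVariableArray(shape=(2,), num_states=2)
visible_variables = ToyVariableArray(
    shape=(3,), num_states=2, offset=hidden_variables.size
)

# Variables involved in one factor, passed directly as in the description above.
factor_variables = [hidden_variables[0], visible_variables[2]]
```

Because both arrays draw their hashes from disjoint integer ranges, a variable tuple identifies its array and position without any name lookup.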

antoine-dedieu · 2022-04-13T01:10:03Z

@StannisZhou I have pushed a first sketch for this PR. Let's discuss it next week

Here are some observations:

with the way variable names are now created in NDVariableArray, we do not need to iterate over the array indices as before (plus we do not create the variables arrays). This will result in significant speedups
we do not have the CompositeVariableGroup class anymore: the NDVariableArrays of a FactorGraph are then represented as a dict whose values are the NDVariableArrays and whose keys are their hashes.
the following examples run fine: RBM, PMP, test_or, test_wiring

(Personal note: we cannot use a set to represent the variables in the graph as we will get TypeError: '<' not supported between instances of 'NDVariableArray' and 'NDVariableArray' in update_evidence when we try to jit it)
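
The error message quoted in the note is the one Python raises whenever objects that define no ordering are sorted. Whether a sort is what triggers it inside the jitted update_evidence is an assumption here; the sketch below only reproduces the message with a hypothetical Unordered class and shows that integer keys, such as hashes, sort without trouble.

```python
class Unordered:
    """Hypothetical class that defines no ordering methods."""

    def __init__(self, name):
        self.name = name


def try_sort(items):
    """Return the sorted list, or the TypeError message if sorting fails."""
    try:
        return sorted(items)
    except TypeError as err:
        return str(err)


objects = [Unordered("a"), Unordered("b")]
message = try_sort(objects)

# Integer keys (for example the hashes of the objects) can always be ordered.
sorted_keys = try_sort([hash(obj) for obj in objects])
```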

Here are some questions / follow-up we should discuss:

should we delete VariableDict and VariableGroup? We would just have NDVariableArray, which seems enough to me, but I may lack context
do we want NDVariableArray to support different numbers of states? As a note, this would affect the flattening operations of the class.
can the same variable be involved in 2 factors without the factors touching each other?
one of the bottlenecks in creating a FactorGraph is creating the variable_names_for_factors list. This step is currently slow as it loops through the individual NDVariableArray arrays. However, now that this class relies on numpy arrays, we can speed up this step a lot by proposing a generic get_factors interface where the user would define the general rule for the factors and the corresponding list would be generated with numba. For example get_factors({x:(i, j), y:(k, l)}, {z:(i+k, j+l)}) would mean:

factors = []
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        for k in range(y.shape[0]):
            for l in range(y.shape[1]): 
                factors.append((x[i, j], y[k, l], z[i+k, j+l]))
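
As a rough illustration of how the nested loops above could be replaced by array operations, here is a sketch that uses plain numpy index grids rather than the numba-based generator proposed in the comment. The function name conv_factor_variables is hypothetical, and x, y, z are assumed to be integer arrays of variable hashes, with z large enough to be indexed by (i+k, j+l).

```python
import numpy as np


def conv_factor_variables(x, y, z):
    """Return one row (x[i, j], y[k, l], z[i+k, j+l]) per index combination.

    Rows follow the same order as the nested loops over i, j, k, l.
    """
    i, j, k, l = np.meshgrid(
        np.arange(x.shape[0]),
        np.arange(x.shape[1]),
        np.arange(y.shape[0]),
        np.arange(y.shape[1]),
        indexing="ij",
    )
    # Flatten in C order so that i varies slowest and l fastest.
    i, j, k, l = (a.ravel() for a in (i, j, k, l))
    return np.stack([x[i, j], y[k, l], z[i + k, j + l]], axis=1)


# Toy arrays of variable hashes (assumed values, for illustration only).
x = np.arange(6).reshape(2, 3)
y = 100 + np.arange(4).reshape(2, 2)
z = 1000 + np.arange(12).reshape(3, 4)
factors = conv_factor_variables(x, y, z)
```

The result is an array with one row per factor, so the list of variables for all factors is produced without a Python-level loop.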

codecov-commenter · 2022-04-22T02:12:28Z

Codecov Report

Merging #136 (b265f14) into master (58fbe95) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #136   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           13        13           
  Lines          967       916   -51     
=========================================
- Hits           967       916   -51

Impacted Files	Coverage Δ
pgmax/factors/enumeration.py	`100.00% <100.00%> (ø)`
pgmax/factors/logical.py	`100.00% <100.00%> (ø)`
pgmax/fg/graph.py	`100.00% <100.00%> (ø)`
pgmax/fg/groups.py	`100.00% <100.00%> (ø)`
pgmax/fg/nodes.py	`100.00% <100.00%> (ø)`
pgmax/groups/enumeration.py	`100.00% <100.00%> (ø)`
pgmax/groups/logical.py	`100.00% <100.00%> (ø)`
pgmax/groups/variables.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

StannisZhou

Left some comments, but looks pretty good on a high level.

One major thing (something I didn't realize before) is that it looks like with the refactors we no longer need to implement the current add_factor/add_factor_by_type/add_factor_group. Instead, we can instantiate those outside the factor graph and have a generic add_factor that takes as input the constructed factors/factor groups directly. This is the benefit of using a more intuitive way to represent variables (instead of relying on names like before which requires access to a variable_group object that's only available in a factor graph).
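
The idea of building factors outside the graph and handing them to a single generic entry point can be sketched as follows. ToyFactor, ToyFactorGraph and add_factors are hypothetical names used for illustration, not the pgmax API; variables are the (hash, num_states) tuples from the PR description.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Variable = Tuple[int, int]  # (variable hash, num_states)


@dataclass(frozen=True)
class ToyFactor:
    """Hypothetical factor that only records the variables it connects."""

    variables: Tuple[Variable, ...]


@dataclass
class ToyFactorGraph:
    """Hypothetical graph that accepts factors constructed elsewhere."""

    variables: Set[Variable]
    factors: List[ToyFactor] = field(default_factory=list)

    def add_factors(self, factors):
        # The graph needs no name lookup: it only checks that each factor
        # refers to variables it already knows about.
        for factor in factors:
            unknown = [v for v in factor.variables if v not in self.variables]
            if unknown:
                raise ValueError(f"Unknown variables: {unknown}")
            self.factors.append(factor)


toy_graph = ToyFactorGraph(variables={(0, 2), (1, 2)})
toy_graph.add_factors([ToyFactor(variables=((0, 2), (1, 2)))])
```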

examples/gmrf.py

examples/pmp_binary_deconvolution.py

pgmax/fg/nodes.py

pgmax/fg/groups.py

pgmax/fg/graph.py

antoine-dedieu · 2022-04-23T01:18:05Z

Thanks for your detailed review @StannisZhou!
I have updated the process of adding factors; we should now have almost all the pieces we need.

StannisZhou

Made another round of comments. Major issues are:

We should consider alternative ways for implementing variable group hashes. And checking for duplicate variables is probably not necessary if we implement our hash right.
Hash for VariableDict seems wrong
Remove name for factors/factor groups, and unify add_factor/add_factor_group
Properly implement flatten/unflatten for variable number of states in NDVariableArray (quite straightforward)
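
One possible way to flatten and unflatten data when variables have different numbers of states is to pad each variable up to the largest number of states. The sketch below is an assumption about how this could look, not the implementation in this PR; the function names, the NaN padding value, and the layout (one trailing axis of length max num_states) are all chosen for illustration.

```python
import numpy as np


def unflatten(flat, num_states):
    """Scatter a flat vector into an array padded with NaN along the last axis."""
    num_states = np.asarray(num_states)
    max_states = int(num_states.max())
    # mask[..., s] is True when state s exists for that variable.
    mask = np.arange(max_states) < num_states[..., None]
    padded = np.full(num_states.shape + (max_states,), np.nan)
    padded[mask] = np.asarray(flat, dtype=float)
    return padded


def flatten(padded, num_states):
    """Gather the valid entries of a padded array back into a flat vector."""
    mask = np.arange(padded.shape[-1]) < np.asarray(num_states)[..., None]
    return padded[mask]


num_states = np.array([[2, 3], [1, 2]])
flat = np.arange(8, dtype=float)  # 2 + 3 + 1 + 2 entries
padded = unflatten(flat, num_states)
roundtrip = flatten(padded, num_states)
```

Boolean-mask indexing visits entries in row-major order, so the flat vector is laid out variable by variable, with each variable's states kept contiguous.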

examples/gmrf.py

examples/ising_model.py

examples/rbm.py

pgmax/fg/graph.py

antoine-dedieu · 2022-04-27T01:57:12Z

@StannisZhou the PR is ready to be reviewed!

Here is an approximate timing comparison, before and after, on two examples:

RBM after:

variables + fg: 0.001s
creating variables_to_factors: 1.5s
creating factors: 1.2s (.3s of computing factor_edges_num_states / .6s of checking size of log potentials for each group)
adding factors: 1s (.6s of looking for factors with same variables)
wiring: 1s

RBM before:

variables + fg: < 0.01s
adding factors: 4s
wiring: 1s

PMP after:

variables: 0.001s
fg: .8s (mainly vars_to_starts)
creating variables_to_factors: 4s
factors: 1.5s
wiring: 1.5s

PMP before:

variables: 2.6s
fg: 4.3s
factors: 3.5s (0.9s for variable_names_to_factors)
wiring: 1.2s

StannisZhou

Partial review. Will finish the rest tomorrow...

examples/ising_model.py

examples/pmp_binary_deconvolution.py

pgmax/groups/variables.py

StannisZhou

More comments

pgmax/fg/graph.py

pgmax/groups/variables.py

pgmax/fg/groups.py

StannisZhou

LGTM. Thanks for patiently addressing all the comments!

pgmax/fg/groups.py

pgmax/fg/graph.py

pgmax/fg/groups.py

antoine-dedieu added 4 commits April 12, 2022 18:01

Rewrite NDVariableArray

66bbcbc

Falke8

d816f63

Test + HashableDict

58b0115

Minbor

c8f2c5e

antoine-dedieu added 6 commits April 20, 2022 22:31

Variables as tuple + Remove BPOuputs/HashableDict

39b546a

Start tests + mypy

ef92d1b

Tests passing

7e55e5c

Variables

9a2dcd2

Tests + mypy

c6ae8d8

Some docstrings

5c8b381

StannisZhou self-requested a review April 22, 2022 05:23

StannisZhou reviewed Apr 22, 2022

antoine-dedieu added 4 commits April 22, 2022 19:24

Stannis first comments

40ec519

Remove add_factor

96c7fe3

Test

88f8e23

Docstring

033d176

antoine-dedieu added 4 commits April 25, 2022 21:18

Coverage

89767c8

Coverage

1ccfcf5

Coverage 100%

ecaab6c

Remove factor group names

cbd136b

StannisZhou reviewed Apr 25, 2022

antoine-dedieu added 2 commits April 25, 2022 23:05

Remove factor group names

22b604e

Modify hash + add_factors

87fcfd9

This was linked to issues Apr 26, 2022

Support for variable groups with differing numbers of outcomes #118

Closed

Creating NDVariableArray is slow #132

Closed

Refactor variables/variable groups #134

Closed

Stannis' comments

0f639fe

antoine-dedieu added 5 commits April 26, 2022 19:39

Flattent / unflattent

546b790

Unflatten with nan

35ce6c0

Speeding up

704ee3a

max size

aaa67ef

Understand timings

04c4d89

antoine-dedieu marked this pull request as ready for review April 27, 2022 01:55

StannisZhou reviewed Apr 27, 2022

Some comments

a276ce5

StannisZhou reviewed Apr 27, 2022

antoine-dedieu added 2 commits April 27, 2022 22:56

Comments

a839be6

Minor

05de350

antoine-dedieu changed the title ~~WIP - Variables refactor~~ Variables refactor Apr 27, 2022

antoine-dedieu added 3 commits April 27, 2022 23:50

Docstring

1e0c93d

Minor changes

63c6738

Doc

0397f9b

This was referenced Apr 30, 2022

Add option to update log potentials/messages for individual factors #137

Open

Globally switch from Tuple to tuple in type hints #138

Open

StannisZhou approved these changes Apr 30, 2022

pgmax/fg/groups.py (outdated)

pgmax/fg/graph.py (outdated)

pgmax/fg/groups.py (outdated)

StannisZhou and others added 3 commits April 29, 2022 22:51

Rename this_hash

8b3d60e

Final comments

8751e90

Minor

b265f14

antoine-dedieu merged commit dfc7535 into vicariousinc:master May 2, 2022

antoine-dedieu deleted the variables_refactor branch May 2, 2022 19:03
