This repository was archived by the owner on Dec 5, 2024. It is now read-only.

Variables refactor #136

Merged
merged 35 commits into master from variables_refactor
May 2, 2022

Conversation

antoine-dedieu
Contributor

@antoine-dedieu antoine-dedieu commented Apr 13, 2022

We update the way of representing variables. In particular:

  • We get rid of variable names, as well as the Variables and CompositeVariableGroup classes. A variable is now represented by a tuple (variable hash, variable num_states).
    In particular, a FactorGraph can then be instantiated directly as fg = graph.FactorGraph(variables=[hidden_variables, visible_variables])
    Similarly, Factors are defined by directly passing the variables involved, as [hidden_variables[ii], visible_variables[jj]]
  • We rewrite NDVariableArray so that the user can access variables by relying on the use of numpy arrays. We also optimize some follow-up computations.
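
As a rough illustration of the representation described above (not PGMax's actual implementation), the sketch below models a variable as a (hash, num_states) tuple and builds every variable of a group with NumPy array arithmetic instead of a Python loop. `ToyVariableArray`, its constructor arguments, and the `base_hash` offset scheme are hypothetical names introduced only for this example.

```python
import numpy as np


class ToyVariableArray:
    """Hypothetical stand-in for NDVariableArray: a grid of variables.

    Each variable is the tuple (variable hash, num_states). Hashes are a
    per-group base offset plus the flat index, so the whole grid is built
    with array operations rather than by iterating over indices.
    """

    def __init__(self, shape, num_states, base_hash):
        self.shape = tuple(shape)
        self.num_states = np.full(self.shape, num_states, dtype=np.int64)
        size = int(np.prod(self.shape))
        # One unique integer per variable, laid out like the grid.
        self.hashes = base_hash + np.arange(size, dtype=np.int64).reshape(self.shape)

    def __getitem__(self, index):
        # Assumes `index` selects a single element (e.g. 3 or (1, 2)).
        return (int(self.hashes[index]), int(self.num_states[index]))


hidden = ToyVariableArray(shape=(3,), num_states=2, base_hash=0)
visible = ToyVariableArray(shape=(4,), num_states=2, base_hash=1000)

# A pairwise factor is described directly by the variables it connects.
factor_variables = [hidden[1], visible[2]]
```

The point of the design is that a factor only needs the tuples themselves, so it can be described without looking anything up by name in the graph.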

@antoine-dedieu
Contributor Author

antoine-dedieu commented Apr 13, 2022

@StannisZhou I have pushed a first sketch for this PR. Let's discuss it next week

Here are some observations:

  • with the way variable names are now created in NDVariableArray, we no longer need to iterate over the array indices as before (and we no longer create the variables arrays). This will result in significant speedups
  • we do not have the CompositeVariableGroup class anymore: the NDVariableArrays of a FactorGraph are now stored in a dict whose values are the NDVariableArrays and whose keys are their hashes.
  • the following examples run fine: RBM, PMP, test_or, test_wiring

(Personal note: we cannot use a set to represent the variables in the graph as we will get TypeError: '<' not supported between instances of 'NDVariableArray' and 'NDVariableArray' in update_evidence when we try to jit it)

Here are some questions / follow-ups we should discuss:

  • should we delete VariableDict and VariableGroup? We would just have NDVariableArray which seems enough to me, but I may lack context
  • do we want NDVariableArray to support different numbers of states? As a note, this would affect the flattening operations of the class.
  • can the same variable be involved in 2 factors without the factors touching each other?
  • one of the bottlenecks in creating a FactorGraph is creating the variable_names_for_factors list. This step is currently slow as it loops through the individual NDVariableArray arrays. However, now that this class relies on numpy arrays, we can speed up this step a lot by proposing a generic get_factors interface where the user would define the general rule for the factors and the corresponding list would be generated with numba. For example get_factors({x:(i, j), y:(k, l)}, {z:(i+k, j+l)}) would mean
factors = []
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        for k in range(y.shape[0]):
            for l in range(y.shape[1]): 
                factors.append((x[i, j], y[k, l], z[i+k, j+l]))
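
The nested loops above can also be written as array index arithmetic. The sketch below is one possible NumPy formulation of that specific rule, not the proposed get_factors interface itself (and it uses plain NumPy rather than numba). `enumerate_sum_rule_factors` is a hypothetical helper, and it assumes z has shape at least (x.shape[0] + y.shape[0] - 1, x.shape[1] + y.shape[1] - 1) so that z[i+k, j+l] is always in bounds.

```python
import numpy as np


def enumerate_sum_rule_factors(x, y, z):
    """Return an (N, 3) array whose rows are (x[i, j], y[k, l], z[i+k, j+l]).

    x, y, z are 2D integer arrays of variable identifiers (hypothetical).
    Row order matches the nested loops over i, j, k, l.
    """
    # One index grid per loop variable, each of shape x.shape + y.shape.
    i, j, k, l = np.indices(x.shape + y.shape)
    # C-order ravel makes i the slowest and l the fastest varying index.
    i, j, k, l = i.ravel(), j.ravel(), k.ravel(), l.ravel()
    return np.stack([x[i, j], y[k, l], z[i + k, j + l]], axis=1)


# Small example with distinct identifiers per group.
x = np.arange(6).reshape(2, 3)
y = 100 + np.arange(4).reshape(2, 2)
z = 200 + np.arange(12).reshape(3, 4)
factors = enumerate_sum_rule_factors(x, y, z)
```

Each row of `factors` lists the three variables of one factor, so the result plays the role of the `factors` list built by the loops.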

@codecov-commenter

codecov-commenter commented Apr 22, 2022

Codecov Report

Merging #136 (b265f14) into master (58fbe95) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #136   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           13        13           
  Lines          967       916   -51     
=========================================
- Hits           967       916   -51     

Impacted Files                 Coverage Δ
pgmax/factors/enumeration.py   100.00% <100.00%> (ø)
pgmax/factors/logical.py       100.00% <100.00%> (ø)
pgmax/fg/graph.py              100.00% <100.00%> (ø)
pgmax/fg/groups.py             100.00% <100.00%> (ø)
pgmax/fg/nodes.py              100.00% <100.00%> (ø)
pgmax/groups/enumeration.py    100.00% <100.00%> (ø)
pgmax/groups/logical.py        100.00% <100.00%> (ø)
pgmax/groups/variables.py      100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@StannisZhou StannisZhou self-requested a review April 22, 2022 05:23
Contributor

@StannisZhou StannisZhou left a comment

Left some comments, but this looks pretty good at a high level.

One major thing (something I didn't realize before) is that, with this refactor, it looks like we no longer need the current add_factor/add_factor_by_type/add_factor_group. Instead, we can instantiate factors and factor groups outside the factor graph and have a generic add_factor that takes the constructed factors/factor groups directly as input. This is the benefit of representing variables in a more intuitive way (instead of relying on names as before, which required access to a variable_group object that is only available in a factor graph).
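
A minimal sketch of what such a generic entry point could look like, assuming variables are plain (hash, num_states) tuples. All class and method names here (`ToyFactor`, `ToyFactorGroup`, `ToyFactorGraph`, `add_factors`) are hypothetical and are not the PGMax API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Variable = Tuple[int, int]  # (variable hash, num_states)


@dataclass(frozen=True)
class ToyFactor:
    variables: Tuple[Variable, ...]


@dataclass(frozen=True)
class ToyFactorGroup:
    factors: Tuple[ToyFactor, ...]


@dataclass
class ToyFactorGraph:
    known_variables: frozenset
    factors: List[ToyFactor] = field(default_factory=list)

    def add_factors(self, item: Union[ToyFactor, ToyFactorGroup]) -> None:
        """Accept an already-constructed factor or factor group."""
        new = item.factors if isinstance(item, ToyFactorGroup) else (item,)
        # Validate everything first so a bad item leaves the graph unchanged.
        for factor in new:
            missing = [v for v in factor.variables if v not in self.known_variables]
            if missing:
                raise ValueError(f"Unknown variables: {missing}")
        self.factors.extend(new)


a, b, c = (0, 2), (1, 2), (2, 3)
fg = ToyFactorGraph(known_variables=frozenset({a, b, c}))
# Factors are built outside the graph, then handed over as finished objects.
fg.add_factors(ToyFactor(variables=(a, b)))
fg.add_factors(ToyFactorGroup(factors=(ToyFactor((b, c)), ToyFactor((a, c)))))
```

Because the factors carry their variables directly, the graph only has to check membership; it does not need to resolve names through a variable group.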

@antoine-dedieu
Contributor Author

Thanks for your detailed review @StannisZhou!
I have updated the process of adding factors. We should now have almost all the pieces we need.

Contributor

@StannisZhou StannisZhou left a comment

Made another round of comments. Major issues are:

  1. We should consider alternative ways of implementing variable group hashes. Checking for duplicate variables is probably unnecessary if we implement the hash correctly.
  2. Hash for VariableDict seems wrong
  3. Remove name for factors/factor groups, and unify add_factor/add_factor_group
  4. Properly implement flatten/unflatten for variable number of states in NDVariableArray (quite straightforward)
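
For point 4, one way to handle a variable number of states is to move between a flat vector (one entry per valid state) and a padded array. The sketch below is an assumption about how this could be done with a boolean mask, not the code that was merged; the function names and the choice of NaN as padding are illustrative.

```python
import numpy as np


def unflatten(flat, num_states):
    """Scatter a flat vector into an array of shape num_states.shape + (max_states,).

    Slots beyond each variable's own number of states are filled with NaN.
    """
    num_states = np.asarray(num_states)
    max_states = int(num_states.max())
    # Boolean mask marking the valid (variable, state) slots.
    mask = np.arange(max_states) < num_states[..., None]
    flat = np.asarray(flat, dtype=float)
    if flat.size != int(mask.sum()):
        raise ValueError("flat has the wrong length for these num_states")
    padded = np.full(mask.shape, np.nan)
    padded[mask] = flat  # fills the valid slots in C order
    return padded


def flatten(padded, num_states):
    """Inverse of unflatten: keep only the valid slots, in C order."""
    num_states = np.asarray(num_states)
    mask = np.arange(padded.shape[-1]) < num_states[..., None]
    return padded[mask]


num_states = np.array([[2, 3], [1, 2]])
flat = np.arange(8, dtype=float)
padded = unflatten(flat, num_states)
```

With uniform num_states the mask is all True and the two functions reduce to a plain reshape, so the uniform case is covered by the same code path.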

@antoine-dedieu antoine-dedieu marked this pull request as ready for review April 27, 2022 01:55
@antoine-dedieu
Contributor Author

antoine-dedieu commented Apr 27, 2022

@StannisZhou the PR is ready to be reviewed!

Here are approximate timing comparisons, before / after, on two examples

RBM after:

  • variables + fg: 0.001s
  • creating variables_to_factors: 1.5s
  • creating factors: 1.2s (0.3s computing factor_edges_num_states / 0.6s checking the size of log potentials for each group)
  • adding factors: 1s (0.6s looking for factors with the same variables)
  • wiring: 1s

RBM before:

  • variables + fg: < 0.01s
  • adding factors: 4s
  • wiring: 1s

PMP after:

  • variables: 0.001s
  • fg: 0.8s (mainly vars_to_starts)
  • creating variables_to_factors: 4s
  • factors: 1.5s
  • wiring: 1.5s

PMP before:

  • variables: 2.6s
  • fg: 4.3s
  • factors: 3.5s (0.9s for variable_names_to_factors)
  • wiring: 1.2s
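
To compare the end-to-end cost, the stage timings can simply be summed. The snippet below does this under two assumptions that are mine rather than the author's: the parenthesized sub-timings are already included in their parent stage, and "< 0.01s" is taken at its upper bound of 0.01s.

```python
# Stage timings in seconds, copied from the lists above.
# Assumption: parenthesized sub-timings are part of their parent stage,
# and "< 0.01s" is counted as 0.01.
timings = {
    "RBM after": [0.001, 1.5, 1.2, 1.0, 1.0],
    "RBM before": [0.01, 4.0, 1.0],
    "PMP after": [0.001, 0.8, 4.0, 1.5, 1.5],
    "PMP before": [2.6, 4.3, 3.5, 1.2],
}

totals = {name: round(sum(stages), 3) for name, stages in timings.items()}
for name, total in totals.items():
    print(f"{name}: {total:.3f}s")
```

Under these assumptions the totals come to roughly 4.7s against 5.0s for RBM and 7.8s against 11.6s for PMP; the measurements are described as approximate, so the totals are too.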

Contributor

@StannisZhou StannisZhou left a comment

Partial review. Will finish the rest tomorrow...

Contributor

@StannisZhou StannisZhou left a comment

More comments

@antoine-dedieu antoine-dedieu changed the title from "WIP - Variables refactor" to "Variables refactor" Apr 27, 2022
Contributor

@StannisZhou StannisZhou left a comment

LGTM. Thanks for patiently addressing all the comments!

@antoine-dedieu antoine-dedieu merged commit dfc7535 into vicariousinc:master May 2, 2022
@antoine-dedieu antoine-dedieu deleted the variables_refactor branch May 2, 2022 19:03
4 participants