Reconsider the graph object model #37

brandonwillard · 2020-09-11T04:07:14Z

The constant caching severely complicates graph manipulation/"optimization" by—for instance—requiring that such constants be cloned in order to be used within more than one FunctionGraph. This cloning requirement ultimately extends to the graphs that contain these cached constants and the result is an overly complicated and wasteful process of object cloning and clone-to-original map management (and maps to and from further clones, in some cases).

First, how much does this caching actually save? Was this ever measured or was it just applied? It's implied that this caching improves the performance of the MergeOptimizer; is that the only thing it helps? Is it simply attempting to save resources by only performing an is check instead of a potentially costly descent into subgraphs for further comparisons? What problem is the FunctionGraph trying to avoid by not allowing cached variables, and is that problem only avoided by completely cloning graphs?

All this touches upon some very fundamental graph object model choices/inconsistencies revolving around object identity and equality. Theano graph objects do not consistently implement __eq__ (well, most simply don't), so there are numerous "local" work-arounds throughout the codebase to determine object equality to varying degrees—including the MergeOptimizer itself. ~~Constant caching looks like just another example, since it's effectively assigning a singleton-like property to a subset of graph objects.~~

This—and many other issues and complexities—could be better addressed with a more consistent object model, and one that uses the built-in Python OOP features to implement it. (Especially since they were arguably designed for exactly these kinds of things.)

For instance, a base implementation of __eq__ is simple to provide, since—as usual—the core graph types can be mapped directly to S-expressions (see the Theano meta graph support in symbolic-pymc). Furthermore, if immutability is added to the model, hash generation and equality comparisons can be cached (and locally, by the the objects themselves).

Alternatively, unique IDs/names can be used to simplify identity and equality comparisons, with some severe limitations, though.

Anyway, we should start considering changes like this now, because, after they're in place, I believe we could start to make vast improvements in performance and general usability. Without them, we could end up facing the same problems over and over again in superficially different forms.

The text was updated successfully, but these errors were encountered:

brandonwillard added enhancement New feature or request question Further information is requested labels Sep 11, 2020

brandonwillard mentioned this issue Oct 4, 2020

Base core graph classes on attrs or dataclasses #72

Closed

brandonwillard mentioned this issue Oct 11, 2020

Remove constant caching #99

Closed

brandonwillard changed the title ~~Reconsider constant caching and the graph object model~~ Reconsider the graph object model Oct 15, 2020

brandonwillard mentioned this issue Nov 16, 2020

Implement Apply.__eq__ #171

Open

michaelosthege added this to the 2.0 milestone Dec 15, 2020

brandonwillard closed this as completed Apr 16, 2021

aesara-devs locked and limited conversation to collaborators Apr 16, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Reconsider the graph object model #37

Reconsider the graph object model #37

brandonwillard commented Sep 11, 2020 •

edited

Loading

This issue was moved to a discussion.

This issue was moved to a discussion.

Reconsider the graph object model #37

Reconsider the graph object model #37

Comments

brandonwillard commented Sep 11, 2020 • edited Loading

This issue was moved to a discussion.

brandonwillard commented Sep 11, 2020 •

edited

Loading