Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Adding more detailed info to GraphViz renderings #3956

Closed
wants to merge 20 commits into from

Conversation

Spaak
Copy link
Member

@Spaak Spaak commented Jun 12, 2020

Note: this PR is a work in progress, and some brief feedback would be welcome before I add some more commits to finalize the functionality (if it's desirable at all, of course).

I very much like the GraphViz rendering functionality, but have noticed that complex/bigger models often result in extremely wide graphs with very wide, horizontally elongated, nodes. The first very simple thing that this PR does is it places the distribution a new line in the node text ("MyVar ~ Normal" --> "MyVar\n~\nNormal"), which results in more evenly distributed graphs (nodes considerably less wide).

The second, more elaborate thing, this PR does is it adds a include_prior_params option to the model_to_graphviz functionality. The idea is that this would enable display of not just e.g. "MyVar ~ Normal", but "MyVar ~ Normal(mu=1, sigma=0)" in the graph nodes. To me at least this would be useful functionality. See this simple rendering:

image

As of now, the functionality supports Normal and Uniform, but others are trivial to add (and I'll add before this PR should be merged). But before I do that, I had two questions:

  1. I couldn't find any unified mechanism by which Distribution subclasses advertise their parameters. Many distributions have a _repr_latex() method which I had a look at, and saw that each subclass hard-codes its parameters of interest there. I didn't want to introduce a general mechanism by which distributions advertise their parameters (as this seems a somewhat major overhaul), so instead I opted to add a dict to ModelGraph that maps distributions to their parameters of interest. For now I just added Normal and Uniform, but if you agree this is the way to go I'll add the rest (just a bit of monkey work--happy to do it but won't if this is inappropriate of course ;) ).

  2. I noticed (and am using) the util.get_variable_name function, which is somewhat strangely named since it actually returns a variable's value, or a description of that value. I think the general logic of the function is highly relevant to what this PR is trying to do, but it has some LaTeX hardcoded in it (only '\text{%s}' at the end). Perhaps that's OK to remove?

@Spaak Spaak marked this pull request as draft June 12, 2020 14:22
@ColCarroll
Copy link
Member

thanks for the contribution! i'll try to take a look at this during the weekend. some brief thoughts in the meantime:

  • it sounds like a small change to make the nodes skinnier would be easy to agree to and merge. i might consider breaking that off and submitting separately
  • sharing machinery with the repr_latex would be strongly preferred over having two different code paths. I didn't dig through your (helpful!) description of the problems, but I wouldn't be surprised if one or more PRs polishing the repr_latex code was needed.

@Spaak
Copy link
Member Author

Spaak commented Jun 12, 2020

Thanks for having a look so quickly already.

  • it sounds like a small change to make the nodes skinnier would be easy to agree to and merge. i might consider breaking that off and submitting separately

OK, will do so. It will literally be a 1-2 character PR, but I guess that's OK then! Edit: done, #3957.

  • sharing machinery with the repr_latex would be strongly preferred over having two different code paths.

I agree. Distributions don't really have a sensible __str__ implemented anyway right now, they have default string reprs like <pymc3.distributions.continuous.Normal object at 0x7fa9ac662f60>. The only semantically informed string repr is the latex one.

How about: (1) move functionality currently in the various _repr_latex methods into a new one _str_repr with kwarg format values 'latex', 'txt'; rewrite _repr_latex to call the new _str_repr(format='latex'), and add a __str__ to call _str_repr(format='txt')? Then the GraphViz code can simply use the default string representation of distributions. (Though of course this will affect all string representations also in shell windows etc.--I think it's an improvement, but maybe I'm missing something.)

@ColCarroll
Copy link
Member

That sounds like it would work. That'd be neat to have a nice __str__ (plus then terminal users can have a printout). Is the thought to share machinery in _str_repr for getting a distribution name and parameters?

@ColCarroll
Copy link
Member

(hopefully the rebase here isn't too messy -- the code looks good and compact enough that it shouldn't be...)

@twiecki
Copy link
Member

twiecki commented Jul 27, 2020

@Spaak Any time for the rebase so we can merge this in?

@Spaak
Copy link
Member Author

Spaak commented Jul 27, 2020 via email

@twiecki
Copy link
Member

twiecki commented Jul 27, 2020 via email

@Spaak Spaak force-pushed the more-graph-details branch from 8ac8bc1 to c6d5cdf Compare August 10, 2020 13:15
@codecov
Copy link

codecov bot commented Aug 10, 2020

Codecov Report

Merging #3956 into master will decrease coverage by 0.06%.
The diff coverage is 52.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3956      +/-   ##
==========================================
- Coverage   86.79%   86.73%   -0.07%     
==========================================
  Files          88       88              
  Lines       14143    14158      +15     
==========================================
+ Hits        12276    12280       +4     
- Misses       1867     1878      +11     
Impacted Files Coverage Δ
pymc3/model_graph.py 82.46% <50.00%> (-6.03%) ⬇️
pymc3/util.py 93.75% <100.00%> (ø)

@twiecki
Copy link
Member

twiecki commented Aug 10, 2020

@Spaak You also need to rebase as we fixed some things for CI on master.

@Spaak
Copy link
Member Author

Spaak commented Aug 10, 2020

@twiecki I'm still adding some functionality (refactoring the _repr_latex_ machinery to remove some (a lot) of duplication), so this PR is not quite ready for merging yet (should be in a day or two, will mark as ready for review).

You also need to rebase as we fixed some things for CI on master.

I rebased my branch onto current master already, and did a force push after that. That should take care it, right? If not, let me know. (I'll rebase again, if needed, after the functionality here is complete and ready for merge.)

@Spaak Spaak changed the title Adding more detailed info to GraphViz renderings [WIP] Adding more detailed info to GraphViz renderings Aug 10, 2020
@Spaak
Copy link
Member Author

Spaak commented Aug 11, 2020

I think I might need some input @twiecki @ColCarroll . First of all, as you can see this PR grew substantially and now is much more about streamlining general string representations of variables and distributions.

This has two significant advantages:

  1. Meaningful string representations can now be used in various places, like the command line. E.g. print(mydist) will now give something like mydist ~ Normal(mu=1.0, sigma=1.0). These string representations can also be used when rendering to GraphViz (the original motivation for the PR).
  2. A serious reduction in code duplication for the many _repr_latex_ methods, these are now all handled by a single implementation in Distribution. The functionality is largely shared between plain string and LaTeX representations, to further avoid duplication.

The changes to achieve the above are all now already in this PR.

When the tests started running, I noticed that existing PyMC3 code relies on str(var) in several places to retrieve the name of a variable. Unfortunately, my changes broke this functionality, since str(mu) now returns the more informative mu ~ Normal(mu=1.0, sigma=1.0), rather than just mu (which can be traced back to TensorVariable.__str__). In principle, I think that my use of __str__ (or str) to generate meaningful, user-facing, strings, is a correct one; would you agree? If so, then actually relying on str(var) to retrieve the name of a variable might be problematic. I think this is relatively easy to fix, namely by replacing those cases of str(var) with something like get_variable_name(var) (to be implemented in util). But, since this usage seems fairly widespread (in samplers, trace backends, etc.), I wanted to hear your input first.

This PR actually already has a function util::plain_str:
https://github.com/pymc-devs/pymc3/blob/273f1b6c0eaba696f4ba3e6e580a44bbcd42132a/pymc3/util.py#L236 that can serve as a drop-in replacement for str in these cases, although I now think that get_variable_name would be a more appropriate name.

Apologies for letting this PR get out of hand a bit, I do think it's a potential improvement to the code though; hope you agree ;)

@Spaak
Copy link
Member Author

Spaak commented Aug 24, 2020

@twiecki @ColCarroll I'm going to split this up into several PRs that depend on each other (i.e. I'll file PR n after n-1 is merged). It'll probably be 3PRs along the following lines:

  1. Refactor _repr_latex functionality
  2. Implement semantically meaningful __str__ functionality
  3. Update GraphViz functionality to use new string output

Hope that works and will make it a bit easier to review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants