Implement cquery --output=graph #12248

Closed
wants to merge 4 commits into from

@gregestren gregestren commented Oct 9, 2020

Thankfully, query already has most of the infrastructure necessary to make this
easy.

query implements graph output (in GraphOutputFormatter) over a
Digraph<Target>, which is a generic graph data structure with Target payloads.
All output logic then runs over this data structure. To opt cquery in, all we have
to do is create an equivalent Digraph<ConfiguredTarget>, which is a simple
transformation from the backing graph.

This change creates a new generic class for that common logic:
GraphOutputWriter. query's GraphOutputFormatter then becomes a simple wrapper
over that, and the new GraphOutputFormatterCallback is cquery's equivalent.

A few differences:

  • cquery output is always fully ordered (--order_output=full). We could match
    this with query's controllable version, but I don't see a reason to make this
    yet another bit to configure.
  • query output annotates edges with select() conditions. cquery doesn't do this
    because select()s are resolved and removed from the graph after analysis. I
    think we could annotate edges with the chosen condition if there was
    demand, but that'd be a followup effort.

Fixes #10843 (for cquery, not aquery)
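
As a rough illustration of the "one generic writer, two thin wrappers" shape described above, here is a minimal sketch. Everything in it is hypothetical: `DotWriterSketch`, `ConfiguredSketch`, the adjacency-map input and the label format are stand-ins chosen for the example and are not Bazel's actual `GraphOutputWriter`, `Digraph` or `ConfiguredTarget` APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical payload standing in for a configured target (label plus config). */
final class ConfiguredSketch {
  final String label;
  final String config;

  ConfiguredSketch(String label, String config) {
    this.label = label;
    this.config = config;
  }
}

/** Hypothetical shared writer. Not Bazel's actual GraphOutputWriter. */
final class DotWriterSketch<T> {
  private final Function<T, String> labeler; // how a payload is printed
  private final Comparator<T> ordering; // fixed ordering keeps output deterministic

  DotWriterSketch(Function<T, String> labeler, Comparator<T> ordering) {
    this.labeler = labeler;
    this.ordering = ordering;
  }

  /** Renders an adjacency map (node to direct deps) as GraphViz DOT text. */
  String write(Map<T, List<T>> graph) {
    StringBuilder out = new StringBuilder("digraph mygraph {\n");
    List<T> nodes = new ArrayList<>(graph.keySet());
    nodes.sort(ordering);
    for (T node : nodes) {
      String from = labeler.apply(node);
      out.append("  \"").append(from).append("\"\n");
      List<T> deps = new ArrayList<>(graph.get(node));
      deps.sort(ordering);
      for (T dep : deps) {
        out.append("  \"").append(from).append("\" -> \"")
            .append(labeler.apply(dep)).append("\"\n");
      }
    }
    return out.append("}\n").toString();
  }

  public static void main(String[] args) {
    // query-like caller: payloads are plain labels.
    Map<String, List<String>> targets = new TreeMap<>();
    targets.put("//a:a", List.of("//b:b"));
    targets.put("//b:b", List.of());
    DotWriterSketch<String> queryLike =
        new DotWriterSketch<String>(s -> s, Comparator.naturalOrder());
    System.out.print(queryLike.write(targets));

    // cquery-like caller: same writer, different payload type (made-up config names).
    ConfiguredSketch a = new ConfiguredSketch("//a:a", "cfg1");
    ConfiguredSketch b = new ConfiguredSketch("//b:b", "cfg2");
    Map<ConfiguredSketch, List<ConfiguredSketch>> configured = new HashMap<>();
    configured.put(a, List.of(b));
    configured.put(b, List.of());
    DotWriterSketch<ConfiguredSketch> cqueryLike =
        new DotWriterSketch<ConfiguredSketch>(
            c -> c.label + " (" + c.config + ")",
            Comparator.comparing((ConfiguredSketch c) -> c.label)
                .thenComparing(c -> c.config));
    System.out.print(cqueryLike.write(configured));
  }
}
```

The only per-caller pieces are the labeler and the ordering, which is roughly the role the `NodeReader` in this change appears to play.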

PiperOrigin-RevId: 336377123
Change-Id: Iea0802850d18f6b047f8f35a5aa51926b97289e5
@google-cla google-cla bot added the cla: yes label Oct 9, 2020
@gregestren gregestren self-assigned this Oct 9, 2020
@gregestren gregestren added the team-Configurability platforms, toolchains, cquery, select(), config transitions label Oct 9, 2020
@gregestren
Contributor Author

@meisterT re: potential aquery integration.

@gregestren
Contributor Author

gregestren commented Oct 9, 2020

TODO:

  • Verify streamed mode doesn't apply here (i.e. partialResults in the output formatters aren't actually partial results)
  • Add tests
  • Add docs

@meisterT meisterT requested a review from joeleba October 12, 2020 12:09
Member

@joeleba joeleba left a comment

Generally LGTM. Re: aquery integration: conceptually it makes sense, but in actuality aquery outputs are humongous and I'm not sure a visual representation of that is useful. Maybe with very specific scopes specified by the filters.

///////////////////////////////////////////////////////////

@Option(
name = "graph:node_limit",
Member

nit: I would interpret node_limit as a limit to the number of nodes in the graph, but it's not the case here. Maybe node_string_limit? It's consistent with the variable name on L287.

Contributor Author

This carries over query's --graph:node_limit flag into a common location. However clear or not the flag is, I'd prefer to keep the existing API for the context of this change.

@gregestren
Contributor Author

> Generally LGTM. Re: aquery integration: conceptually it makes sense, but in actuality aquery outputs are humongous and I'm not sure a visual representation of that is useful. Maybe with very specific scopes specified by the filters.

I figured as much. Thanks for clarifying. I'm personally happy just that you're aware this is a thing now.

@gregestren
Contributor Author

Added a long disclaimer to PostAnalysisQueryBuildTool in support of @haxorz's suggestions to guarantee non-streaming mode.

Contributor

@juliexxia juliexxia left a comment

Overall LGTM! Just minor nits/qs. Excited this is happening!

@@ -253,4 +270,30 @@ public AspectResolutionModeConverter() {
+ "precise mode is not completely precise: the decision whether to compute an aspect "
+ "is decided in the analysis phase, which is not run during 'bazel query'.")
public AspectResolver.Mode aspectDeps;

///////////////////////////////////////////////////////////
// GRAPH OUTPUT FORMATTER OPTIONS //
Contributor

Are the proto: options above also graph output formatter options?

Contributor Author

Not to my knowledge? As far as I can see those only apply to --output=proto

Contributor

Ah, I didn't quite read this as --output=graph output formatter options. makes sense.

import java.io.OutputStream;
import java.util.Comparator;

/** cquery output formatter that prints the result as factored graph in AT&amp;T GraphViz format. */
Contributor

nit: just AT&T?

Contributor Author

@gregestren gregestren Oct 13, 2020

This inherits the same comment as in the query version, which I believe follows Javadoc's "Comments are written in HTML" guidance: https://docs.oracle.com/javase/1.5.0/docs/tooldocs/windows/javadoc.html#blockandinlinetags

Key snippet:

entities for the less-than (<) and greater-than (>) symbols should be written &lt; and &gt;. Likewise, the ampersand (&) should be written &amp;

private final GraphOutputWriter.NodeReader<ConfiguredTarget> nodeReader =
new NodeReader<ConfiguredTarget>() {

private final Comparator<ConfiguredTarget> configuredTargetOrdering =
Contributor

For my own edification - reason to initialize this outside of the comparator method?

Contributor Author

My original reasoning is that the comparator method can be called multiple times whereas the actual comparator logic only needs to be defined once. So it theoretically saves unnecessary extra instantiation.

In practice I don't think that'd make a huge difference (and I'd hope the JDK could optimize that). So I don't have strong feelings on the subject.
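
For readers less familiar with the pattern, a minimal sketch of "define the comparator once, hand it out many times" might look like the following. `NodeSketch` and `ReaderSketch` are hypothetical names for this example and only mirror the field-plus-accessor shape of the snippet under review, not its actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical node type, standing in for ConfiguredTarget. */
final class NodeSketch {
  final String label;
  final String config;

  NodeSketch(String label, String config) {
    this.label = label;
    this.config = config;
  }
}

/** Hypothetical reader that owns a single comparator instance. */
final class ReaderSketch {
  // Built once when the reader is created, then reused by every comparator() call.
  private final Comparator<NodeSketch> ordering =
      Comparator.comparing((NodeSketch n) -> n.label).thenComparing(n -> n.config);

  /** Returns the shared comparator; nothing is allocated here. */
  Comparator<NodeSketch> comparator() {
    return ordering;
  }

  public static void main(String[] args) {
    ReaderSketch reader = new ReaderSketch();
    List<NodeSketch> nodes = new ArrayList<>(List.of(
        new NodeSketch("//b:b", "cfg1"),
        new NodeSketch("//a:a", "cfg2"),
        new NodeSketch("//a:a", "cfg1")));
    nodes.sort(reader.comparator());
    for (NodeSketch n : nodes) {
      System.out.println(n.label + " " + n.config);
    }
    // Same object on every call, so repeated calls cost nothing extra.
    System.out.println(reader.comparator() == reader.comparator());
  }
}
```

Building the comparator inside `comparator()` would behave the same way; the field only avoids rebuilding it on each call.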

for (ConfiguredTarget configuredTarget : partialResult) {
Node<ConfiguredTarget> node = graph.createNode(configuredTarget);
for (ConfiguredTarget dep : depsRetriever.getDirectDeps(configuredTarget)) {
if (allNodes.contains(dep)) {
Contributor

Assuming this is for a situation like --noimplicit_deps or the like?

Contributor Author

@gregestren gregestren Oct 13, 2020

The reasoning here is that the query output may only contain a subset of all deps.

So if A depends on B, C, and D and some magical query expression filtered the results down to just A and C, depsRetriever.getDirectDeps returns all of A's deps (B, C, and D). We only want to include the ones that are part of the query result (any target in partialResult, which in this case is A and C).
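
The A/B/C/D example above can be sketched as follows. This is a simplified illustration using plain strings and maps; `SubgraphSketch` and `restrictToResult` are hypothetical names, and the `allDirectDeps` map stands in for whatever `depsRetriever.getDirectDeps` returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SubgraphSketch {
  /**
   * Builds a graph over the query result only: every result target becomes a node,
   * and an edge is kept only when the dep is itself part of the result.
   */
  static Map<String, List<String>> restrictToResult(
      Set<String> result, Map<String, List<String>> allDirectDeps) {
    Map<String, List<String>> graph = new LinkedHashMap<>();
    for (String target : result) {
      List<String> keptDeps = new ArrayList<>();
      for (String dep : allDirectDeps.getOrDefault(target, List.of())) {
        if (result.contains(dep)) { // same role as the allNodes.contains(dep) check
          keptDeps.add(dep);
        }
      }
      graph.put(target, keptDeps);
    }
    return graph;
  }

  public static void main(String[] args) {
    // A depends on B, C and D, but the query result is only {A, C}.
    Map<String, List<String>> allDirectDeps = Map.of("A", List.of("B", "C", "D"));
    Set<String> result = new LinkedHashSet<>(List.of("A", "C"));
    System.out.println(restrictToResult(result, allDirectDeps)); // prints {A=[C], C=[]}
  }
}
```

B and D never show up as nodes or edge targets, which matches the intent of keeping the graph limited to targets in `partialResult`.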

@Override
public void beginVisit() {
super.beginVisit();
// TODO(bazel-team): (2009) make this the default in Digraph.
Contributor

laughing at the date of this TODO

Contributor Author

I definitely noticed that too. :p

@joeleba
Member

joeleba commented Oct 13, 2020

> > Generally LGTM. Re: aquery integration: conceptually it makes sense, but in actuality aquery outputs are humongous and I'm not sure a visual representation of that is useful. Maybe with very specific scopes specified by the filters.

> I figured as much. Thanks for clarifying. I'm personally happy just that you're aware this is a thing now.

Related: https://www.youtube.com/watch?v=GDbaBOCDwrQ

@gregestren
Contributor Author

> > > Generally LGTM. Re: aquery integration: conceptually it makes sense, but in actuality aquery outputs are humongous and I'm not sure a visual representation of that is useful. Maybe with very specific scopes specified by the filters.

> > I figured as much. Thanks for clarifying. I'm personally happy just that you're aware this is a thing now.

> Related: https://www.youtube.com/watch?v=GDbaBOCDwrQ

That graph visualization is super-cool.

@bazel-io bazel-io closed this in 02cbcd2 Oct 19, 2020
@gregestren gregestren deleted the cquery_graph_output branch October 19, 2020 19:24
Successfully merging this pull request may close these issues.

Please implement output format 'graph' for cquery and aquery
4 participants