Exporting project's dependency graph #20242
Replies: 6 comments 15 replies
-
Getting good graph introspection is nice! This seems quite similar to the |
-
Have you considered adding a new output format option to the |
-
This data exists (along with a lot of other data) in |
-
What is the use case for providing transitive info? I would expect that a simple adjacency list is enough for graph visualization etc? |
-
If a new goal, I think |
-
Let's settle on the following implementation plan: Add
More supported formats will be added later. The |
-
Introduction
Pants already provides a rich set of tools to query the dependency graph to find out information about dependencies and dependents of individual build targets (be it a file, a package, or a glob pattern):
The dependencies goal lets you list the dependencies:

The dependents goal (formerly known as dependees) lets you list the dependents (also known as reverse dependencies):

It is possible to list dependencies for multiple files, one after another, for example:
Motivation
When dependencies are listed for multiple targets at once, such as individual source files, you can't tell which of the modules in the cheeseshop/repository package depends on what.

Running a Pants goal on each individual file is very inefficient: each invocation of Pants has an overhead, so it's preferable to get all the work done within a single Pants call. The command may also run in an environment without a pantsd process already running and/or without any cache available. So even though this works, it will prove unreasonably slow for even a medium-sized codebase:

It is therefore more helpful to list the dependencies of multiple files individually, so that they can be told apart, using a new goal that constructs the graph only once:
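As a minimal sketch of what consuming such output could look like, assume the new goal prints one JSON object mapping each target address to the list of its direct dependencies. The addresses and the exact shape below are hypothetical, not a confirmed output format; Python's standard json module is enough to load and query it:

```python
import json

# Hypothetical output of the proposed goal: each target address maps to the
# addresses of its direct dependencies (an adjacency "dict of lists").
# The addresses are invented for illustration only.
RAW_OUTPUT = """
{
  "cheeseshop/repository/__init__.py": [],
  "cheeseshop/repository/models.py": ["cheeseshop/repository/__init__.py"],
  "cheeseshop/repository/store.py": [
    "cheeseshop/repository/models.py",
    "cheeseshop/repository/__init__.py"
  ]
}
"""


def load_graph(text: str) -> dict[str, list[str]]:
    """Parse the exported adjacency mapping and check its basic shape."""
    graph = json.loads(text)
    for target, deps in graph.items():
        # Every value is expected to be a list of target addresses.
        if not isinstance(deps, list):
            raise ValueError(f"expected a list of dependencies for {target!r}")
    return graph


def targets_depending_on(graph: dict[str, list[str]], address: str) -> list[str]:
    """Return, sorted, the targets that list `address` as a direct dependency."""
    return sorted(t for t, deps in graph.items() if address in deps)


if __name__ == "__main__":
    graph = load_graph(RAW_OUTPUT)
    print(targets_depending_on(graph, "cheeseshop/repository/__init__.py"))
```

Because the structure is plain JSON, the same filter is just as easy to express with jq or with any other language's JSON library.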
The new goal would produce an adjacency representation of the dependency graph as a dictionary of lists. The output is JSON-compatible, which makes it trivial to filter and query the graph using standard tooling such as jq and the standard libraries of most programming languages.

More importantly, this data structure may be used to construct graphs with 3rd-party tooling such as networkx in order to query and manipulate them; see networkx.convert.from_dict_of_lists:

Having the dependency graph exported makes it possible to cheaply answer a variety of useful questions such as:
Having the graph exported also makes it possible to visualize the whole graph or parts of it using visualization libraries such as graphviz:

Exporting the graph into a JSON data structure is enough to perform any query or manipulation of the graph, but for practical reasons it may be helpful to provide additional functionality out of the box so that users don't have to write their own programs. This could mean:
Implementation
In practice, fetching dependencies (direct or transitive) is trivial:
and so is fetching dependents:
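As a language-level illustration, independent of the Pants engine API, both queries can also be answered over an exported adjacency mapping: a breadth-first traversal collects transitive dependencies, and inverting the mapping turns the same traversal into a transitive dependents query. The function names and the example graph are hypothetical:

```python
from collections import deque

# Adjacency mapping: target -> its direct dependencies (hypothetical shape).
Graph = dict[str, list[str]]


def transitive_dependencies(graph: Graph, root: str) -> set[str]:
    """All targets reachable from `root` by following dependency edges (BFS).

    `root` itself is included only if it sits on a dependency cycle.
    """
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def invert(graph: Graph) -> Graph:
    """Build the reverse adjacency mapping: target -> its direct dependents."""
    dependents: Graph = {node: [] for node in graph}
    for target, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(target)
    return dependents


def transitive_dependents(graph: Graph, root: str) -> set[str]:
    """Everything that depends on `root`, directly or transitively."""
    return transitive_dependencies(invert(graph), root)


# Invented example graph for illustration.
EXAMPLE: Graph = {
    "app.py": ["service.py"],
    "service.py": ["models.py", "utils.py"],
    "models.py": ["utils.py"],
    "utils.py": [],
}

if __name__ == "__main__":
    print(sorted(transitive_dependencies(EXAMPLE, "app.py")))
    # → ['models.py', 'service.py', 'utils.py']
    print(sorted(transitive_dependents(EXAMPLE, "utils.py")))
    # → ['app.py', 'models.py', 'service.py']
```

This also shows that transitive information can be derived client-side from a plain adjacency list. When many dependents queries are needed, inverting the mapping once and reusing it is cheaper than inverting per query.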
Fetching dependencies for multiple targets is likely to happen in a MultiGet call in a rule, filling a mapping of build targets to their dependencies, which will be the output of the new goal.

With the naming of the goal and its options still subject to change, this is how the user interface may look:
Existing implementations
For comparison, dependency graph export is available in Bazel via the query command; see Display a graph of the result to learn more. The graph can be exported into a variety of formats such as a DOT file or XML. Even though DOT files can be loaded into a graph data structure, for instance using networkx.drawing.nx_pydot.from_pydot, it may be preferable to have the graph available in JSON to make it more accessible to standard operating system tooling, as JSON is a lot easier to process than DOT.

Buck2 implements a query command with similar functionality: it can export the dependency graph by listing all paths between nodes in the graph, or the dependencies of individual targets, with the output format set to either JSON or DOT. Supporting both formats in the new Pants goal may be desirable, as it removes the burden of converting the data from the user. However, initially supporting only JSON should be considered reasonable.

Proof of concept
A plugin was written for Pants 2.16 a few months ago, and it has been used in production since then (with only minor adjustments to accommodate corporate needs). See the source code and the published PyPI wheel to install it in a Pants 2.16 repository. Check out this PR's branch to experiment locally.