The framework is currently composed of the following:
- parsers for Python, Java, Verilog, Fortran, and C/C++,
- an AST differencing tool, Diff/AST, based on the parsers,
- helper scripts for factbase manipulation, and
- ontologies for the related entities.
The parsers and Diff/AST export the resulting facts, such as abstract syntax trees (ASTs), changes between them, and other syntactic/semantic information, in XML or N-Triples. In particular, facts in N-Triples format are loaded into an RDF store such as Virtuoso to build a factbase, that is, a database of facts. Factbases are intended to be queried for software engineering tasks such as code comprehension, debugging, change pattern mining, and code homology analysis.
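To give a feel for what facts in N-Triples look like and how they can be queried, here is a minimal Python sketch. It handles only a simple subset of N-Triples (IRI subject, IRI predicate, IRI or plain-literal object). The IRIs, the predicate names, and the helpers `parse_ntriples` and `objects_of` are all hypothetical and are not taken from the framework's ontologies; a real factbase would be queried through the RDF store instead.

```python
import re

# One simple N-Triples statement: <subject> <predicate> (<object IRI> | "literal") .
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)

def parse_ntriples(text):
    """Parse a simple subset of N-Triples into (subject, predicate, object) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        subj, pred, obj_iri, obj_lit = m.groups()
        triples.append((subj, pred, obj_iri if obj_iri is not None else obj_lit))
    return triples

def objects_of(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is a known fact."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Hypothetical facts; every IRI below is made up for illustration.
SAMPLE = """
# two AST nodes and a parent relation
<http://example.org/node/1> <http://example.org/prop/label> "MethodDeclaration" .
<http://example.org/node/2> <http://example.org/prop/parent> <http://example.org/node/1> .
<http://example.org/node/2> <http://example.org/prop/label> "Block" .
"""

if __name__ == "__main__":
    facts = parse_ntriples(SAMPLE)
    print(objects_of(facts, "http://example.org/node/2", "http://example.org/prop/label"))
```

In practice the same kind of lookup would be written as a SPARQL query against the store.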
Diff/AST is an experimental implementation of the AST differencing algorithm reported in the following paper:
Masatomo Hashimoto and Akira Mori, "Diff/TS: A Tool for Fine-Grained Structural Change Analysis," In Proc. 15th Working Conference on Reverse Engineering, 2008, pp. 279-288, DOI: 10.1109/WCRE.2008.44.
It compares ASTs node by node, whereas popular diff tools compare (text) files line by line.
The algorithm is based on an algorithm for computing tree edit distance (TED) between two ordered labeled trees. The TED between two trees is the minimum (weighted) number of edit operations to transform one tree into another.
Unfortunately, applying TED algorithms directly to real-world ASTs is generally infeasible because their computational complexity is, at best, quadratic in the number of AST nodes.
Therefore, Diff/TS makes moderate use of a TED algorithm in a divide-and-conquer manner, backed by elaborate heuristics, to approximate tree edit distances.
Nevertheless, Diff/AST can still take a long time on large, non-trivial inputs, so it always caches its results.
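To make the notion of tree edit distance concrete, the following Python sketch computes the exact TED between two small ordered labeled trees using the textbook forest recursion with unit costs for insertion, deletion, and relabeling. It is not the algorithm used by Diff/AST or Diff/TS, and the names `tree`, `forest_distance`, and `tree_edit_distance` are chosen here only for illustration. It is practical only for tiny trees, which is exactly why heuristics are needed for real ASTs.

```python
from functools import lru_cache

def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) pair."""
    return (label, tuple(children))

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, gc = g[-1]
        return forest_distance(f, g[:-1] + gc) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, fc = f[-1]
        return forest_distance(f[:-1] + fc, g) + 1
    (fl, fc), (gl, gc) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + fc, g) + 1
    insert = forest_distance(f, g[:-1] + gc) + 1
    # Match the two rightmost roots: compare their children and the remaining forests.
    match = (forest_distance(fc, gc)
             + forest_distance(f[:-1], g[:-1])
             + (0 if fl == gl else 1))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    """Minimum number of node insertions, deletions, and relabelings from t1 to t2."""
    return forest_distance((t1,), (t2,))

if __name__ == "__main__":
    before = tree("f", tree("g", tree("a"), tree("b")))
    after = tree("f", tree("a"), tree("c"))
    # Delete "g" and relabel "b" to "c".
    print(tree_edit_distance(before, after))
```

Weighted variants replace the constant costs of 1 with per-operation cost functions.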
You can see the results of comparing some pairs of source files taken from samples here.
You can instantly try Diff/AST by utilizing Docker and a ready-made container image.
$ docker pull codinuum/cca
The following command line runs Diff/AST within a container to compare sample Java programs and then saves the results in the results directory on the host.
$ ./cca.py diffast -c results samples/java/0/Test.java samples/java/1/Test.java
Once you have built DiffViewer, you can inspect the AST differences in a viewer window. See diffviewer/README.md for details.
$ diffviewer/run.py -c results samples/java/0/Test.java samples/java/1/Test.java
You can run both Diff/AST and DiffViewer with the following command.
$ ./cca.py diffast -c results --view samples/java/0/Test.java samples/java/1/Test.java
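If you want to compare many file pairs, the invocations above can be scripted. The sketch below only assembles the command lines shown in this README (`./cca.py diffast -c DIR [--view] OLD NEW`); the helper name `diffast_command` and the list of pairs are hypothetical, and the script assumes it is run from the repository root with Docker available.

```python
import shlex
import subprocess

def diffast_command(old, new, cache_dir="results", view=False):
    """Build the argument list for one ./cca.py diffast invocation."""
    cmd = ["./cca.py", "diffast", "-c", cache_dir]
    if view:
        cmd.append("--view")  # also open DiffViewer, as in the command above
    cmd += [old, new]
    return cmd

if __name__ == "__main__":
    # Hypothetical list of (old, new) pairs; only this sample pair appears in the README.
    pairs = [("samples/java/0/Test.java", "samples/java/1/Test.java")]
    for old, new in pairs:
        cmd = diffast_command(old, new)
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```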
The following will install parsesrc and diffast.
$ opam install cca
You can also build the parsers and Diff/AST yourself. The following are required:
- GNU make
- OCaml (>=4.11.1)
- OPAM (for installing camlzip, cryptokit, csv, git-unix, menhir, ocamlnet, pxp, ulex, uuidm, and volt)
The following commands create ast/analyzing/bin/{parsesrc.opt,diffast.opt}.
$ cd src
$ make
They should be run via the shell scripts ast/analyzing/bin/{parsesrc,diffast}, which set some environment variables.
If you have built Diff/AST, you can use it with Git. Add the following lines to your .gitconfig. Note that PATH_TO_THIS_REPO should be replaced by your local path to this repository.
[diff]
    tool = diffast
[difftool]
    prompt = false
[difftool "diffast"]
    cmd = PATH_TO_THIS_REPO/git_ext_diff "$LOCAL" "$REMOTE"
[alias]
    diffast = difftool
Then you should be able to use git diffast like git diff. You will be prompted to launch diffast for each source file comparison. Other file comparisons will be ignored.
The following command line creates a Docker image named cca. In the image, the framework is installed at /opt/cca.
$ docker build -t cca .
Apache License, Version 2.0