Analysis

Francesco Lattanzio edited this page Jun 24, 2015 · 4 revisions

The problem

A few months ago I was asked to customize a Free Software application. As you may guess it's a Java application -- a big one split into 50 or so components (JAR archives and a few native libraries).

The build system of choice is Maven. When I started I didn't know anything about Maven, so I read all I could find to learn how to use it effectively.

Well, using Maven effectively proved to be a difficult task to achieve. Maybe it's me, but I think that the following issues are particularly obnoxious:

  • no incremental compilation -- when I modify a single source file, it irritates me to wait more than an hour for the whole project to compile just to find out I've misplaced a comma. Yes, I could compile only that single component and then all the components that depend on it, in dependency order, but given that any decent build system can do this by itself, why should I be the one doing it?
  • multiple versions of the same library -- imagine your project depends on some external libraries; here Maven does a nice job of downloading them for you. But what happens if those dependencies themselves depend, directly or indirectly, on different versions of the same library? If you create your packages using Maven plugins, you get them all. It happened to me -- I had both log4j version 1.2.14 and 1.2.17 in the package. This makes no sense.
  • XML -- I find writing deeply nested XML difficult, even impossible, given that most plugins' documentation is unclear about the purpose of the plugin and/or how it does its job. OK, there are IDEs with nice forms to fill in, but then you can do only what the form has fields for (which is seldom all the plugin can do), and without decent documentation you still don't know what's going on under the hood.

To solve the above issues I decided to revert to the build system I'm most accustomed to: GNU Make.

However, using GNU Make for Java projects is not straightforward. In the rest of this page we'll see why, whereas in the next page we'll see how the problem was solved.

The Java Compiler

Two aspects of javac make using GNU Make for Java projects problematic.

The first is the way javac compiles source files: for every source file passed on the command line, it will compile both the file itself and all its dependencies, recursively.

Imagine the following scenario:

  • two source files: A.java and B.java
  • A.java depends on B.java, e.g., has some member variable of type B
  • B.java depends only on system libraries or other libraries specified in the classpath

Running javac A.java would compile both A.java and B.java, although we only specified A.java.

Moreover, if we run javac A.java again (without modifying either A.java or B.java), both files would be compiled again, even if the class files are placed where javac can find them.

Recent javac releases (1.6 to 1.8) implement the -implicit:none option that can disable this behaviour -- running javac -implicit:none A.java would compile only A.java. Note, however, that only bytecode generation is disabled -- B.java would still be parsed and analyzed and errors found in B.java will abort the compilation.

The second aspect of javac adding to the problem is its "start-up time".

As with many Java applications out there, starting the VM takes quite a lot of time compared to native applications. For example, on the netbook I'm using to write these words, running javac -help takes about 1200 ms, whereas running gcc --help takes about 20 ms.

However, once started, javac can compile thousands of files in a few seconds.

To understand why these aspects are problematic, we need to know how GNU Make works.

GNU Make

make is a powerful dependency-tracking tool. You may think of it as a "rule processor".

A rule tells when and how to build a "target" file from a number of "prerequisite" files by running a set of shell commands. It looks like:

target: prerequisite1 prerequisite2 ...
        command1
        command2
        ...

target is the file to be built; command1, command2, etc. are the commands to execute to build target; and prerequisite1, prerequisite2, etc. are the files the commands require to generate target. The whole set of commands of a rule is also known as the "recipe".

When make processes a rule, it first checks whether any of the prerequisites should be rebuilt -- a prerequisite can be rebuilt if it is also the target of some rule:

  • if it is, make suspends the current rule and starts processing the new one -- when it is done with the new rule, it proceeds to check the next prerequisite
  • else, it verifies that the prerequisite exists: if it doesn't, execution terminates with an error, else make proceeds to check the next prerequisite

When all the prerequisites have been checked, make compares the target's timestamp with the prerequisites' timestamps -- if the former is older than any of the latter, or if the target does not exist, the commands of the recipe are executed in sequence.

Note that it is not an error if the commands of the recipe neither update nor create the target file, but it is an error if a prerequisite is not already there and cannot be created. Whether the target file is created/updated or not, no rule will be processed twice, no matter how many rules the target is a prerequisite of.

There is more to the way make works -- this is just the minimum we need to understand how make can help us in building Java projects. If you want to learn more (and you have to if you want to use GNU Make in your next Java project), just run info 'GNU Make' and read.

What is important to note about make is that every target is treated individually, even when a single "wildcard" rule is used to build multiple targets. This fact is the key to incremental compilation: a target (and that target only) is rebuilt if and only if the files it depends on change.
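As a rough illustration of the algorithm just described, here is a minimal sketch that involves no Java at all. The file names (src1.txt, part1.txt, and so on) are hypothetical, chosen only for this example, and recipe lines must begin with a tab character.

```makefile
# Hypothetical example: out.txt is assembled from two intermediate files,
# each of which is derived from its own source file.
out.txt: part1.txt part2.txt
	cat $^ > $@

# part1.txt is both a prerequisite of out.txt and the target of this rule,
# so make processes this rule before deciding whether to rebuild out.txt.
part1.txt: src1.txt
	tr a-z A-Z < $< > $@

part2.txt: src2.txt
	tr a-z A-Z < $< > $@
```

Assuming src1.txt and src2.txt exist, the first run of make executes all three recipes. Running touch src1.txt and then make again should rebuild only part1.txt and out.txt: part2.txt is still newer than src2.txt, so its recipe is skipped. Deleting src2.txt instead makes make stop with an error, since no rule can create the missing prerequisite.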

The make-javac clash

Now we put together what we know about javac and make.

Let's take again the hypothetical files A.java and B.java and write a naive Makefile:

foo.jar: A.class B.class
        jar cvf $@ $^

B.class: B.java
        javac B.java

A.class: A.java
        javac A.java

This may look like a perfectly correct and working Makefile, and if we run make we can actually build foo.jar, provided there are no errors in the Java source files. However, there are a few non-obvious flaws.

First, when make builds A.class, javac really compiles both A.java and B.java (remember the "first aspect" of javac?), so if B.java has already been compiled by its own rule, it is compiled twice. Whether B.java is compiled first or not depends on the order in which make chooses to process prerequisites. Although it's possible to alter this order by moving foo.jar's prerequisites around, one should not rely on it, as the GNU Make documentation nowhere states the expected order of evaluation of prerequisites. A better, although non-optimal, solution is to run javac once: we won't stop it from compiling referenced classes unnecessarily, but we'll ensure that each file is compiled at most once per run.

The second flaw is actually a matter of performance -- a few classes are not a problem, but imagine a big project with thousands of files. Given that javac needs no less than 1.2 seconds (on my netbook) to compile a class file, compiling a thousand class files one at a time would take no less than 20 minutes (1000 × 1.2 s = 1200 s) -- that's too much (well, to me; others seem to be comfortable with such long times). Invoking javac once -- with a list of the thousand classes to be compiled -- would solve this flaw too.
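One possible way to invoke javac only once from make is sketched below. This is an illustration under stated assumptions, not the solution developed in the next page: classes.stamp is a hypothetical marker file introduced for this example, and all sources are assumed to sit in the current directory, in the default package. Recipe lines must begin with a tab character.

```makefile
# Assumption: every source file is in the current directory (no packages).
SOURCES := $(wildcard *.java)

foo.jar: classes.stamp
	jar cvf $@ *.class

# classes.stamp (hypothetical name) records when the last compilation ran.
# A single javac invocation receives the whole list of sources, so the VM
# start-up cost is paid once per build instead of once per class.
classes.stamp: $(SOURCES)
	javac $(SOURCES)
	touch $@
```

The trade-off is that modifying any one source file recompiles all of them, so this sketch pays the start-up cost only once but is not incremental.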

For the third and last flaw, imagine modifying B.java so that the constructor (or any other method) invoked from A.java is no longer valid, and then running make again -- it won't report any error, although it should. This happens because only B.class is older than its own Java source file, so only javac B.java is executed; javac A.java is not, and the error remains undetected.

To fix this flaw we need to tell make that whenever B.java is modified A.class must be recompiled too. This is done by adding B.java to A.class's prerequisites:

foo.jar: A.class B.class
        jar cvf $@ $^

B.class: B.java
        javac B.java

A.class: A.java B.java
        javac A.java

Running make again will reveal the error.

Adding inter-class dependencies by hand is feasible in a project of a few files, but in a project of a thousand files some automation is mandatory.
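To give an idea of where such automation could plug in, here is a minimal sketch. It assumes a hypothetical file deps.mk, written by some dependency-extraction tool that is not specified here, containing only prerequisite lines such as "A.class: B.java". It is not necessarily the approach taken in the next page, and it addresses only the third flaw. Recipe lines must begin with a tab character.

```makefile
foo.jar: A.class B.class
	jar cvf $@ $^

# Pattern rule supplying the recipe for every class file.
# $< expands to the matching .java file.
%.class: %.java
	javac $<

# deps.mk is hypothetical: a generated file holding recipe-less rules
# such as "A.class: B.java". make merges those extra prerequisites with
# the pattern rule above. The leading "-" tells make not to complain
# if the file does not exist yet.
-include deps.mk
```

With a deps.mk containing the line "A.class: B.java", modifying B.java should cause both B.class and A.class to be rebuilt, just as in the hand-written Makefile above.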

CC0
To the extent possible under law, Francesco Lattanzio has waived all copyright and related or neighboring rights to A Makefile for Java projects. This work is published from: Italy.
