-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Declare inputs and outputs to support up-to-date checking #31
Comments
The input is the source code, and the output is also the exact same source code. Can gradle handle this case? |
I'm only talking about the check task. The update task cannot be made incremental, for exactly the reason you said. |
My largest project has 6.87MB of java source, spread across 1636 files (in many subprojects, of course :). It takes 5.6 seconds to sum their filesizes with this code in my buildscript:
Running
Takes 38 seconds with a fresh daemon, 16 seconds on the second run, and levels out around 14s. Given that it takes 6 seconds just to sum their filesizes, I think adding incremental support can, at best, take ~8-10 seconds off of a very large check task, which is about half. It'll be much much faster than any reasonable set of unit tests on a codebase this size. @oehme How does gradle determine up-to-date? Does it trust filesize+lastModified, some kind of inotify, or take a hash? The hardest part will be serializing custom input functions. In the example above, if the user changes the |
That code you showed is too simplistic. Gradle can up-to-date check projects with a 100000 source files in about 5s. We have performance tests for that. Also, in the future Gradle will even do this pre-emptively (through file system listeners), so that it already knows the up-to-date state before you even run the next build. The problem is that the spotless check task will run even if the source code for a project did not change. Imagine you only changed one of your projects, but spotless still insists on re-running on every project. That's a big waste of time. There really is no good reason not to do up-to-date checking ;) You can hash custom functions by hashing their byte code (which you can get from their ClassLoader). |
I just saw that spotless adds a task to the root project which checks the complete source tree. This is so much unnecessary work on incremental builds. That calls for a redesign towards much more granular tasks. The plugin should add a task for every sourceSet. If the user wants to check stuff outside of the sourceSets, he can add another check task that only checks some specific files (let's say, the build.gradle files in his Basically the extension defines how to check (including the file extension, but not the full path). The task decides what to check. |
It doesn't add a task to the root project, just the project. You can add it to the root project (I often do), or you can add it to each subproject independently if you prefer.
It is true that, by default, it creates one I don't understand why one task with 500 files would be faster than two tasks with 250 files each, with or without incremental build support, so I don't understand why this causes unnecessary work. But if it does, and we need to redesign, I'm open to that :)
Imagine you want to calcualte the sum of 1000 numbers, and rather than caching each incremental sum, you just recalculate the whole sum when any number changes. Supporting incremental processing is only a win if the processing is more expensive than tracking the changes. Spotless is very simple, and so it is very fast - the bottleneck is disk IO. Until we can measure that something is a bottleneck, and we can measure that we have sped it up, I don't think we can justify adding complexity. I'm open to the idea that my benchmark is naive, but I don't see how yet. My machine is an old laptop with a magnetic disk, and it takes 6 seconds to look at the metadata of 1600 files, and 14 seconds to read them from disk, format them, and check that they are clean. If your build server has an SSD that can read the metadata for 100,000 files in 5 seconds, maybe it's also powerful enough to format them in 10 seconds?
That's a clever idea, I like it! When gradle implements filesystem listeners, I could maaaaybe see that it might be worth the complexity of implementing this feature. But for now, I don't understand why users would experience a faster build, because I don't understand how gradle can check for up-to-date in less time than my naive benchmark. |
Sure, the format would be specified in the extension, which all tasks reuse. But each sourceSet should have its own task. When I only change my test code, I don't want to re-check the main code. It's a waste of time.
You are thinking way too much in terms of full builds. Gradle is all about incremental builds. When I change a single source file in a single project, especially with continuous build, I expect a response in a few hundred milliseconds, not several seconds or more.
I won't get into an argument here, just test it yourself and you'll be suprised how much faster up-to-date checking is than you think
I did that on a 256MB tree containing 100000 source files.
As I showed you this is trivial to measure. Also, you are talking as if adding incremental build support was some monumental task. For a simple plugin like this, it's a weekend project at best. |
Running spotless on that same filetree takes 13s when I change a single file. That's 10 times longer! Imagine how much faster it would be if it only checked that single file. Gradle makes that trivial. The config was very simplistic, so this gets worse as you add more rules:
|
We don't have to imagine, we can calculate it ;-) It would be ~12s seconds faster on a 256MB source tree. On a 26MB source tree it would presumably be ~1.2 seconds faster. The biggest source tree I use in my humble world is just 7MB, though my laptop seems to be a lot crummier than yours ;-) If spotless is consuming 10% of a build's duration, then a 10x speedup in spotless is a 9% build speedup. I'd be happy to merge a PR. To me, this data confirms that spotless isn't slow enough for this optimization to make it to the top of my weekend project priority queue, at least not on a "utility" basis. But it would be a fun project to learn about or demonstrate Gradle's incremental APIs! If anyone is interested in doing this project for that purpose, go for it! After thinking about the custom function issue, it doesn't bother me that much anymore - it's fine if changing a custom function doesn't cause |
I corrected the number above. It is 10s slower or in other words 10 times as slow. And remember this is for a trivial set of replacements. I'm sure it's way worse if you add more. I'll give you some more numbers later. |
Currently, this is the only task in the project, FormatTask. If anybody has time and need to make the change, I'm happy to merge and release the PR. Clone this repo, run I'd guess that making this change might require creating a separate |
I will be happy to assist the contributor with this. It will definitely require a 3.0 to do this right. We should fix some other non-idiomatic Gradle usages while doing this. For instance the formats DSL and how the plugin creates tasks from it is not very user friendly. There are lots of special characters needed like quotes, commas and parentheses. It just looks very different from other Gradle APIs. Also, the task is not available except using We should track that in its own issue, but assign it to the same 3.0 milestone. @nedtwigg can you set that up? :) Here's my design for a Gradle-idiomatic DSL for spotless. The plugin would create a pair of check/apply tasks for every 'target'. This will make the plugin faster, more flexible and easier to extend.
|
(Unrelated aside: setting the I've created issue #32 to discuss your proposed DSL change. We can definitely implement up-to-date checking and incremental support without changing the DSL, they're almost unrelated. Even if we decide to break the existing DSL, we should first release a version which adds incremental support for people who can't immediately make the jump to Note to potential contributors: I will quickly merge and release any PR's which add up-to-date support or incremental build, but please don't break the DSL. |
We will need to break up the tasks into two different types, so that will be a breaking change. But yes, we don't need to change the Breaking up the DSL to create more fine grained tasks will make incremental builds even better though, that's why I mentioned it here. |
I'd be interested in looking into this. As I'm new to writing Gradle plugins, I'd have to study the incremental tasks docs and delve into the Spotless source code to wrap my head around things, so I'll let you know both know if I manage to produce something useful that could act as a good starting point for this. |
@jbduncan Cool! Note that the plugin currently scans the whole project dir, which includes stuff like the Gradle cache directory and the build directory. So you'll need to to apply at least some basic filters to its inputs before up-to-date checking will kick in. |
Hi @oehme, many thanks for your response! It's not clear to me how I should go about implementing these filters you suggested, so I wonder if you could point me towards some resources which I could use to learn more about them? |
Have a look at the user guide section on file trees :) |
Thanks @oehme! I've made an initial attempt at this in #45, but there's a few things which I'm confused about, and I wonder if you can help me. Firstly, I don't really know if I've understood the docs on incremental tasks, and so I don't know if I've used the annotations and Secondly, the tests no longer compile, but it's not clear to me what I should do to make them compile again (something about needing And finally, it's not clear to me how and where I should use |
I'll leave a few comments in the PR. Generally, the annotations are the first step, so the task can be up-to-date checked. Using
Generally I don't unit-test task classes, because they rely on quite a bit of Gradle infrastructure. Instead, I extract the processing logic into a separate class and unit-test that one. If you do want to unit-test them, then mocking the
I might have been a little unclear here. It does not trigger up-to-date checking. It's just that checking the whole project dir without a filter would include constantly changing things like the Gradle cache and the build directory. The problematic code is where the spotless plugin sets up the targets. This should have some default excludes (cache dir, build dir). |
It's not clear to me yet if I do want to unit-test the task classes, but it seems like the path of least resistance ATM, so I'll go down that route for now, and if I see a way of improving things later, I'll improve them then. :) |
I grabbed your code and responded in the PR. |
Hmm, I'm feeling rather lost now. I understand there's been further discussion at #45 and #47, but I've lost track of how it all fits together, and there's even some parts which my mind is just refusing to parse. Consequently I've not understood what it is I'm supposed to be doing now to move forward with #45. @nedtwigg @oehme Would one of you kindly give me a summary of what's been discussed and explain what I should do next? :) |
Just pulled your latest, tests pass, great work! The core wrinkle is that all FormatterSteps need to support Serializable/equals/hashCode, and we don't have a great way to do that for custom steps. We talked this through here and @oehme convinced me that his way was best. #47 is the one open wrinkle, we'll have to figure out what we think about it to completely close it out. Also, I just realized another hard part - LineEnding.Policy. By default, we use GitAttributes, which depends on every .gitattributes file in the repo, as well as system-wide config files. So we've gotta make LineEnding.Policy be Serializable, and figure out how to implement that for GitAttributes. I made some unrelated changes causing a conflict - I'm merging master into your branch and I'll upload the change and come back again with a to-do list we can work through together. |
Just uploaded the merge and some minor cleanup. We need to make all of the following things properly serializable/equalsTo/hashCode. Right now a lot of these are lazily evaluated, and a naive serializable implementation will make it more eager than is necessary. Maybe that's unavoidable, depending on how gradle handles the
Depending on what we decide on #47, we can either ignore these completely, or at least handle them piece-by-piece. We'll need some infrastructure for making the serialization easy, and for easily testing that the serialization is working. I've got some ideas for this, but I gotta run. I'll upload these ideas later today, feel free to upload other ideas too :) Based on how hard it is to actually do this, maybe we'll consider bailing on this approach and using the full shortcut presented by #47. |
I've uploaded a few commits. They have long commit messages that tell the story, but here's the outline:
Based on this, I think that making every built-in rule properly support up-to-date checking is fairly straightforward (though it's definitely a lot of work!). If it looks like a generally good idea to you guys, then I think we should merge this to a new branch |
You can use TestKit to verify that your tasks are executed/skipped as expected. Gradle evaluates inputs just before a task is executed, so no worries about eagerness. Also don't sweat about the Serializability. We don't need cross machine/cross version compatibility. This is just about local caching and if serialization breaks with a new version then Gradle will just reexecute the task. Literally just slapping |
FormatterSteps are independent of the files they are transforming. They are dependent on anything which determines their configuration. So EclipseFormatter.State needs to have the file that holds the formatter properties file, but it does not need the target files that it is transforming. For FreshMark, the key should be the Map<String, String> of properties that it is using. The FormatTask tracks which files are being transformed, not the FormatterSteps. |
Ahh, okay, I see now. Thanks for clearing up my confusion on this.
Hmm, it seems the default properties are not serializable, so the closest I can get to a version of properties which can be used as a key, is with the following code snippet. addStep(FormatterStep.createLazy(NAME,
() -> {
// defaults to all project properties
if (properties == null) {
properties = getProject().getProperties();
}
// properties isn't serializable, so make a serializable copy
return new LinkedHashMap<>(Maps.transformValues(properties, value -> (value == null) ? null : value.toString()));
},
key -> new FreshMark(key, getProject().getLogger()::warn)::compile)); The consequence of this solution is :spotlessFreshmarkCheck never reports as being UP-TO-DATE... |
Looks like we'll need a Possible cause 1: LinkedHashMap<> cares about order, and whatever gradle is using doesn't, so they're getting reordered randomly. |
Gradle definitely doesn't reorder things randomly ;) My bet is on: There is either a timestamp property or a property that doesn't implement The solution is pretty simple: The user should be explicit about what properties should be passed to FreshMark. The default should be none. |
It's very handy that everything in |
There is no such individual property. Every build out there probably has dozens of properties that either are some kind of timestamp or simply don't have a toString() method. The properties are used by build script authors as a bucket to put all kinds of data. |
We could check for unimplemented toString() using System.identityHashCode() and filter them out. Regardless, you're convincing me that adding all props is the wrong way to go :) I'm still curious which is the troublemaker. |
When I print out the properties during a run of
I dare not paste all the properties, because the list is massive. But looking through the first few props, it seems obvious to me that a number of properties simply do not implement toString() and print out their identities instead, as @oehme suspected. I didn't notice any timestamp-related property as I was glancing through the list, so I suspect that the objects which don't implement toString() are to blame. Therefore I'd be inclined to completely throw out the safeguard currently in place to set the properties to @nedtwigg Do you have any ideas as to what we could do instead? Just throw an exception? Or log it and/or perform some other action? |
I think that's the wrong default. It'll always be out of date. It should be no properties by default and the user should conciously decide what she needs. |
I think I agree with @oehme. I think we should add a |
@nedtwigg I admit I don't really understand what you mean by adding a public class FreshMarkExtension extends FormatExtension {
public static final String NAME = "freshmark";
public Map<String, ?> properties;
public FreshMarkExtension(SpotlessExtension root) {
super(NAME, root);
customLazy(NAME, () -> {
// defaults to all project properties
if (properties == null) {
properties = getProject().getProperties();
}
FreshMark freshMark = new FreshMark(properties, getProject().getLogger()::warn);
return freshMark::compile;
});
}
public void properties(Map<String, ?> properties) {
this.properties = properties;
}
...
} ...into: public class FreshMarkExtension extends FormatExtension {
public static final String NAME = "freshmark";
public FreshMarkExtension(SpotlessExtension root) {
super(NAME, root);
}
public void properties(Map<String, ?> properties) {
customLazy(NAME, () -> {
FreshMark freshMark = new FreshMark(properties, getProject().getLogger()::warn);
return freshMark::compile;
});
}
...
} My solution seems to pass all the tests on my machine. What do you think? |
|
Ooh, yes please! If you could write it down, then I think there's a much higher chance that I'd understand. :) |
Hi @nedtwigg, I've been working on incrementalising Here is an abbreviated example of the stack trace I get.
|
I've been having this too. The fix is to delete the |
@nedtwigg Thanks for the advice, deleting I've now made the import sorter related classes ready for incremental builds: 5d365b3. Hmm, having thought about it, I wonder if the reason the build fails at (By my understanding, the serialized bits are saved in files within |
If that is the case, it should be fixed since d3994cf on 12/7, because freshMark no longer adds all the project properties. |
Hi @nedtwigg, thanks for pointing that out. I've investigated things further, and I now think the exception occurs because it tries to read a file from the I don't know why it throws an exception there, and probably more importantly, it's not clear to me why it's even trying to read from the build directory in the first place, as I implemented a filter at some point which prevents (or, at least should prevent) Spotless from reading a Gradle project's @oehme Do you have any thoughts regarding this? |
As far as I can recall your code only excluded the current projects build dir. That doesn't stop it from recursing into subproject folders and reading their build directory. |
Oh oops, spoke too soon. It only works if |
HUGE THANKS to @jbduncan for getting this built. Might be some loose ends still in here, but this issue is too big to be helpful any longer. |
I'm reasonably certain are indeed some loose ends, I just don't know where they are. My reason for thinking this is when I last ran Spotless 3.0-SNAPSHOT against itself a few times 2 days ago, not all the Spotless related Gradle tasks were reporting as being "UP-TO-DATE". I'd be happy to raise a new issue for tackling these loose ends when I'm not feeling so tired, if you so wish @nedtwigg. :) |
Solved part of the problem in c821e85. If you find any 3.0 blockers, by all means raise issues for them :) |
The spotless check task currently doesn't have any inputs and outputs declared and is thus never considered up-to-date. This is slowing down user's builds unnecessarily.
The text was updated successfully, but these errors were encountered: