-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: replace check
and apply
with a single task with a CLI flag
#55
Comments
check
and apply
with a single task with a flagcheck
and apply
with a single task with a CLI flag
This may be a problem with IDEs like IntelliJ though, because AFAIK they don't allow one to easily pass command-line arguments upon double-clicking (or otherwise interacting with) a Gradle task on the IDE's UI. Instead, one has to manually configure command-line options for particular Gradle tasks via a not-easy-to-find-and-use options window (for IntelliJ at least, I don't know about Eclipse or Netbeans), or one would have to use the terminal or command line directly, which for me at least would be a painful to go through, as I'm so used to running Gradle tasks straight from the IDE now. :P @nedtwigg Could you perhaps explain further why you think this idea might mesh better with the in-progress incremental build model than the current |
The check task should produce an output (a report or similar) and can be up-to-date checked. The apply task modifies its inputs and can thus not be up-to-date checked. Thus the two will have different properties/annotations/implementation. Keeping them separate is the way to go. In general, passing arguments should be used to slightly modify the behavior of a task, not switch between two fundamentally different things. |
Maybe an example will clarify:
So for every check/apply loop, we don't get much advantage from the up-to-date checking. Why are we doing this in the first place? Compare that to:
Now, for every check/apply loop, we get the full advantage of up-to-date checking. I think this whole up-to-date thing is crazy :-P, but if we're going to do it, we might as well get some benefit :) |
You run spotlessCheck all the time, as part of your usual Apart from that, what you outlined above is not what would happen. If you change an input property (
vs.
|
@nedtwigg Ah okay, I think I see. Please correct me if I've misunderstood something, but in other words, you want If my understanding is correct, then the first question that comes to my mind is: Would it not be possible to tell P.S. I personally disagree that the whole up-to-date thing is crazy (I think Gradle and Bazel had the right idea here to support incremental builds), although I do agree it's proving to be quite a lot of work! And I also agree that if we're going to do it, we might as well squeeze out as much performance as we reasonably can. :) |
@oehme We can pass |
@nedtwigg not declaring an input that clearly is one is not something I'd support. @jbduncan I don't think that's necessary. You'd only pay the price for a full run once. After that |
Okey doke. We'll keep the current behavior for now. This correctness issue is our real problem. Once we've got everything else solved, I'll come back and look at this with a fresh mind :) |
@oehme Oh okay, I see. Knowing that we'd only pay the price for a full Although, having said this, I had a look at the code for ApplyFormatTask again, and I wonder if @oehme @nedtwigg What are your thoughts on me going ahead and changing ApplyFormatTask so it operates on an (I realise the correctness issue is what we really need to sort out right now, so I'd be happy to wait until we're ready to come back to this issue before discussing my question above.) |
I'm gonna ixnay IncrementalTaskInputs on ApplyFormatTask and close this issue for now. We'll re-examine when we've got something that works :) |
Since the restructure in #56, it's easy to see how little code is really involved in Spotless' gradle hooks. Seems most of it is getter/setter dances :-P. I definitely think it's worth at least tinkering with. I'm very curious to see perf #'s for a giant repo like one of these: |
When I run Spotless 2.4.0 and 3.0.0-SNAPSHPOT against JUnit 5 SNAPSHOT (
I'm really curious at to why it's so slow on a clean run now. Is it because steps like |
If we did @oehme This call to gradle's dep resolution is very slow, greater than 1s every time, even for locally cached dependencies. Are we doing it wrong? Also, I don't think Lastly, because the BumpThisNumberIfACustomRuleChangesTest has been failing for a while, I don't know how old the |
Do you have a profiler snapshot that shows this provisioner taking long? It should be very fast when everything is cached. As for the general performance: I suggest using a profiler, there must be some very obvious bottlenecks in your code. Gradle's up to date checking itself is incredibly fast, you're unlikely to even notice it on something as small as the Junit 5 repo. |
I have a profiler snapshot which shows it taking ~1s per call to resolve, even when all the jars are cached locally. However, in that case it's the |
ProjectBuilder will not behave like a real Gradle build. It is only meant for unit testing, i.e. checking that certain tasks are created etc. It is not meant for integration tests. |
Is there an API for resolving dependencies using Gradle's cache without a Project? |
No, there is not. |
I've had a go at profiling Spotless against junit5-SNAPSHOT with My aim for doing this profiling is to find out which methods in Spotless are taking the longest time to execute, so that we have a starting point for tackling this performance issue. However, @nedtwigg I wonder if you could explain to me how you managed to obtain your profiler snapshot for @oehme Do you have any advice as to what I could do to understand which parts of Spotless are causing the builds of JUnit 5 on my computer to be slow? |
I did my profiling using DurianDebug, especially |
@jbduncan just use a profiler. JFR, YourKit, JProfiler, whatever you prefer. Anything else would just be speculating. |
FWIW, I think your benchmarks were probably against a very old Spotless 3.0. It seems Spotless' maven setup is currently broken. Here's a junit test harness which only uses mavenLocal: https://github.com/nedtwigg/junit5 For me, it isn't able to load Spotless at all. I'm currently working on it. |
Fixed the publishing issues. Uploaded some jankey profiling code in 36d1ab5. Here's the result:
Shocking result: we're creating a new classloader for every call to format() in eclipse formatter. Whoops. After fixing this in 28205df, we improve to this:
18x speedup :) Or, more accurately, my bad for writing a bug a while ago which incurred an 18x slowdown. To complete the profiling chain, I reverted the profiling code, then made a merge commit (that's a nice pattern in general, imo: invasive profiling, fix the problem, revert the profiling, merge). Re: Gradle dependency resolution and memoization, it looks like the first resolution is taking a full 1.4 seconds, but future resolutions are only taking a few ms. I'm surprised the first call takes a full 1.4, but it looks like there's no point in memoizing the subsequent calls since they're already fast (on a real project, TestProvider should stay memoized because of its "fake" project). After that result, I ran the following:
So 2.4.0 and 3.0.0 are both holding steady at 6s config time (not all Spotless). But 2.4.0 allowed the JIT to optimize the eclipse code over time, but in 3.0.0 we break that by creating a classloader for each FormatterStep. It seems plausible to me that that alone is the main performance difference. IMO, that's okay - spotless has always been a little fickle, because other projects that used eclipse jars would muck with it. It's a bummer that getting an isolated classloader is so expensive, but seems worth the win in repeatability. One avenue to explore would be a global cache for classloaders opened by JarState that could persist between runs. Thoughts @oehme? A gradle mechanism for running code with a given set of maven coordinates, isolated from the buildscript, but gets to benefit from JIT over subsequent runs? |
You can cache the classloader in a static variable if it is expensive. Add in some eviction strategy to make it not leak too much and you should be fine :) Guava has good cache implementations. |
Regarding dependency resolution: Is it from an Eclipse.org repository? They are incredibly slow to answer even basic metadata requests. If you can find the same jar on jcenter you should try that first. |
@nedtwigg Wow! The difference in speed now is like night and day! Many thanks for finding and getting rid of that significant bottleneck in Formatter. :D If we're going to look into caching the classloaders in EclipseFormatterStep as they're created, may I suggest we use Caffeine as our caching mechanism instead of Guava's? Caffeine is maintained by @ben-manes who worked on the original Guava cache, and it seems to be generally much faster and efficient than Guava's cache (if these benchmarks, this report on its efficiency and its successful adoption in Infinispan are to be believed. :)) |
Dependency resolution in Gradle is slow because it is sequential, whereas in Maven it is performed in parallel. So an uncached result can be annoying, e.g. due to snapshots. It might be faster to use the configuration sourceSet's Regarding Caffeine, I'd suspect @jbduncan It seems the following blog post explains why they made the change. |
@oehme Agreed eclipse.org is very slow, but request is from jcenter(), and artifact is already locally cached, and is not a -SNAPSHOT. I love Caffeine, but I think it's overkill for this usecase. Just a Map<> with timed eviction - there are so few individual things to cache, no point worrying about LRU. Also, only reason to worry about caching at all is to see if it allows the JIT to warmup for a substantial difference. So first a Map<> to see if it gets faster enough to matter, then a map that evicts after 60s. |
If it's locally cached and not changing, then it should never take seconds. Can you please upload a profiler snapshot showing that this is where the time is actually spent? |
After a lot of trial and error, I managed to produce this YourKit snapshot of a Gradle daemon running on my computer, showing method timings when running (There seems to be no real difference even when nedtwigg/junit5@856f737 and nedtwigg/junit5@4b76534 are reverted and running with Spotless 3.0.0-SNAPSHOT.) |
Your snapshot says that we're spending 70% of execution time inside Eclipse's code formatter. If the other 30% is IO that we can't avoid, then that's good. If the other 30% is creating classloaders over and over, that's bad. IMO, YourKit and the like are extremely useful for finding surprising resource hogs. In this case, it's expected that must of our time would be spent here, so the profiling result isn't particularly helpful. Based on the hand-profiling we did earlier, we found out that JIT warmup is likely to be a big part of our performance, and I don't know of any profiling tools that help with that. They show you where the code, as currently JIT-ed, is spending its time, but they don't tell you how much faster it could be if you gave it time to warm up. Normally you'd warm the code up before launching the profiler, but in this case, our JarState design has (temporarily) traded JIT-warmup for repeatability. I'm gonna crunch on this briefly and try to squeeze a 3.0.0.ALPHA into jcenter / mavenCentral. |
jitwatch is the only tool that I know of that provides some help regarding JIT optimizations. Most profilers interfere with JIT'ing, which is already non-deterministic. If you can leverage the daemon's warm-up that would be ideal. |
Thanks for the jitwatch tip, hadn't heard of it. We can ship 3.0.0 without caching classloaders, so I'm gonna move future discussion of that issue to the link above. |
I still think you might be able to rely on Gradle caching classloaders for you via sourceSet |
Gradle does not provide ClassLoader caching as a service to users. |
Getting back the topic of combining tasks, in c468f37 I smushed all of our various tasks into just // when the task graph is ready, we'll configure the spotlessTask appropriately
project.getGradle().getTaskGraph().whenReady(new Closure(null) {
public Object doCall(TaskExecutionGraph graph) {
if (graph.hasTask(checkTask)) {
spotlessTask.setCheck();
}
if (graph.hasTask(applyTask)) {
spotlessTask.setApply();
}
return Closure.DONE;
}
}); So users still call It would be nice if we had an API that let us manually mark files as "in-date" to squeeze even more efficiency out, but in the meantime this works pretty good. // test our harness (build makes things lower case)
writeState("ABC");
assertState("ABC");
writeState("aBc");
assertState("aBc");
// check will run against all three the first time (and second and third)
checkRanAgainst("abc");
checkRanAgainst("abc");
checkRanAgainst("abc");
// apply will run against all three the first time
applyRanAgainst("abc");
// and the second time
applyRanAgainst("abc");
// but nobody the last time
applyRanAgainst("");
// if we change just one file
writeState("Abc");
// then check runs against just the changed file
checkRanAgainst("a");
// even after failing, still just the one
checkRanAgainst("a");
// and so does apply
applyRanAgainst("a");
applyRanAgainst("a");
// until the issue has been fixed
applyRanAgainst(""); |
@oehme there are more generic concepts like task inputs and outputs or the configuration phase. One could use afterEvaluate to create the ClassLoader from the input at configuration time instead of during the execution phase. |
That wouldn't buy you anything. You'd still have to cache it and you'd be resolving dependencies at configuration time, which is a really bad anti-pattern. |
Deferring the resolution to Gradle means the classpath is resolved as a task input. The ClassLoader would have the URLs but not do any loading until used at execution. The point is one can use Gradle's generic facilities instead of reinventing specialized versions. I don't want to argue about this and I'm sure you'll have a reasonable solution regardless. |
I think you are mixing up |
No, I am not claiming that there wouldn't be a new non-input field. Just that it would be based on the task input and mirror its lifecycle by simple state checking. |
Yes and that field would have to be static (as tasks are recreated on each build), which is exactly what I proposed above. |
Instead of having
spotlessCheck
andspotlessApply
, we could instead just havespotless
, which could do the same thing asspotlessCheck
. In place ofspotlessApply
, the user would instead rungradlew spotless --apply
.This might mesh better with the incremental build model being implemented in #31.
The text was updated successfully, but these errors were encountered: