Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate the Gradle Worker API as a potential way to speedup large formatting jobs #148

Closed
jbduncan opened this issue Sep 29, 2017 · 9 comments

Comments

@jbduncan
Copy link
Member

For those who don't know about the Worker API, information can be found here.

This would build upon the speed gains previously obtained from implementing incremental checking (#31).

@nedtwigg
Copy link
Member

nedtwigg commented Sep 29, 2017

I just used the Worker API for the first time on image-grinder.

It adds a little complexity, but not a ton. Would be very easy to parallelize Spotless using the SerializableRef trick.

My small concern is the Gradle minimum version. Right now spotless is Gradle 2+, but adopting Worker requires us to be 4.1+. Not a huge deal.

My big concern is error messages. One use-case for Spotless is to throw errors when code doesn't meet some requirement (e.g. imports a non-API class). Right now, which error you will get is deterministic. In a parallel environment, it won't be - you'll get one error one time, and another error another time.

Spotless' running time on my biggest projects is much less than a second. If a user spends 10 minutes being confused by a non-deterministic error ordering, we will almost definitely have wiped out any time savings that the user had previously gained.

I'm not against experimenting with this on a branch somewhere, but it's unlikely that I'll put time into doing or supporting this, for the reasons above.

@nedtwigg
Copy link
Member

Sorry, didn't mean to close.

@nedtwigg nedtwigg reopened this Sep 29, 2017
@jbduncan
Copy link
Member Author

@nedtwigg

I'm not against experimenting with this on a branch somewhere, but it's unlikely that I'll put time into doing or supporting this, for the reasons above.

Okay, that sounds perfectly okay to me. I raised this issue simply because the idea just came to me, so I wanted to record it down for if/whenever we get the time to do so. :)

@nedtwigg nedtwigg closed this as completed Dec 2, 2017
@jbduncan
Copy link
Member Author

jbduncan commented Dec 7, 2017

Hi @nedtwigg, did you mean to close this issue again? :)

@nedtwigg
Copy link
Member

nedtwigg commented Dec 7, 2017

I did. If I understood your answer correctly, there's no work planned in this direction. If you are planning work on this, feel free to reopen :) Imo a bugtracker is most effective when it's tracking active work - bugs that are affecting users now, and enhancements that are likely to happen.

This feature adds some speed to some runs, at the cost of error-message determinism and introducing thread-safety requirements. There might be some fixes I haven't thought of, but overall this seems unlikely to ship.

@jbduncan
Copy link
Member Author

jbduncan commented Dec 7, 2017

I did. If I understood your answer correctly, there's no work planned in this direction...

Yes, that's right, I don't actually plan to tackle this in the foreseeable future, so having this issue closed for now to keep more active work easily tracked sounds sensible to me. :)

@jbduncan
Copy link
Member Author

jbduncan commented Dec 7, 2017

My big concern is error messages. One use-case for Spotless is to throw errors when code doesn't meet some requirement (e.g. imports a non-API class). Right now, which error you will get is deterministic. In a parallel environment, it won't be - you'll get one error one time, and another error another time.

I'm just jotting down an idea that just came to me for potentially making Spotless both parallel-running and deterministic, if I ever find the time to re-open and address this. :)

My train of thought is to store all exceptions thrown by the formatting functions in a collection, and then sort the exceptions using the following scheme:

  1. Do a stable sort of the exceptions in formatter-function order. So, for example, if GoogleJavaFormatStep is defined to run before ReplaceStep, then those exceptions thrown by GoogleJavaFormatStep will be first and then those exceptions thrown by ReplaceStep will be second.
  2. For consecutive sub-sequences of exceptions returned by the same formatter function, do a stable sort of those sub-sequences by the names of their associated files, in alphabetical order.
  3. For consecutive sub-sequences of exceptions associated with the same file, do a stable sort of those sub-sequences by full exception class name, in alphabetical order.
  4. And finally, for consecutive sub-sequences of exceptions with the same full class name, do a stable sort of those sub-sequences by exception message, in alphabetical order.

We'd then throw these exceptions as one big exception (maybe by adding the collection of caught exceptions as causes of the big one) to report to the user.

Having said all this, I don't know what impact making the exception order deterministic would have on the performance of a parallel-based impl for Spotless. But I'm sure that I could do timed Gradle tests (similarly to how I did it for #31) to figure that out.

@nedtwigg
Copy link
Member

nedtwigg commented Dec 7, 2017

So if I have a project with 1,000 files, and I change from tabs to spaces, then run gradlew spotlessCheck, do I get an OutOfMemoryException as this map is constructed?

I think this 1,000-file change-everything usecase is the only usecase where users might possibly notice a benefit of this parallelism, but it's also where they're likely to experience its biggest downside

@jbduncan
Copy link
Member Author

jbduncan commented Dec 7, 2017

Oh, very good point! I hadn't quite thought about the impact on memory.

If I ever get around to this issue again, I'll give it some further thought. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants