Faster tokens resolving in syntaxmap #2916

Merged
7 commits merged into realm:master from feature/faster-tokens-resolving on Nov 6, 2019

Conversation

@PaulTaykalo (Collaborator) commented Oct 23, 2019

When the syntax map is asked which tokens intersect a given byte range, the previous solution searched linearly for the first matching index and then filtered everything after that index with an intersection check.

The updated solution finds the first token index using binary search.

Speedup

Although only one function was updated, the following options were considered (a sketch of the [lin+filter] and [bin+prefix] variants follows this list):

  • [lin+filter] old solution
  • [lin+prefix] old solution with prefix(while:) instead of filtering
  • [bin+filter] binary search with filter
  • [bin+prefix] binary search with prefix(while:) instead of filtering
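
For illustration, here is a minimal sketch of the [lin+filter] and [bin+prefix] variants, using a simplified stand-in token type rather than SwiftLint's actual SyntaxToken (names and shapes below are illustrative, not the PR's code):

```swift
import Foundation

// Simplified stand-in for a syntax token, for this sketch only.
struct Token {
    let offset: Int
    let length: Int
}

// True when the token's byte range overlaps the requested range.
private func intersects(_ token: Token, _ range: NSRange) -> Bool {
    return NSIntersectionRange(NSRange(location: token.offset, length: token.length), range).length > 0
}

// [lin+filter]: linear scan for the first intersecting token, then filter the rest.
func tokensLinearFilter(_ tokens: [Token], in byteRange: NSRange) -> [Token] {
    guard let first = tokens.firstIndex(where: { intersects($0, byteRange) }) else { return [] }
    return tokens[first...].filter { intersects($0, byteRange) }
}

// [bin+prefix]: tokens are sorted by offset, so binary-search for the first token
// that does not end before the range, then take tokens only while they still start
// before the end of the range.
func tokensBinaryPrefix(_ tokens: [Token], in byteRange: NSRange) -> [Token] {
    var low = tokens.startIndex
    var high = tokens.endIndex
    while low < high {
        let mid = (low + high) / 2
        if tokens[mid].offset + tokens[mid].length <= byteRange.location {
            low = mid + 1
        } else {
            high = mid
        }
    }
    return tokens[low...]
        .prefix(while: { $0.offset < byteRange.location + byteRange.length })
        .filter { intersects($0, byteRange) }
}
```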

The speedup depends heavily on file size: the bigger and longer the files, the bigger the win.

Benchmark

Kickstarter

| lin+filter | lin+prefix | bin+filter | bin+prefix | speedup |
| --- | --- | --- | --- | --- |
| 0.494 | 0.243 | 0.390 | *0.117* | ~4x |

Swift

| lin+filter | lin+prefix | bin+filter | bin+prefix | speedup |
| --- | --- | --- | --- | --- |
| 1.739 | 0.740 | 1.273 | *0.103* | ~16x |

WordPress

| lin+filter | lin+prefix | bin+filter | bin+prefix | speedup |
| --- | --- | --- | --- | --- |
| 1.270 | 0.526 | 0.918 | 0.148 | ~8x |

Testing code

The timings were collected with the following code (in a Release build):

import Foundation

fileprivate var counter = 0
fileprivate var times: [String: Double] = [:]
fileprivate let timesQueue = DispatchQueue(label: "benchmarks")

// Accumulates the wall-clock time spent in `block` under `name` and
// periodically prints the running totals.
fileprivate func timeLog<T>(_ name: String, block: () -> T) -> T {
    let start = DispatchTime.now()
    let result = block()
    let end = DispatchTime.now()
    timesQueue.async {
        let oldValue = times[name, default: 0.0]
        let diff = TimeInterval(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1_000_000_000
        times[name] = oldValue + diff
        counter += 1
        if counter % 1000 * times.count == 0 {
            print("!!!!: \(times)")
        }
    }
    return result
}

internal func tokens(inByteRange byteRange: NSRange) -> [SyntaxToken] {
    // Time both implementations on every call and return one at random,
    // so neither call can be optimized away.
    let new = timeLog("new") { tokensFast(inByteRange: byteRange) }
    let old = timeLog("old") { tokensOld(inByteRange: byteRange) }
    return arc4random() % 2 == 1 ? new : old
}

@PaulTaykalo (Collaborator, Author) commented:

Wow. It seems that the lazy prefix/suffix combination behaves strangely in Swift 5.0/Xcode 10.
I'll drill into it tomorrow.

@PaulTaykalo force-pushed the feature/faster-tokens-resolving branch from 3162937 to d36994b on October 23, 2019 22:11
@PaulTaykalo (Collaborator, Author) commented Oct 23, 2019

@jpsim any ideas why this could fail on Swift 5.0? :(
I'll check this later.

@PaulTaykalo force-pushed the feature/faster-tokens-resolving branch from d36994b to d40b7d9 on October 23, 2019 22:34
@SwiftLintBot commented Oct 23, 2019

1 Warning
⚠️ This PR may need tests.
12 Messages
📖 Linting Aerial with this PR took 2.24s vs 2.28s on master (1% faster)
📖 Linting Alamofire with this PR took 3.98s vs 3.95s on master (0% slower)
📖 Linting Firefox with this PR took 12.74s vs 12.73s on master (0% slower)
📖 Linting Kickstarter with this PR took 29.47s vs 29.61s on master (0% faster)
📖 Linting Moya with this PR took 2.69s vs 2.59s on master (3% slower)
📖 Linting Nimble with this PR took 2.63s vs 2.68s on master (1% faster)
📖 Linting Quick with this PR took 1.09s vs 1.09s on master (0% slower)
📖 Linting Realm with this PR took 3.91s vs 3.89s on master (0% slower)
📖 Linting SourceKitten with this PR took 1.76s vs 1.74s on master (1% slower)
📖 Linting Sourcery with this PR took 4.65s vs 4.6s on master (1% slower)
📖 Linting Swift with this PR took 20.75s vs 21.04s on master (1% faster)
📖 Linting WordPress with this PR took 32.6s vs 32.91s on master (0% faster)

Generated by 🚫 Danger

@PaulTaykalo force-pushed the feature/faster-tokens-resolving branch from 364cbdf to 743abea on October 24, 2019 20:26
@PaulTaykalo (Collaborator, Author) commented Oct 24, 2019

@marcelofabri updated. It works without even checking the Swift version.

WordPress

["new": 0.5954486219999982, "old": 2.635693253000027]

Swift

["new": 0.475975328999993, "old": 3.679700348999993]

@jpsim (Collaborator) commented Oct 24, 2019

Thanks again for your continued investments in SwiftLint performance @PaulTaykalo.

However, I want to caution against over-optimizing parts of SwiftLint's runtime that already have a negligible impact on total lint time.

In this case we're significantly increasing the complexity of this code, but because finding token indices is such a small part of the total work spent during linting, there ends up being no measurable improvement on linting overall.

Here's what I measured before & after this change on a large project (~450k lines of Swift):

$ hyperfine -w 1 'swiftlint-faster-tokens-resolving lint --no-cache --quiet' 'swiftlint-master lint --no-cache --quiet'
Benchmark #1: swiftlint-faster-tokens-resolving lint --no-cache --quiet
  Time (mean ± σ):     99.046 s ±  3.707 s    [User: 361.074 s, System: 17.021 s]
  Range (min … max):   93.193 s … 103.159 s    10 runs
 
Benchmark #2: swiftlint-master lint --no-cache --quiet
  Time (mean ± σ):     93.638 s ±  3.545 s    [User: 342.293 s, System: 15.224 s]
  Range (min … max):   90.103 s … 101.004 s    10 runs
 
Summary
  'swiftlint-master lint --no-cache --quiet' ran
    1.06 ± 0.06 times faster than 'swiftlint-faster-tokens-resolving lint --no-cache --quiet'

Maybe if we extracted the binary search logic into an extension on Collection somewhere else, it could simplify the resulting code while also making it more performant? I'd be more open to accepting the change then, even if the impact on performance would be the same.


I don't want this to discourage your interest in making SwiftLint faster though! I'd recommend profiling complete lint runs to get a better sense of performance hotspots and optimize those for the best value.

@PaulTaykalo (Collaborator, Author) commented:

It's not that I'm not doing that :)
I'll check where it can be used in the project.

Technically, this part looks the same as the previous solution, but it uses firstIndexUsingBinarySearch instead of firstIndex. It's almost the same as func indexAssumingSorted.

Would it work if I put the binary search part in an extension, with comments, and made it more abstract?


Most of the time is spent in the regexes.
The second-largest share of the time is spent on Dictionary access.
I'm fully aware of the bottlenecks. While this speedup isn't visible in the overall lint time, it's there.
The variance of the file accesses and SourceKit requests is huge.
P.S. I'd suggest measuring this function's running time in both versions, as in the PR description, and comparing them (like I did :). I'm just curious what the timings are in your case.


There's an IdenticalOperandsRule on the way, without too many regexes. Not sure if it will be faster though.

@jpsim (Collaborator) commented Oct 24, 2019

Thanks for the reply.

Will it work if I put the binary search part in the extension with comments and make it more abstract?

Yes, specifically I think a reusable binary search function on Collection would be pretty helpful.
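
For reference, a reusable binary search on Collection could look roughly like the sketch below; the function name and exact shape are illustrative assumptions, not necessarily the code that was merged in this PR:

```swift
extension Collection {
    /// Returns the first index whose element satisfies `predicate`, assuming the
    /// collection is partitioned: every element failing the predicate comes before
    /// every element satisfying it. Uses O(log n) comparisons.
    /// (Illustrative name, not necessarily what landed in the PR.)
    func firstIndexAssumingSorted(where predicate: (Element) throws -> Bool) rethrows -> Index? {
        var low = startIndex
        var high = endIndex
        while low < high {
            let mid = index(low, offsetBy: distance(from: low, to: high) / 2)
            if try predicate(self[mid]) {
                high = mid
            } else {
                low = index(after: mid)
            }
        }
        return low == endIndex ? nil : low
    }
}

// Hypothetical usage for this PR: find the first token that is not entirely
// before the requested byte range.
// let firstIndex = tokens.firstIndexAssumingSorted { $0.offset + $0.length > byteRange.location }
```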

@PaulTaykalo force-pushed the feature/faster-tokens-resolving branch from 4896721 to 73d26bf on October 26, 2019 00:31
@PaulTaykalo (Collaborator, Author) commented:

@jpsim So, how about this one?

@PaulTaykalo (Collaborator, Author) commented:

This is how I measured that this solution is a bit faster than the previous one.


On Swift Repository

master branch: 3.1% of total time
this branch: 0.0%
[profiler screenshots]


On Kickstarter Repository

master branch: 0.7% of total time
this branch: 0.0%
[profiler screenshots]


On WordPress Repository

master branch: 1.4% of total time
this branch: 0.0%
[profiler screenshots]


In general, the bigger the files in the project, the more speedup can be achieved.
At least this solution is not slower than the previous one.

You probably don't see any improvement mostly because of the average file sizes in the project; that's totally explainable. Technically there's no improvement in the project I'm working on either.

@jpsim force-pushed the feature/faster-tokens-resolving branch from 82b224f to f343f84 on November 6, 2019 21:39
@jpsim (Collaborator) commented Nov 6, 2019

Thanks @PaulTaykalo, I've made a few small edits and will merge when CI passes.

@jpsim (Collaborator) commented Nov 6, 2019

There's an issue with OSSCheck: it didn't run properly. I think it's fallout from running on MacStadium. I'm fixing it now.

@jpsim (Collaborator) commented Nov 6, 2019

Validating OSSCheck fixes here: #2945

@jpsim force-pushed the feature/faster-tokens-resolving branch from f343f84 to ea777be on November 6, 2019 22:53
@jpsim merged commit 6175c00 into realm:master on Nov 6, 2019
@PaulTaykalo deleted the feature/faster-tokens-resolving branch on November 6, 2019 23:21