Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve lint performance by optimizing Swift Regex usage #968

Merged
merged 1 commit into from
Mar 20, 2025

Conversation

TTOzzi
Copy link
Member

@TTOzzi TTOzzi commented Mar 20, 2025

From my investigation, Swift's Regex appears to be more resource-intensive to create compared to NSRegularExpression 🤔

When running the linter, a new RuleMask is initialized for each file. Even if the Regex is cached within RuleMask, it gets recreated when moving to the next file because RuleMask itself is newly initialized.
By declaring the Regex as static, it is created only once during the entire linting process, improving performance to a level similar to before.

.build/release/swift-format lint --measure-instructions --recursive /tmp/swift-syntax

- Regex:          49422947117
- Regex + static: 47796699574

Interestingly, using NSRegularExpression with static did not show a significant difference compared to creating a new instance each time.

.build/release/swift-format lint --measure-instructions --recursive /tmp/swift-syntax
- NSRegularExpression:          47743194714
- NSRegularExpression + static: 47465967198

@TTOzzi
Copy link
Member Author

TTOzzi commented Mar 20, 2025

I was able to confirm the difference in creation cost through testing.

let pattern = #"^\s*\/\/\s*swift-format-ignore(?:\s*:\s*(?<ruleNames>.+))?$"#

func testSwiftRegex() { // 0.677 sec
    measure {
        for _ in 0..<10000 {
            _ = try! Regex(pattern)
        }
    }
}

func testNSRegularExpression() { // 0.036 sec
    measure {
        for _ in 0..<10000 {
            _ = try! NSRegularExpression(pattern: pattern, options: [])
        }
    }
}

Copy link
Member

@allevato allevato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Wait for @ahoppen to sign off too, but those + static numbers look close enough to me that we could keep this and not have to go back to NSRegularExpression.

One other thing you might want to try to squeeze out more performance: by default, Regex will match based on Characters, which involves some semi-expensive Unicode grapheme clustering. I don't expect that we'd ever want rule names that depend on Unicode normalization, so I'd be curious to know if having IgnoreDirective do this would make it even a little bit faster when matching:

try! Regex(pattern).matchingSemantics(.unicodeScalar)

I don't think NSRegularExpression does grapheme-based matching (I could definitely be wrong though), so that might explain part of the difference.

@TTOzzi
Copy link
Member Author

TTOzzi commented Mar 20, 2025

Nice! Wait for @ahoppen to sign off too, but those + static numbers look close enough to me that we could keep this and not have to go back to NSRegularExpression.

One other thing you might want to try to squeeze out more performance: by default, Regex will match based on Characters, which involves some semi-expensive Unicode grapheme clustering. I don't expect that we'd ever want rule names that depend on Unicode normalization, so I'd be curious to know if having IgnoreDirective do this would make it even a little bit faster when matching:

try! Regex(pattern).matchingSemantics(.unicodeScalar)

I don't think NSRegularExpression does grapheme-based matching (I could definitely be wrong though), so that might explain part of the difference.

Oh, I learned something new thanks to you. I appreciate your feedback 😃
I made the changes as you suggested and ran the test. While the improvement is minimal and can vary between runs, there is still a slight enhancement.

measurement result: 47595334398 (approximately a 0.4% improvement)

I'll go ahead and incorporate this change into the code.

@TTOzzi TTOzzi force-pushed the regex-performance branch from 242b29f to 242dfd1 Compare March 20, 2025 15:21
Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, @TTOzzi. Thank you!

@ahoppen ahoppen merged commit f91aaa9 into swiftlang:main Mar 20, 2025
18 checks passed
@TTOzzi TTOzzi deleted the regex-performance branch March 21, 2025 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants