Improve performance for proofread_canonicals() #258

Merged
AA-Turner merged 2 commits into main from proofread-perf
Apr 11, 2025
AA-Turner
Member

Currently, proofread_canonicals() takes c. 4-5 minutes, with the vast majority of the time spent reading files from disk. This PR reduces that to c. 100-120 seconds by checking the files in multiple threads. We also switch from re to bytes methods for a further slight improvement, since this avoids Unicode encoding and decoding.
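
For readers who have not seen the diff, the approach can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the `CANONICAL_PREFIX` marker and the `find_canonical()` and `check_files()` helpers are hypothetical names, and the real check in proofread_canonicals() may look for something different. It shows the two ideas described above: reading each file as bytes and searching with `bytes.find()` rather than `re`, and spreading the I/O-bound reads across a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterable, Optional

# Hypothetical marker, assumed for illustration; the real check may differ.
CANONICAL_PREFIX = b'<link rel="canonical" href="'


def find_canonical(path: Path) -> Optional[bytes]:
    """Return the canonical URL found in the file as bytes, or None."""
    content = path.read_bytes()  # read raw bytes, no Unicode decoding
    start = content.find(CANONICAL_PREFIX)
    if start == -1:
        return None
    start += len(CANONICAL_PREFIX)
    end = content.find(b'"', start)  # closing quote of the href value
    if end == -1:
        return None
    return content[start:end]


def check_files(
    paths: Iterable[Path], max_workers: int = 8
) -> Dict[Path, Optional[bytes]]:
    """Check every file in a thread pool; reads are I/O-bound, so threads help."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() preserves input order, so zip() pairs results correctly.
        return dict(zip(paths, pool.map(find_canonical, paths)))
```

Threads suit this workload because the time is spent waiting on disk reads rather than on computation. The `max_workers=8` default is an arbitrary choice for the sketch, not a value taken from the PR.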

@AA-Turner AA-Turner requested a review from hugovk April 11, 2025 04:27
@AA-Turner AA-Turner merged commit e80b729 into main Apr 11, 2025
6 checks passed
@AA-Turner AA-Turner deleted the proofread-perf branch April 11, 2025 13:10