Make b64decode with validate=True faster by compiling regex #11634
This is done by compiling the regex that validates that the encoded data only has valid base64 characters.
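Here is a minimal sketch of the idea. It assumes a validation pattern that is compiled once when the module is imported. The pattern and the names `_VALID_B64` and `b64decode_validated` are illustrative assumptions, not the code in this PR. Compiling at module level is exactly what moves the cost into import time, which is the concern raised later in the review.

```python
import binascii
import re

# Assumption: illustrative pattern accepting the standard base64 alphabet
# followed by at most two '=' padding characters.
_VALID_B64 = re.compile(rb"[A-Za-z0-9+/]*={0,2}")

def b64decode_validated(s: bytes) -> bytes:
    """Hypothetical helper: reject non-alphabet bytes, then decode."""
    # The pattern object was compiled once at import, so each call
    # matches directly instead of going through re's pattern cache.
    if not _VALID_B64.fullmatch(s):
        raise binascii.Error("Non-base64 digit found")
    return binascii.a2b_base64(s)

print(b64decode_validated(b"aGVsbG8="))  # b'hello'
```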
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it!
IIRC, a similar PR was rejected recently because compiling the regex at module level increased import time.
Hey @asvetlov, thanks for pointing that out.
I agree with @asvetlov, since I opened that issue: compiling the regex causes the import time to increase. Please benchmark the import time. This would also need discussion in a separate bpo issue if you want to proceed further with it. Thanks
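One way to get the requested import-time numbers is CPython's `-X importtime` option (Python 3.7+). It writes per-module import costs, in microseconds, to stderr. The sketch below starts a fresh interpreter and keeps only the row for `base64`. The helper name `base64_import_time_lines` is hypothetical. Running it before and after a change gives a rough comparison, although single runs are noisy.

```python
import subprocess
import sys

def base64_import_time_lines():
    """Run a fresh interpreter with -X importtime and keep the base64 rows."""
    # Each stderr row looks like:
    # "import time: <self us> | <cumulative us> | <indented module name>"
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "import base64"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [
        line
        for line in proc.stderr.splitlines()
        if line.split("|")[-1].strip() == "base64"
    ]

if __name__ == "__main__":
    for line in base64_import_time_lines():
        print(line)
```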
…e VALID_BASE64_REGEX
Check the new commit 30aa7d4: it should avoid the slower import time. The first call to `b64decode` will be slower, but subsequent calls will be faster.
Please provide timing numbers.
Hey there, here it goes:
And here are the import times:
Thank you for the PR! This is no longer relevant after #27272.
Hi there, I was running some code that had to decode a lot of base64-encoded data. I was using `validate=True` to also make sure the data was valid, but it seemed to be much slower than `validate=False`. I saw that it uses a regex without compiling it, so I decided to send a PR.

I didn't include an issue number because I couldn't find one, and the docs say that trivial changes don't need one. Let me know if one is needed.

I also didn't add a test, because `Lib/test/test_base64.py` already covers `validate=True`.

With the following benchmark, the change goes from 15.174073122 to 8.906353553999999.
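The benchmark referred to here is not preserved on this page. The following is a hypothetical stand-in built on `timeit`. It compares passing a pattern to `re.fullmatch` against calling a precompiled pattern object. The workload, pattern and function names are assumptions. The `re` module caches compiled patterns internally, so the uncompiled version does not recompile on every call. Precompiling mainly saves the per-call cache lookup, which should matter most when validating many small inputs.

```python
import base64
import re
import timeit

PATTERN = rb"[A-Za-z0-9+/]*={0,2}"  # illustrative validation pattern
COMPILED = re.compile(PATTERN)       # compiled once, up front

# Hypothetical workload: many short base64 strings of varying length.
SAMPLES = [base64.b64encode(bytes([i % 256]) * (i % 64 + 1)) for i in range(1000)]

def validate_uncompiled(samples):
    # re.fullmatch() resolves PATTERN through re's internal cache on each call.
    return all(re.fullmatch(PATTERN, s) for s in samples)

def validate_compiled(samples):
    # The pattern object is reused directly, skipping that lookup.
    return all(COMPILED.fullmatch(s) for s in samples)

if __name__ == "__main__":
    for fn in (validate_uncompiled, validate_compiled):
        seconds = timeit.timeit(lambda: fn(SAMPLES), number=100)
        print(f"{fn.__name__}: {seconds:.3f} s for 100 rounds")
```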